<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">BaselineAvengers at PAN 2024: Often-Forgotten Baselines for LLM-Generated Text Detection. Notebook for the PAN Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ludwig</forename><surname>Lorenz</surname></persName>
							<email>ludwig.david.lorenz@uni-weimar.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Bauhaus-Universität Weimar</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Funda</forename><forename type="middle">Zeynep</forename><surname>Aygüler</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Bauhaus-Universität Weimar</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ferdinand</forename><surname>Schlatt</surname></persName>
							<email>ferdinand.schlatt@uni-jena.de</email>
							<affiliation key="aff1">
								<orgName type="institution">Friedrich-Schiller-Universität Jena</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nailia</forename><surname>Mirzakhmedova</surname></persName>
							<email>nailia.mirzakhmedova@uni-weimar.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Bauhaus-Universität Weimar</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">BaselineAvengers at PAN 2024: Often-Forgotten Baselines for LLM-Generated Text Detection. Notebook for the PAN Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D68766F76CBEF562414F7DE468D9BE17</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Authorship verification</term>
					<term>Logistic Regression</term>
					<term>Tf-Idf Vectorizer</term>
					<term>0009-0005-2410-9005 (L. Lorenz)</term>
					<term>0009-0009-6160-5074 (F. Z. Aygüler)</term>
					<term>0000-0002-6032-909X (F. Schlatt)</term>
					<term>0000-0002-8143-1405 (N. Mirzakhmedova)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rapid advancements of Large Language Models (LLMs) make it increasingly challenging to distinguish between human-written and machine-generated texts, which raises concerns regarding their potential misuse. This paper describes our submission to the PAN 2024 Generative AI Authorship Verification task, which involves identifying the human-authored text from a pair of texts, one written by a human and the other by an LLM. Our approach is based on the assumption that LLMs use a distinct vocabulary. We propose a simple and interpretable method using non-neural machine learning classifiers with lexical features. We evaluate several classification models and feature sets on a validation split and find logistic regression and SVM models using tf-idf feature vectors to be highly effective. Our submissions offer a more effective alternative to all baseline approaches while also being more efficient and interpretable.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the rapid advancements of Large Language Models (LLMs), distinguishing between human-written and machine-generated texts becomes more and more challenging. As a result, the need for reliable authorship verification methods becomes even more pressing. The ability to distinguish between human-written and machine-generated texts is crucial for various applications, such as plagiarism detection <ref type="bibr" target="#b0">[1]</ref>, forensic linguistics <ref type="bibr" target="#b1">[2]</ref>, and content moderation <ref type="bibr" target="#b2">[3]</ref>. Multiple approaches have been proposed to address this problem, including complex feature engineering and stylometric analysis, linguistic analysis, and machine learning-based methods <ref type="bibr" target="#b3">[4]</ref>. However, the increasing sophistication of LLMs poses a significant challenge to existing authorship verification methods. In response to this challenge, PAN <ref type="bibr" target="#b4">[5]</ref> introduced the Voight-Kampff Generative AI Authorship Verification task to test the feasibility of distinguishing between human-written and LLM-generated texts <ref type="bibr" target="#b5">[6]</ref>.</p><p>In this paper, we present our submission to the PAN shared task, where we address the generative authorship verification problem using non-neural machine learning classifiers based on lexical features. Our decision to employ non-neural models is motivated by the observation that simple models are often overlooked in recent research, despite their proven effectiveness and their ability to serve as efficient baselines for comparison with more complex models <ref type="bibr" target="#b6">[7]</ref>. 
Moreover, our emphasis on lexical features is based on the hypothesis that LLMs use a distinct vocabulary, which may be sufficient to differentiate between human-authored and machine-generated texts.</p><p>In our work, we experimented with three classification models and two lexical feature sets. We found that logistic regression and SVM models using tf-idf feature vectors are highly effective for the task. Motivated by the performance of our approach, we conducted a qualitative analysis of the most significant lexical features to test our hypothesis that LLMs employ a distinct vocabulary. Our analysis revealed that there is a small set of words that can indicate whether a text is written by an LLM. Overall, our approach offers a more effective alternative to all baseline approaches while also being more efficient and interpretable.</p><p>The remainder of this paper is structured as follows. In Section 2, we provide background information on the PAN: Generative AI Authorship Verification task and review the related work. In Section 3, we describe our system and the components of our submission. In Section 4, we present the results of our submission. Section 5 provides a qualitative analysis of the most important lexical features. We conclude with a discussion of our results in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Task Description The PAN: Generative AI Authorship Verification task is organized in collaboration with the Voight-Kampff Task at the ELOQUENT Lab in a builder-breaker style. PAN participants build systems to tell human and machine-generated texts apart, while ELOQUENT participants investigate novel text generation and obfuscation methods to avoid detection. The task is defined as follows:</p><p>Given two texts, one authored by a human, one by a machine: pick out the human.</p><p>More formally, given a pair of texts (𝑡 1 , 𝑡 2 ), one of which is written by a human and the other by an LLM, the system must output a confidence score 𝑠 ∈ [0.0, 1.0]. A score 𝑠 &gt; 0.5 indicates that text 𝑡 1 is believed to be human-authored, while a score 𝑠 &lt; 0.5 indicates that text 𝑡 2 is believed to be human-authored. A score of exactly 0.5 means the case is undecidable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset</head><p>The task participants were provided with a training dataset of 1,359 U.S. news articles. To ensure that the articles were human-authored, the task organizers collected the articles from Google News, focusing on the period before the release of GPT-3.5. The articles were summarized using GPT-4-Turbo, and the summaries were used as input for 13 downstream LLMs to generate new articles. The dataset consists of pairs of articles, one human-authored and one LLM-generated, and is split into training, validation, and test sets.</p><p>To further test the robustness of submissions, the task organizers provided additional test datasets, each applying a different obfuscation technique to the original test dataset. The obfuscation techniques include switching the text encoding, prompting the LLMs to generate German instead of English, using contrastive decoding, cropping the text to 35 words, etc. In total, 65 different test datasets were created by obfuscation, with ELOQUENT providing another five.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Overview</head><p>Scoring Function As follows from the task description (cf. Section 2), the generative authorship verification task is formulated as a pairwise classification problem. Given a pair of texts (𝑡 1 , 𝑡 2 ), the goal is to determine which text is human-authored. However, we approach this task as a pointwise binary classification problem. That is, given a text 𝑡 𝑖 , we aim to predict the probability 𝑃 (human|𝑡 𝑖 ) that the text is human-authored.</p><p>By definition, the probability 𝑃 (human|𝑡 𝑖 ) is equal to 1 − 𝑃 (LLM|𝑡 𝑖 ). Given that we need to predict the probability that 𝑡 1 is human-authored while taking into account 𝑡 2 , we average the probabilities of the first text being written by a human and the second text not being written by a human to obtain the final score 𝑠(human|𝑡 1 ): </p><formula xml:id="formula_0">𝑠(human|𝑡 1 ) = (𝑃 (human|𝑡 1 ) + 1 − 𝑃 (human|𝑡 2 )) / 2<label>(1)</label></formula></div>
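The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the submission code; `pair_score` is a hypothetical helper name, and its inputs are the pointwise probabilities 𝑃(human|𝑡 1) and 𝑃(human|𝑡 2) produced by a fitted classifier.

```python
def pair_score(p_human_t1: float, p_human_t2: float) -> float:
    """Average the probability that t1 is human-written with the
    probability that t2 is NOT human-written (i.e. LLM-generated)."""
    return (p_human_t1 + (1.0 - p_human_t2)) / 2.0

# If the classifier is confident that t1 is human (0.9) and that t2
# is not (0.1), the pair score clearly favours t1 as the human text.
score = pair_score(0.9, 0.1)  # 0.9
```

A score above 0.5 favours 𝑡 1 as the human-written text, a score below 0.5 favours 𝑡 2, and exactly 0.5 leaves the case undecided.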
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Feature Extraction</head><p>To capture the distinctive vocabulary of LLM-generated texts, we use a bag-ofwords model to represent the texts. We experiment with two feature sets: term frequencies and tf-idf values for all tokens in the training dataset.</p></div>
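Concretely, both feature sets can be produced with scikit-learn's standard vectorizers; the two-document corpus below is a hypothetical stand-in for the training articles.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical mini-corpus standing in for the training articles.
corpus = [
    "The mayor told reporters on Wednesday that talks would continue.",
    "The article emphasized the importance of context, stating a conclusion.",
]

count_vectorizer = CountVectorizer()   # raw term frequencies
tfidf_vectorizer = TfidfVectorizer()   # tf-idf weighted frequencies

X_counts = count_vectorizer.fit_transform(corpus)
X_tfidf = tfidf_vectorizer.fit_transform(corpus)

# Each text becomes one sparse bag-of-words row over the shared vocabulary.
assert X_counts.shape == X_tfidf.shape == (2, len(count_vectorizer.vocabulary_))
```

Both representations are sparse; tf-idf additionally downweights tokens that occur in many documents.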
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Classification Models</head><p>We experiment with three classifiers: Multinomial Naive Bayes, logistic regression, and a support vector machine (SVM) with a linear kernel. We test the classifiers with both term frequencies and tf-idf values to identify the most effective model and feature combination.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model and Feature Selection</head><p>To evaluate the performance of the different models and feature sets, we use 100 samples from the training dataset as a validation split. The results of the validation are used to select the most effective model and feature combination.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows the accuracy achieved on the validation split for each model. Overall, logistic regression and SVM are more effective than multinomial Naive Bayes. For logistic regression and SVM, the differences in effectiveness between the two feature sets are minimal. Interestingly, multinomial Naive Bayes performs significantly better with raw term frequencies than with tf-idf values.</p></div>
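The model and feature sweep can be sketched as a small grid over vectorizers and classifiers. The texts, labels, and validation split below are hypothetical toy data, not the task dataset, and the loop is our reconstruction of the selection procedure rather than the submission code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data (1 = human-written, 0 = LLM-generated);
# the real setup trains on the task corpus and holds out 100 samples.
train_texts = ["she told us on friday", "the article emphasized its importance",
               "he says it was raining", "in conclusion, this highlights the context"]
train_labels = [1, 0, 1, 0]
val_texts = ["reporters asked questions on wednesday",
             "despite this, the significant context is emphasized"]
val_labels = [1, 0]

accuracies = {}
for vec_name, vec_cls in [("term frequencies", CountVectorizer), ("tf-idf", TfidfVectorizer)]:
    for clf_name, clf in [("naive bayes", MultinomialNB()),
                          ("logistic regression", LogisticRegression(max_iter=1000)),
                          ("svm", SVC(kernel="linear"))]:
        pipeline = make_pipeline(vec_cls(), clf)
        pipeline.fit(train_texts, train_labels)
        accuracies[(vec_name, clf_name)] = pipeline.score(val_texts, val_labels)

# The combination with the highest validation accuracy is selected.
best_combo = max(accuracies, key=accuracies.get)
```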
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation Setup</head><p>The PAN: Generative AI Authorship Verification task employed the TIRA platform <ref type="bibr" target="#b7">[8]</ref> to ensure the reproducibility and comparability of submissions. The platform provides a standardized environment for running submissions and evaluates the submissions using the following metrics:</p><p>• ROC-AUC: The area under the ROC (Receiver Operating Characteristic) curve • Brier: The complement of the Brier score (mean squared loss) • C@1: A modified accuracy score that assigns non-answers (score = 0.5) the average accuracy of the remaining cases • F1: The harmonic mean of precision and recall • F0.5u: A modified F0.5 measure (precision-weighted F measure) that treats non-answers (score = 0.5) as false negatives • The arithmetic mean of all the metrics above.</p><p>The arithmetic mean of all metrics is used to rank the submissions.</p></div>
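Two of the less standard scores can be made concrete as follows. The functions below are our own reconstructions from the metric descriptions above (the complement of the Brier score, and C@1 crediting non-answers with the accuracy of the answered cases), not the official evaluator code; function names are ours.

```python
import numpy as np

def complement_brier(scores, truths):
    """One minus the Brier score (mean squared loss); higher is better."""
    scores = np.asarray(scores, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return 1.0 - float(np.mean((scores - truths) ** 2))

def c_at_1(scores, truths):
    """Accuracy variant that credits each non-answer (score == 0.5)
    with the accuracy of the remaining cases:
    (n_correct + n_unanswered * n_correct / n) / n."""
    scores = np.asarray(scores, dtype=float)
    truths = np.asarray(truths, dtype=float)
    n = len(scores)
    answered = scores != 0.5
    correct = ((scores > 0.5) == (truths > 0.5)) & answered
    n_correct = int(correct.sum())
    n_unanswered = int((~answered).sum())
    return (n_correct + n_unanswered * n_correct / n) / n

# Three answered cases are correct, one is left undecided (0.5):
# c@1 = (3 + 1 * 3/4) / 4 = 0.9375
print(c_at_1([0.9, 0.2, 0.5, 0.6], [1, 0, 1, 1]))
```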
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Baselines</head><p>The task organizers provided official baselines for comparison, which are based on the performance of various approaches to the task of authorship verification. The baselines include a simple text length classifier, PPMd Compression-based Cosine <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>, Authorship Unmasking <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>, Binoculars <ref type="bibr" target="#b12">[13]</ref>, DetectLLM LRR and NPR <ref type="bibr" target="#b13">[14]</ref>, and DetectGPT <ref type="bibr" target="#b14">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation Results</head><p>Table <ref type="table">2</ref> presents the evaluation results of our submissions to the task, along with the official baselines and summary statistics of all submissions. Our best performing submission (SVM) outperforms all official baselines in terms of the arithmetic mean of all metrics, while the other two submissions (multinomial Naive Bayes and logistic regression) fall behind only the Binoculars baseline on this measure (0.965 vs. 0.956 and 0.958, respectively).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Overview of the performance of our approaches, baselines, and the summary statistics of the performance of all submissions in the competition. We report ROC-AUC, Brier, C@1, F 1 , F 0.5𝑢 and their arithmetic mean. Table <ref type="table" target="#tab_1">3</ref> summarizes the results over 10 obfuscated variants of the test dataset. Each dataset variant applies one obfuscation technique to measure the robustness of authorship verification approaches (cf. Section 2). The results show that all our submissions are robust to the obfuscation techniques, as their performance does not drop as sharply as that of the baseline approaches. For example, the minimum achieved score for our best submission (SVM) is 0.832, while the minimum score for the best baseline (Binoculars) is 0.342.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Overall, our approach demonstrates that simple and interpretable models can be highly effective for the task of generative authorship verification. The results suggest that the distinctive vocabulary used by LLMs can indeed be effectively captured using simple lexical features and machine learning classifiers. Moreover, our submissions proved to be robust to obfuscation techniques, making them a promising alternative to more complex and computationally expensive methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Qualitative Analysis</head><p>In addition to the quantitative evaluation of our submissions, we conducted a qualitative analysis of the most important lexical features identified by the models. This analysis aims to highlight key tokens that contribute to distinguishing between human-written and LLM-generated texts.</p><p>The implementation of the multinomial Naive Bayes model allows us to extract the log probabilities of each token belonging to the human-written and LLM-generated classes. By comparing these probabilities, we can identify the tokens that contribute most to the classification decision. We use the following equation to calculate the difference in log probabilities for each token 𝑤 𝑖 in the feature set:</p><formula xml:id="formula_1">log_diff(𝑤 𝑖 ) = log(𝑃 (𝑤 𝑖 |LLM)) − log(𝑃 (𝑤 𝑖 |human))<label>(2)</label></formula><p>The log difference values are then sorted in descending order to identify the tokens with the largest differences. The resulting values are interpreted as the importance of each token in distinguishing between human-written and LLM-generated texts. Positive values indicate higher probabilities for LLMgenerated texts, while negative values indicate higher probabilities for human-written texts. Figure <ref type="figure" target="#fig_0">1</ref> presents the top 50 tokens with the largest differences in log probabilities for the multinomial Naive Bayes model. Here, we observe that LLM-generated texts frequently use specific terms such as "article", "importance", "emphasized", "context", and "despite". These terms often relate to structured and formal writing, which is often characteristic of LLM-generated content. On the other hand, human-written texts show a higher probability of tokens related to everyday language and temporal expressions such as "told", "says", "asked", "wrote", and "really". These tokens indicate a more narrative and less formal style typical of human writing. 
The frequent use of days of the week such as "Wednesday", "Thursday", and "Friday" and terms like "afternoon" and "morning" in human-written texts can be attributed to their common use in chronological events or planning. Humans often refer to specific days when recounting events, discussing plans, or setting contexts within their narratives. This is particularly relevant in our news articles dataset, where providing temporal context is essential for accurate and engaging reporting. The word "told" is particularly prominent in human-written texts, as it is frequently used in direct and indirect speech, which is also common in news articles. In contrast, LLM-generated texts often prioritize structured content delivery and formal exposition over narrative elements, resulting in frequent use of terms such as "emphasized", "stating", and "highlights". The term "conclusion" is also prevalent in LLM-generated texts, indicating a structured and formal writing style that often includes a summary or final remarks, which is uncommon in human-written news articles. Figure <ref type="figure" target="#fig_1">2</ref> presents the top 20 most important tokens for identifying LLM-generated texts based on the coefficients assigned to them by the trained logistic regression and SVM models. Tokens with larger coefficients have a greater impact on the model's decision function. Similarly to the Naive Bayes model, some of the most notable tokens both in logistic regression and the SVM models include "significant", "article", "importance", "despite", "stating" and "conclusion". This suggests that LLM-generated texts often contain terms that convey formality, which might be less prevalent in human-written texts. The overlap in key tokens between the logistic regression and SVM models underlines the consistency of these patterns in distinguishing LLM-generated texts.
The frequent appearance of the word "significant" in LLM-generated texts can be attributed to the tendency of language models to produce content that is polished and systematic. Language models are typically trained on large datasets that include a large amount of academic, technical, and professional writing. This extensive exposure to formal texts influences the models to emulate this style.</p><p>Our qualitative analysis supports the hypothesis that LLMs use a distinctive vocabulary that can be captured using lexical features. The presence of terms related to formality and structured discourse in LLM-generated texts contrasts with the more narrative and less formal vocabulary found in human-written texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this paper, we presented our submission to the PAN: Generative AI Authorship Verification task. Our approach is based on the assumption that LLMs use a particular vocabulary, which can be captured using lexical features. We experiment with three classifiers and two feature sets to identify the most effective model and feature combination. Our results show that logistic regression and SVM models using tf-idf feature vectors are highly effective for the task. We find that our submissions outperform all official baselines, demonstrating that simple and interpretable models can be more effective than complex and computationally expensive methods. Our qualitative analysis of the most important lexical features confirms that LLM-generated texts often contain terms distinct from human-written texts, which can be effectively captured using lexical features. The robustness of our submissions to obfuscation techniques further highlights the effectiveness of our approach. Overall, our results offer a more effective alternative to all baseline approaches while also being more efficient and interpretable.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Top 50 tokens with the largest differences in log probabilities for multinomial Naive Bayes. Positive values indicate the probability is higher for LLM-generated texts, negative values indicate the probability is higher for human-written texts.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Top 20 tokens for identifying LLM-generated texts using Logistic Regression (left) and SVM (right).The importance of each token is based on the size of the coefficients assigned to them by the trained models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of the different classifiers (rows) and features (columns) evaluated on the validation set.</figDesc><table><row><cell>Classifier</cell><cell>tf-idf</cell><cell>Term Frequencies</cell></row><row><cell>Multinomial Naive Bayes</cell><cell>0.77</cell><cell>0.874</cell></row><row><cell>Logistic Regression</cell><cell>0.927</cell><cell>0.922</cell></row><row><cell>SVM</cell><cell>0.932</cell><cell>0.925</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Overview of the performance of our approaches, baselines, and the summary statistics of the performance of all submissions in the competition over 10 variants of the test set. We report the minimum, 25-th quantile, median, 75-th quantile, and maximum of the arithmetic mean of all metrics.</figDesc><table><row><cell>Approach</cell><cell>ROC-AUC</cell><cell>Brier</cell><cell>C@1</cell><cell>F 1</cell><cell>F 0.5𝑢</cell><cell>Mean</cell></row><row><cell>naive-bayes</cell><cell>0.998</cell><cell>0.859</cell><cell>0.975</cell><cell>0.975</cell><cell>0.974</cell><cell>0.956</cell></row><row><cell>logistic-regression</cell><cell>0.996</cell><cell>0.884</cell><cell>0.97</cell><cell>0.97</cell><cell>0.97</cell><cell>0.958</cell></row><row><cell>svm</cell><cell>0.994</cell><cell>0.923</cell><cell>0.976</cell><cell>0.976</cell><cell>0.975</cell><cell>0.969</cell></row><row><cell>Baseline Binoculars</cell><cell>0.972</cell><cell>0.957</cell><cell>0.966</cell><cell>0.964</cell><cell>0.965</cell><cell>0.965</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.876</cell><cell>0.8</cell><cell>0.886</cell><cell>0.883</cell><cell>0.883</cell><cell>0.866</cell></row><row><cell>Baseline PPMd</cell><cell>0.795</cell><cell>0.798</cell><cell>0.754</cell><cell>0.753</cell><cell>0.749</cell><cell>0.77</cell></row><row><cell>Baseline Unmasking</cell><cell>0.697</cell><cell>0.774</cell><cell>0.691</cell><cell>0.658</cell><cell>0.666</cell><cell>0.697</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.668</cell><cell>0.776</cell><cell>0.695</cell><cell>0.69</cell><cell>0.691</cell><cell>0.704</cell></row><row><cell>95-th quantile</cell><cell>0.995</cell><cell>0.986</cell><cell>0.988</cell><cell>0.988</cell><cell>0.989</cell><cell>0.989</cell></row><row><cell>75-th quantile</cell><cell>0.971</cell><cell>0.925</cell><cell>0.954</cell><cell>0.935</cell><cell>0.942</cell><cell>0.945</cell></row><row><cell>Median</cell><cell>0.911</cell><cell>0.889</cell><cell>0.887</cell><cell>0.869</cell><cell>0.867</cell><cell>0.889</cell></row><row><cell>25-th quantile</cell><cell>0.714</cell><cell>0.771</cell><cell>0.683</cell><cell>0.657</cell><cell>0.670</cell><cell>0.697</cell></row><row><cell>Min</cell><cell>0.131</cell><cell>0.265</cell><cell>0.005</cell><cell>0.006</cell><cell>0.007</cell><cell>0.224</cell></row><row><cell>Approach</cell><cell>Minimum</cell><cell>25-th Quantile</cell><cell>Median</cell><cell>75-th Quantile</cell><cell>Max</cell></row><row><cell>naive-bayes</cell><cell>0.884</cell><cell>0.935</cell><cell>0.945</cell><cell>0.967</cell><cell>0.969</cell></row><row><cell>logistic-regression</cell><cell>0.837</cell><cell>0.941</cell><cell>0.957</cell><cell>0.963</cell><cell>0.989</cell></row><row><cell>svm</cell><cell>0.832</cell><cell>0.949</cell><cell>0.969</cell><cell>0.974</cell><cell>0.999</cell></row><row><cell>Baseline Binoculars</cell><cell>0.342</cell><cell>0.818</cell><cell>0.844</cell><cell>0.965</cell><cell>0.996</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.095</cell><cell>0.793</cell><cell>0.842</cell><cell>0.929</cell><cell>0.958</cell></row><row><cell>Baseline PPMd</cell><cell>0.270</cell><cell>0.546</cell><cell>0.750</cell><cell>0.770</cell><cell>0.863</cell></row><row><cell>Baseline Unmasking</cell><cell>0.250</cell><cell>0.653</cell><cell>0.673</cell><cell>0.697</cell><cell>0.762</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.159</cell><cell>0.579</cell><cell>0.677</cell><cell>0.719</cell><cell>0.982</cell></row><row><cell>95-th quantile</cell><cell>0.875</cell><cell>0.973</cell><cell>0.985</cell><cell>0.989</cell><cell>1.000</cell></row><row><cell>75-th quantile</cell><cell>0.758</cell><cell>0.875</cell><cell>0.935</cell><cell>0.959</cell><cell>0.994</cell></row><row><cell>Median</cell><cell>0.605</cell><cell>0.629</cell><cell>0.876</cell><cell>0.889</cell><cell>0.946</cell></row><row><cell>25-th quantile</cell><cell>0.350</cell><cell>0.481</cell><cell>0.658</cell><cell>0.697</cell><cell>0.709</cell></row><row><cell>Min</cell><cell>0.015</cell><cell>0.038</cell><cell>0.231</cell><cell>0.235</cell><cell>0.252</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work originates from a programming assignment from the "Introduction to Natural Language Processing" course at Bauhaus-Universität Weimar during the summer term of 2024. We would like to thank the teaching staff who recognized the potential of our approach and encouraged us to participate in the PAN task. Together we turned these ideas into writing.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An evaluation framework for plagiarism detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/C10-2115" />
	</analytic>
	<monogr>
		<title level="m">Coling 2010: Posters, Coling 2010 Organizing Committee</title>
				<editor>
			<persName><forename type="first">C.-R</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<meeting><address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="997" to="1005" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Language as evidence: Doing forensic linguistics</title>
		<author>
			<persName><forename type="first">V</forename><surname>Guillén-Nieto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stein</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Detection and moderation of detrimental content on social media platforms: current status and future directions</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">U</forename><surname>Gongane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Munot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Anuse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Social Network Analysis and Mining</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">129</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the Authorship Verification Task at PAN 2022</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kredens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pezik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Heini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3180/paper-184.pdf" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2022 Labs and Workshops, Notebook Papers</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3180</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of the &quot;Voight-Kampff&quot; Generative AI Authorship Verification Task at PAN and ELOQUENT</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Karlgren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dürlich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gogoulou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Talman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Linear classifier: An often-forgotten baseline for text classification</title>
		<author>
			<persName><forename type="first">Y.-C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-A</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-short.160</idno>
		<ptr target="https://aclanthology.org/2023.acl-short.160" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Boyd-Graber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Okazaki</surname></persName>
		</editor>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1876" to="1888" />
		</imprint>
	</monogr>
	<note>Short Papers), Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Continuous Integration for Reproducible Shared Tasks with TIRA</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kolyada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Loebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28241-6_20</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maistro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Davis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Caputo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="236" to="241" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Compression and machine learning: a new perspective on feature space vectors</title>
		<author>
			<persName><forename type="first">D</forename><surname>Sculley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Brodley</surname></persName>
		</author>
		<idno type="DOI">10.1109/DCC.2006.13</idno>
	</analytic>
	<monogr>
		<title level="m">Data Compression Conference (DCC&apos;06)</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="332" to="341" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">On the usefulness of compression models for authorship verification</title>
		<author>
			<persName><forename type="first">O</forename><surname>Halvani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Graner</surname></persName>
		</author>
		<idno type="DOI">10.1145/3098954.3104050</idno>
		<ptr target="https://doi.org/10.1145/3098954.3104050" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES &apos;17</title>
				<meeting>the 12th International Conference on Availability, Reliability and Security, ARES &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Authorship verification as a one-class classification problem</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schler</surname></persName>
		</author>
		<idno type="DOI">10.1145/1015330.1015448</idno>
		<ptr target="https://doi.org/10.1145/1015330.1015448" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-First International Conference on Machine Learning, ICML &apos;04</title>
				<meeting>the Twenty-First International Conference on Machine Learning, ICML &apos;04<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page">62</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Generalizing unmasking for short texts</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1068</idno>
		<ptr target="https://aclanthology.org/N19-1068" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="654" to="659" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Spotting LLMs with Binoculars: Zero-shot detection of machine-generated text</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schwarzschild</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Cherepanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kazemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Goldblum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Geiping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Goldstein</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2401.12070" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">DetectLLM: Leveraging log rank information for zero-shot detection of machine-generated text</title>
		<author>
			<persName><forename type="first">J</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Y</forename><surname>Zhuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2306.05540" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Fast-DetectGPT: Efficient zero-shot detection of machine-generated text via conditional probability curvature</title>
		<author>
			<persName><forename type="first">G</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Teng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2310.05130" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
