<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Token Prediction as Implicit Classification for Generative AI Authorship Verification Notebook for the PAN Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zhanhong</forename><surname>Ye</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yutong</forename><surname>Zhong</surname></persName>
							<email>yutongz115@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zhen</forename><surname>Huang</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">South China Normal University</orgName>
								<address>
									<settlement>Guangzhou</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leilei</forename><surname>Kong</surname></persName>
							<email>kongleilei@fosu.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Token Prediction as Implicit Classification for Generative AI Authorship Verification Notebook for the PAN Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1F30DE46A899313E33356D6C1863DB3A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a method leveraging Next Token Prediction as Implicit Classification for Voight-Kampff Generative AI Authorship Verification. The rationale behind this approach is that token prediction can effectively perform text classification tasks. Consequently, we utilize the Token Prediction method to directly identify whether the input text was authored by a specific AI model or by a human. We assessed the effectiveness of our method using the Generative AI Authorship Verification datasets provided by PAN. We then selected model weights that demonstrated the best performance on the dataset given by PAN. Finally, on the test set, our performance metrics at the Minimum, 25-th Quantile, Median, 75-th Quantile, and Max were 0.527, 0.896, 0.922, 0.926, and 0.947 respectively.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, generative LLMs have gained recognition for their impressive ability to produce coherent language across different domains. Consequently, detecting machine-generated text has become increasingly vital. The Generative AI Authorship Verification task, a form of machine-generated text detection, involves two texts: one authored by a human and one by a machine. The primary objective is to determine which of the two texts was written by the human and which was generated by the machine. Furthermore, the task can help ensure the authenticity of information in settings where it is critical, such as legal proceedings.</p><p>Research <ref type="bibr" target="#b0">[1]</ref> utilizes Token Prediction as an Implicit Classification for Generative AI Authorship Verification. By assigning distinct tokens to different labels and reformulating the multi-class classification task as a next-token prediction task, this method identifies whether the input sentence was generated by a particular model or authored by a human <ref type="bibr" target="#b0">[1]</ref>. The purpose of this approach is to leverage the model's next-token prediction capability for this specific task. Recent studies <ref type="bibr" target="#b1">[2]</ref> have instead addressed LLM-generated text detection by fine-tuning transformer-based classifiers. However, a key limitation of fine-tuning transformer-based methods is that they do not directly leverage the model's next-token prediction capability <ref type="bibr" target="#b0">[1]</ref>. Compared to next-token prediction, fine-tuning a transformer-based classifier also widens the gap between the downstream task and the pre-training task <ref type="bibr" target="#b2">[3]</ref>. Hence there are better solutions than simply fine-tuning transformer-based methods. 
In this paper, we build on research <ref type="bibr" target="#b0">[1]</ref> to predict whether a given sample text was authored by a human or paraphrased by a machine. Unlike the fine-tuning transformer-based method, we employ the Token Prediction as Implicit Classification approach. This involves establishing a bijection 𝑓 :𝑌 → 𝒴, where 𝑌 ⊂ Σ; 𝒴 serves as the set of proxy labels such as 'human', 'GPT-3.5', etc., and 𝑌 represents the ground-truth labels. The model then predicts the corresponding proxy label based on the input text. We establish two sets of proxy labels, one for method 1 and one for method 2. In method 1, the proxy labels are translated into three outcomes: one indicating human authorship, one indicating an AI-model rewrite, and one indicating an undecidable case. In method 2, the proxy labels are translated into two outcomes: one for human authorship and one for AI-model rewrites. This differentiation allows us to determine whether the text is human-authored, machine-generated, or falls into another category. In detail, the model comprises two parts. The first part is the long-T5 <ref type="bibr" target="#b3">[4]</ref> model, which encodes the input text. The second part is a linear layer that projects the output of long-T5 onto a dimension equal to the vocabulary size. This yields the probabilities of the proxy labels, thereby determining whether the input text under examination was generated by a model, authored by a human, or is undecidable.</p></div>
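The proxy-label bijection described above can be sketched as follows. This is a minimal illustration only: the model names in the list are placeholders, and the mapping to T5 sentinel tokens follows the paper's description of method 2.

```python
# Illustrative sketch of the proxy-label bijection f : Y -> proxy labels.
# The model names below are placeholders, not the actual PAN generator list.
labels = ["human", "gpt-3.5", "gpt-4"]

# Method 2: map each ground-truth label to a distinct sentinel token.
proxy_of = {lab: f"<extra_id_{i}>" for i, lab in enumerate(labels)}

# Because f is a bijection, the inverse mapping recovers the label
# from the predicted proxy token at inference time.
label_of = {tok: lab for lab, tok in proxy_of.items()}
```

At inference, the model's predicted proxy token is looked up in the inverse mapping to recover the authorship label.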
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Network Architecture</head><p>First, the language model is presented with a series of sentences to be tested, each consisting of the tokens 𝐸 1 to 𝐸 𝑛 and 𝐸 &lt;𝑠&gt; . The goal is to use the longT5 model to implement the Generative AI Authorship Verification task; the core of the model is next-token prediction. After the tokens 𝐸 1 to 𝐸 𝑛 and 𝐸 &lt;𝑠&gt; are fed into longT5, it outputs the probabilities of the proxy labels. The predicted proxy label for each sentence is then determined by selecting the label with the highest probability. We then convert the proxy labels into the final result, determining whether the text was authored by a human or paraphrased by a machine. As shown in Figure <ref type="figure">1</ref>, the model comprises a longT5 backbone, a next-token prediction layer, and a filter. The first component is the longT5 backbone, which encodes the sentences under examination. It is followed by the next-token prediction layer, in which a linear layer maps the output of longT5 to a dimension equal to the vocabulary size, enabling the calculation of probabilities for each proxy label. In method 2, the filter selects the probabilities corresponding to the proxy labels from the output of the next-token prediction layer, and these are then processed through a softmax layer. Finally, the proxy label with the highest probability is chosen and translated into one of two outcomes: the text under examination was generated by a specific model, or it was authored by a human. Method 1 is similar to method 2, but it identifies the text directly from the probabilities corresponding to its proxy labels in the next-token prediction layer. In method 1, after obtaining the proxy labels, we translate them into two outcomes: one indicating human authorship and the other an AI-model rewrite. 
For method 1, in addition to these two outcomes, we include an additional result labeled as undecidable, giving three possible outcomes. The detailed process is described in Section 2.1. Overall, the primary loss function ℒ can be defined as follows.</p><formula xml:id="formula_0">ℒ = ℒ 𝑁 𝐿𝐿 = −𝑙𝑜𝑔𝑃 (𝒴 𝑖 |𝑆 𝑖 ; 𝜃)<label>(1)</label></formula><p>The loss ℒ 𝑁 𝐿𝐿 is the negative log-likelihood used to optimize longT5 and the next-token prediction layer, where 𝑆 𝑖 denotes the sentence under examination, 𝜃 denotes the parameters of the whole model, and 𝒴 𝑖 denotes the proxy label of the ground truth.</p></div>
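The loss in Eq. (1) is the standard negative log-likelihood of the proxy-label token under the next-token distribution. A minimal numerical sketch, with toy logits standing in for the longT5 output over a tiny 4-token vocabulary:

```python
import numpy as np

def nll_loss(logits: np.ndarray, proxy_token_id: int) -> float:
    """-log P(proxy label | sentence): softmax over vocabulary logits,
    then the negative log-probability of the proxy-label token."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
    return float(-np.log(probs[proxy_token_id]))

# Toy logits for a 4-token vocabulary (not real longT5 output).
logits = np.array([2.0, 0.5, -1.0, 0.1])
loss = nll_loss(logits, proxy_token_id=0)
```

Minimizing this quantity pushes probability mass toward the proxy-label token of the ground-truth author.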
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Next-token prediction</head><p>For method 2, we assign the special token "&lt;extra_id_0&gt;" as the proxy label for human-authored text. For the other models we designate similar tokens such as "&lt;extra_id_1&gt;", "&lt;extra_id_2&gt;", ... "&lt;extra_id_n&gt;", where 𝑛 ≤ 𝑘 and 𝑘 represents the number of models involved in the PAN dataset <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>In method 1, human-authored texts are tagged with the word "positive" as the proxy label, while all texts rewritten by AI models are labeled "negative". If the highest probability in the next-token probability distribution falls on neither "positive" nor "negative", the result is deemed "undecidable".</p><p>Both methods involve the model predicting the probabilities of the proxy labels and then converting the proxy labels into the actual prediction results. The PAN organizers have provided datasets for Generative AI Authorship Verification, which include multiple texts authored by humans and subsequently rewritten by various models. We measured the token length of each human-authored or model-generated text; our statistical analysis reveals that the vast majority of texts are within 2048 tokens. Let a batch be denoted ℬ, defined as ℬ = {(𝑆 1 , 𝒴 1 ), (𝑆 2 , 𝒴 2 ), ..., (𝑆 𝑖 , 𝒴 𝑖 )}, where 𝑆 𝑖 denotes the sentence under examination and 𝒴 𝑖 is its proxy label.</p><p>During training, we feed ℬ into the pre-trained model, which is composed of transformer <ref type="bibr" target="#b6">[7]</ref> blocks, to obtain the corresponding hidden state ℋ 𝑖 . After obtaining the hidden state ℋ 𝑖 , we use the next-token prediction layer and a softmax layer to obtain the probabilities for all tokens in the vocabulary. 
That is,</p><formula xml:id="formula_1">𝜙 𝑖 = (𝒴 1 𝑖 , 𝒴 2 𝑖 , ..., 𝒴 𝑉 𝑖 ) = ( 𝑒 (𝜑(ℋ 𝑖 ) 1 ) ∑︀ 𝑉 𝑣=1 𝑒 (𝜑(ℋ 𝑖 ) 𝑣 ) , 𝑒 (𝜑(ℋ 𝑖 ) 2 ) ∑︀ 𝑉 𝑣=1 𝑒 (𝜑(ℋ 𝑖 ) 𝑣 ) , ..., 𝑒 (𝜑(ℋ 𝑖 ) 𝑉 ) ∑︀ 𝑉 𝑣=1 𝑒 (𝜑(ℋ 𝑖 ) 𝑣 ) )<label>(2)</label></formula><p>where 𝜙 𝑖 is the soft label of sample 𝑖, 𝑣 indicates the position of a token within the vocabulary, 𝑉 is the total number of tokens in the vocabulary, 𝒴 𝑉 𝑖 is the probability of the 𝑉 -th token in the vocabulary, and 𝒴 𝑖 denotes the proxy label. We then calculate the negative log-likelihood loss for classification.</p><formula xml:id="formula_2">𝐿 𝑛𝑙𝑙 = −𝑙𝑜𝑔𝑃 (𝒴 𝑖 |𝜙 𝑖 , 𝜃)<label>(3)</label></formula><p>In the inference phase, for method 1, after obtaining 𝜙 𝑖 we convert it into one of three predictive outcomes.</p><formula xml:id="formula_3">𝑦 ˆ= ⎧ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎩ 1 𝑎𝑟𝑔 max 𝒴∈𝒴 𝑉 𝑖 𝜙 𝑖 = 𝑎 0 𝑎𝑟𝑔 max 𝒴∈𝒴 𝑉 𝑖 𝜙 𝑖 = 𝑏 0.5 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒<label>(4)</label></formula><p>In method 1, 𝑎 is the position of the word "positive" in the vocabulary and 𝑏 is the position of the word "negative"; 𝑦 ˆ denotes the predicted label. 𝑦 ˆ= 1 indicates text authored by a human, 𝑦 ˆ= 0 indicates text rewritten by a machine, and 𝑦 ˆ= 0.5 indicates 'undecidable', used when a clear determination cannot be made. For method 2, we first obtain the output of the next-token prediction layer.</p><formula xml:id="formula_4">𝜑(•) = (𝜑(ℋ 𝑖 ) 1 , 𝜑(ℋ 𝑖 ) 2 , ..., 𝜑(ℋ 𝑖 ) V )<label>(5)</label></formula><p>where 𝜑(•) denotes the output of the next-token prediction layer and V is the vocabulary size. We then use a filter to select the outputs associated with all the special tokens (the proxy-label tokens).</p><formula xml:id="formula_5">𝜑(•) ′ = (𝜑(ℋ 𝑖 ) 1 , 𝜑(ℋ 𝑖 ) 2 , ..., 𝜑(ℋ 𝑖 ) 𝑘 )<label>(6)</label></formula><p>where 𝜑(•) ′ denotes the output of the filter and 𝑘 is the number of special tokens. 
After passing through the softmax layer, we obtain the probability distribution over the proxy-label tokens.</p><formula xml:id="formula_6">𝜙 ′ 𝑖 = (𝒴 1 𝑖 , 𝒴 2 𝑖 , ..., 𝒴 𝑘 𝑖 ) = ( 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 1 ∑︀ 𝑘 𝑗=1 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 𝑗 , 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 2 ∑︀ 𝑘 𝑗=1 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 𝑗 , ..., 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 𝑘 ∑︀ 𝑘 𝑗=1 𝑒 (𝜑(ℋ 𝑖 ) ′ ) 𝑗 )<label>(7)</label></formula><p>where 𝑗 ∈ {1, ..., 𝑘} and 𝜙 ′ 𝑖 represents the probability distribution over the proxy-label tokens. Finally, we convert 𝜙 ′ 𝑖 into one of two predictive outcomes:</p><formula xml:id="formula_7">𝑦 ˆ= ⎧ ⎨ ⎩ 0 𝑎𝑟𝑔 max 𝒴∈𝒴 𝑘 𝑖 𝜙 ′ 𝑖 = 𝑐 1 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒<label>(8)</label></formula><p>where 𝑐 ∈ {1, ..., 𝑘} indexes the special tokens assigned to the AI models, 𝑦 ˆ= 1 indicates text authored by a human, and 𝑦 ˆ= 0 indicates text rewritten by a machine.</p></div>
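The two inference rules can be sketched together. This is a toy illustration of the decision logic only: the token positions and index sets are made-up values, not the real longT5 vocabulary.

```python
import numpy as np

def decide_method1(vocab_probs, a, b):
    """Method 1 rule, as in Eq. (4): 1 = human ("positive" wins),
    0 = machine ("negative" wins), 0.5 = undecidable (anything else)."""
    top = int(np.argmax(vocab_probs))
    if top == a:
        return 1
    if top == b:
        return 0
    return 0.5

def decide_method2(vocab_logits, special_ids, human_id):
    """Method 2 rule: filter the special-token logits, softmax over
    just those, then map the winning proxy token to 1 (human) or 0."""
    filtered = vocab_logits[special_ids]         # keep only proxy tokens
    probs = np.exp(filtered - filtered.max())
    probs = probs / probs.sum()                  # softmax over proxy tokens
    winner = special_ids[int(np.argmax(probs))]
    return 1 if winner == human_id else 0
```

Note that in method 2 the softmax runs over the filtered proxy-token logits only, so probability mass from the rest of the vocabulary never dilutes the decision.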
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments and Result</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Experiment settings</head><p>In this work, we use the longT5 model for classification, which consists of 12 transformer layers with a hidden size of 768. The next-token prediction layer is randomly initialized before training. For method 1, the training parameters are 10 epochs, a batch size of 64, and a learning rate of 5e-4. For method 2, they are 15 epochs, a batch size of 16, and a learning rate of 8e-4. For both methods, the maximum token length is set to 2048. All experiments are conducted on an NVIDIA A800 GPU with 80GB of memory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Results</head><p>We conducted two experiments using token prediction as an implicit classification, one for method 1 and one for method 2. After training, the resulting model weights from both experiments were submitted to the TIRA platform <ref type="bibr" target="#b7">[8]</ref> to obtain scores. Tables <ref type="table" target="#tab_1">1</ref> and 2 display our test-set results as reported on the TIRA platform. Table <ref type="table" target="#tab_1">1</ref> shows the summarized results averaged (arithmetic mean) over 10 variants of the test dataset.</p><p>Each dataset variant applies one potential technique to measure the robustness of authorship verification approaches, e.g., switching the text encoding, translating the text, switching the domain, manual obfuscation by humans, etc. Table <ref type="table" target="#tab_2">2</ref> shows the results, initially pre-filled with the official baselines provided by the PAN organizers and summary statistics of all submissions to the task (i.e., the maximum, median, minimum, and 95-th, 75-th, and 25-th percentiles over all submissions to the task). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Conclusion</head><p>In this paper, we have completed the task set by PAN, employing the next-token prediction method to tackle the Generative AI Authorship Verification task. Instead of using fine-tuned transformer-based techniques, we utilize the next-token prediction method to narrow the gap between downstream tasks and pre-training tasks. Finally, on the test set, our performance metrics at the Minimum, 25-th Quantile, Median, 75-th Quantile, and Max were 0.527, 0.896, 0.922, 0.926, and 0.947 respectively. These results demonstrate the effectiveness of our proposed method on the Generative AI Authorship Verification task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Limitations</head><p>First, the method proposed in this paper does not use any prompts for the LLM-generated text detection task. Using prompts could better leverage the internal knowledge of language models; therefore, in future work, we plan to incorporate prompts into this task. Additionally, transforming the task into a binary AI-detection task, rather than judging which AI authored the text, is another way to accomplish AI detection. However, this approach can easily lead to data imbalance issues, where the amount of human-authored data is not equivalent to that of AI-generated data. To address this, data augmentation techniques could be employed to increase the quantity of human-authored data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Overview of the accuracy in detecting whether a text is written by a human in task 4 at PAN 2024 (Voight-Kampff Generative AI Authorship Verification). 
We report ROC-AUC, Brier, C@1, F 1 , F 0.5𝑢 and their mean.</figDesc><table><row><cell>Approach</cell><cell cols="3">ROC-AUC Brier C@1 F 1 F 0.5𝑢 Mean</cell></row><row><cell>method1</cell><cell>0.501</cell><cell cols="2">0.744 0.501 0.624 0.544 0.583</cell></row><row><cell>method2</cell><cell>0.984</cell><cell cols="2">0.918 0.907 0.898 0.954 0.932</cell></row><row><cell>Baseline Binoculars</cell><cell>0.972</cell><cell cols="2">0.957 0.966 0.964 0.965 0.965</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.876</cell><cell>0.8</cell><cell>0.886 0.883 0.883 0.866</cell></row><row><cell>Baseline PPMd</cell><cell>0.795</cell><cell cols="2">0.798 0.754 0.753 0.749 0.77</cell></row><row><cell>Baseline Unmasking</cell><cell>0.697</cell><cell cols="2">0.774 0.691 0.658 0.666 0.697</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.668</cell><cell cols="2">0.776 0.695 0.69 0.691 0.704</cell></row><row><cell>95-th quantile</cell><cell>0.994</cell><cell cols="2">0.987 0.989 0.989 0.989 0.990</cell></row><row><cell>75-th quantile</cell><cell>0.969</cell><cell cols="2">0.925 0.950 0.933 0.939 0.941</cell></row><row><cell>Median</cell><cell>0.909</cell><cell cols="2">0.890 0.887 0.871 0.867 0.889</cell></row><row><cell>25-th quantile</cell><cell>0.701</cell><cell cols="2">0.768 0.683 0.657 0.670 0.689</cell></row><row><cell>Min</cell><cell>0.131</cell><cell cols="2">0.265 0.005 0.006 0.007 0.224</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Overview of the mean accuracy over 9 variants of the test set. We report the minimum, the 25-th quantile, the median, the 75-th quantile, and the maximum of the mean over the 9 datasets.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Approach Minimum 25-th Quantile Median 75-th Quantile Max</head><label></label><figDesc></figDesc><table><row><cell>method1</cell><cell>0.513</cell><cell>0.561</cell><cell>0.571</cell><cell>0.582</cell><cell>0.583</cell></row><row><cell>method2</cell><cell>0.527</cell><cell>0.896</cell><cell>0.922</cell><cell>0.926</cell><cell>0.947</cell></row><row><cell>Baseline Binoculars</cell><cell>0.342</cell><cell>0.818</cell><cell>0.844</cell><cell>0.965</cell><cell>0.996</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.095</cell><cell>0.793</cell><cell>0.842</cell><cell>0.931</cell><cell>0.958</cell></row><row><cell>Baseline PPMd</cell><cell>0.270</cell><cell>0.546</cell><cell>0.750</cell><cell>0.770</cell><cell>0.863</cell></row><row><cell>Baseline Unmasking</cell><cell>0.250</cell><cell>0.662</cell><cell>0.696</cell><cell>0.697</cell><cell>0.762</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.159</cell><cell>0.579</cell><cell>0.704</cell><cell>0.719</cell><cell>0.982</cell></row><row><cell>95-th quantile</cell><cell>0.863</cell><cell>0.971</cell><cell>0.978</cell><cell>0.990</cell><cell>1.000</cell></row><row><cell>75-th quantile</cell><cell>0.758</cell><cell>0.865</cell><cell>0.933</cell><cell>0.959</cell><cell>0.991</cell></row><row><cell>Median</cell><cell>0.605</cell><cell>0.645</cell><cell>0.875</cell><cell>0.889</cell><cell>0.936</cell></row><row><cell>25-th quantile</cell><cell>0.353</cell><cell>0.496</cell><cell>0.658</cell><cell>0.675</cell><cell>0.711</cell></row><row><cell>Min</cell><cell>0.015</cell><cell>0.038</cell><cell>0.231</cell><cell>0.244</cell><cell>0.252</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was supported by the National Social Science Foundation of China (22BTQ101).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Raj</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.08723</idno>
		<title level="m">Token prediction as implicit classification to identify llm-generated text</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.13335</idno>
		<title level="m">Adaptive ensembles of fine-tuned transformers for llm-generated text detection</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</title>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="1" to="35" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ainslie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Uthus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ontanon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2112.07916</idno>
		<title level="m">Longt5: Efficient text-to-text transformer for long sequences</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of the Voight-Kampff Generative AI Authorship Verification Task at PAN</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><forename type="middle">F N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>Herrera</surname></persName>
		</editor>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Continuous Integration for Reproducible Shared Tasks with TIRA</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kolyada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Loebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28241-6_20</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maistro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Davis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Caputo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="236" to="241" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
