<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Token Prediction as Implicit Classification for Generative AI Authorship Verification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhanhong Ye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yutong Zhong</string-name>
          <email>yutongz115@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhen Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <email>kongleilei@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>South China Normal University</institution>
          ,
          <addr-line>Guangzhou, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>This paper presents a method leveraging Next Token Prediction as Implicit Classification for the Voight-Kampff Generative AI Authorship Verification task. The rationale behind this approach is that token prediction can effectively perform text classification. Consequently, we use the token prediction method to directly identify whether the input text was authored by a specific AI model or by a human. We assessed the effectiveness of our method using the Generative AI Authorship Verification datasets provided by PAN, and selected the model weights that performed best on those datasets. Finally, on the test set, our performance metrics at the Minimum, 25-th Quantile, Median, 75-th Quantile, and Max were 0.527, 0.896, 0.922, 0.926, and 0.947, respectively.</p>
      </abstract>
      <kwd-group>
<kwd>PAN 2024</kwd>
<kwd>Voight-Kampff Generative AI Authorship Verification 2024</kwd>
        <kwd>Next token prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>We cast the task as next-token prediction: given an input text, the model predicts a proxy-label token $y \in \mathcal{Y}$, where $\mathcal{Y} \subset \Sigma$ and $\Sigma$ is the model vocabulary. The elements of $\mathcal{Y}$ serve as proxy labels such as 'human', 'GPT-3.5', etc., and $y$ represents the ground-truth label. The model then predicts the corresponding proxy label based on the input text. We have established two sets of proxy labels: those used in method 1 and those used in method 2.</p>
      <p>In method 1, the proxy labels translate into three outcomes: one indicating human authorship, one indicating an AI-model rewrite, and one indicating an undecidable case.</p>
      <p>In method 2, the proxy labels translate into two outcomes: one for human authorship and one for an AI-model rewrite. This differentiation allows us to determine whether the text is human-authored, machine-generated, or falls into another category.</p>
      <p>In detail, the model comprises two parts. The first part is the longT5 [4] model, which encodes the input text. The second part is a linear layer that projects the output of longT5 onto a dimension equal to the vocabulary size. This projection yields the probabilities of the proxy labels, thereby determining whether the input text under examination was generated by a model, authored by a human, or is undecidable.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Network Architecture</title>
<p>First, the language model is presented with a series of sentences to be tested, each consisting of tokens $x_1, \dots, x_n$ followed by an end-of-sequence token. The goal is to use the longT5 model to carry out the Generative AI Authorship Verification task; the core of the method is next-token prediction. After the tokens are fed into longT5, the model yields the probabilities of the proxy labels, and the predicted proxy label for each sentence is the one with the highest probability. We then convert the predicted proxy label into the final result, determining whether the text was authored by a human or paraphrased by a machine.</p>
      <p>As shown in Figure 1, the model comprises a longT5 backbone, a next-token prediction layer, and a filter. The longT5 backbone encodes the sentences under examination. The next-token prediction layer is a linear layer that maps the output of longT5 to a dimension equal to the vocabulary size, enabling the calculation of a probability for each token in the vocabulary, including the proxy labels. In method 2, the filter selects the outputs corresponding to the proxy labels from the next-token prediction layer, and these are passed through a softmax layer; the proxy label with the highest probability is chosen and translated into one of two outcomes: the text under examination was generated by a specific model, or it was authored by a human. Method 1 is similar to method 2, but it reads the probabilities of the proxy labels directly from the full next-token distribution. Its proxy labels translate into the same two outcomes, human authorship and AI-model rewrite, plus an additional outcome labeled undecidable, giving three possible outcomes in total. The detailed process is described in Section 2.1. Overall, the primary loss function $\mathcal{L}$ can be defined as follows.</p>
      <p>
\[ \mathcal{L} = \mathcal{L}_{NLL} = -\log P(y \mid x;\ \theta) \tag{1} \]
The loss $\mathcal{L}$ is a negative log-likelihood used to optimize the longT5 backbone and the next-token prediction layer, where $x$ denotes the sentence under examination, $\theta$ denotes the whole model's parameters, and $y$ denotes the ground-truth label.
      </p>
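      <p>To make the training objective concrete, the following is a minimal sketch of Equation (1) using the Hugging Face transformers library. The checkpoint name and the example label string are illustrative assumptions, not the exact artifacts used in this work.</p>
      <preformat>
# Minimal sketch of Eq. (1): the loss is the negative log-likelihood of the
# proxy-label token given the input text. Checkpoint and label string are
# assumptions for illustration.
import torch
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-local-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")

text = "The sentence under examination ..."
proxy_label = "positive"  # method 1 proxy label for human-authored text

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
labels = tokenizer(proxy_label, return_tensors="pt").input_ids

# The LM head plays the role of the next-token prediction layer; outputs.loss
# is -log P(y | x; theta) averaged over the label tokens.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()  # optimizes the backbone and the prediction layer jointly
      </preformat>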
      <fig id="fig1">
        <caption>
          <p>Figure 1: Model architecture. A longT5-base backbone (12 encoder blocks with self-attention and feed-forward MLPs, 12 decoder blocks with masked multi-head attention) produces a next-token probability distribution. In method 1 this yields P(positive) and P(negative); in method 2 a filter selects the proxy-label probability distribution, e.g. P(human), P(alpaca-7b), P(gpt-3.5).</p>
        </caption>
      </fig>
      <sec id="sec-2-1">
<title>2.1. Next-token Prediction</title>
<p>For method 2, we assign the special token "&lt;extra_id_0&gt;" as the proxy label for human-authored text. For the other models we designate similar tokens "&lt;extra_id_1&gt;", "&lt;extra_id_2&gt;", ..., "&lt;extra_id_n&gt;", where $n \le k$ and $k$ is the number of models involved in the PAN dataset [5, 6]. In method 1, human-authored texts are tagged with the word "positive" as the proxy label, while all texts rewritten by AI models are labeled "negative". If the highest probability in the next-token probability distribution falls on neither "positive" nor "negative", the result is deemed "undecidable". Both methods involve the model predicting the probabilities of the proxy labels and then converting the predicted proxy label into the actual prediction result.</p>
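        <p>A small sketch of the two proxy-label schemes is shown below; the special tokens follow the paper, while the concrete model names are illustrative, since the full PAN model list is not reproduced here.</p>
        <preformat>
# Proxy-label schemes (model names beyond "human" are illustrative).
METHOD1_LABELS = {
    "human": "positive",    # human-authored text
    "machine": "negative",  # text rewritten by any AI model
    # any other argmax token -> "undecidable" (score 0.5)
}
METHOD2_LABELS = {
    "human": "&lt;extra_id_0&gt;",
    "gpt-3.5": "&lt;extra_id_1&gt;",
    "alpaca-7b": "&lt;extra_id_2&gt;",
    # ..., "&lt;extra_id_n&gt;" with n &lt;= k, one token per model in the PAN dataset
}
        </preformat>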
        <p>Next, we measure the token length of each human-authored or model-generated text. Our statistical
analysis reveals that the vast majority of text lengths are within 2048 tokens.</p>
<p>Firstly, the PAN organization has provided datasets for Generative AI Authorship Verification, which include multiple texts authored by humans and subsequently rewritten by various models. Denote a batch as $\mathcal{B}$; its contents can be defined as $\mathcal{B} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ is the sentence under examination and $y_i$ is the proxy label.</p>
        <p>
          During training, we feed $\mathcal{B}$ into the pretrained model, which is composed of transformer [7] blocks, to get the corresponding hidden state $\mathcal{H}$. After obtaining $\mathcal{H}$, we use the next-token prediction layer $W$ and a softmax layer to obtain the probabilities of all tokens in the vocabulary. That is,
\[ P_i = (p_1, p_2, \dots, p_V) = \left( \frac{\exp((\mathcal{H}W)_1)}{\sum_{v=1}^{V} \exp((\mathcal{H}W)_v)},\ \frac{\exp((\mathcal{H}W)_2)}{\sum_{v=1}^{V} \exp((\mathcal{H}W)_v)},\ \dots,\ \frac{\exp((\mathcal{H}W)_V)}{\sum_{v=1}^{V} \exp((\mathcal{H}W)_v)} \right) \tag{2} \]
where $P_i$ is the soft label of sample $i$, $v$ indicates the position of a token within the vocabulary, $V$ is the total number of tokens in the vocabulary, $p_v$ is the probability of the $v$-th word in the vocabulary, and $y$ is the proxy label. Then we calculate the negative log-likelihood loss for classification:
\[ \mathcal{L}_i = -\log P(y \mid x_i;\ \theta) \tag{3} \]
        </p>
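        <p>The computation in Equation (2) amounts to a linear projection followed by a softmax. A short PyTorch sketch follows, with shapes chosen for a single sample; the vocabulary size of 32128 is the standard T5 vocabulary and an assumption here.</p>
        <preformat>
# Sketch of Eq. (2): project the hidden state H through the next-token
# prediction layer W (hidden_size -> vocab_size), then softmax over the vocab.
import torch

hidden_size, vocab_size = 768, 32128           # longT5-base / T5 vocabulary
H = torch.randn(1, hidden_size)                # hidden state at the label position
W = torch.nn.Linear(hidden_size, vocab_size)   # next-token prediction layer

logits = W(H)                                  # ((HW)_1, ..., (HW)_V)
P = torch.softmax(logits, dim=-1)              # p_v = exp((HW)_v) / sum_v' exp((HW)_v')
assert torch.isclose(P.sum(), torch.tensor(1.0))
        </preformat>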
        <p>In the inference phase, for method 1, after obtaining $P_i$ we convert it into three predictive outcomes:
\[ \hat{y} = \begin{cases} 1 &amp; \text{if } \arg\max_{v \in V} P_i = pos \\ 0 &amp; \text{if } \arg\max_{v \in V} P_i = neg \\ 0.5 &amp; \text{otherwise} \end{cases} \tag{4} \]
In method 1, $pos$ represents the position of the word "positive" in the vocabulary, while $neg$ represents the position of the word "negative". $\hat{y}$ is the predicted label: $\hat{y} = 1$ indicates text authored by a human, $\hat{y} = 0$ indicates text rewritten by a machine, and $\hat{y} = 0.5$ indicates "undecidable", used when a clear determination cannot be made.</p>
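        <p>The decision rule in Equation (4) reduces to an argmax comparison. The sketch below assumes the probabilities $P_i$ and the vocabulary ids of "positive" and "negative" are already available; with a SentencePiece vocabulary, each word must map to a single vocabulary entry, as the method assumes.</p>
        <preformat>
# Sketch of Eq. (4): map the argmax of the next-token distribution to 1 / 0 / 0.5.
def method1_decision(P, pos_id, neg_id):
    """P: (vocab_size,) tensor of next-token probabilities."""
    top = int(P.argmax())
    if top == pos_id:
        return 1     # human-authored
    if top == neg_id:
        return 0     # rewritten by a machine
    return 0.5       # undecidable
        </preformat>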
        <p>For method 2, we initially obtain the output of the next-token prediction layer:
\[ f(\cdot) = ((\mathcal{H}W)_1, (\mathcal{H}W)_2, \dots, (\mathcal{H}W)_V) \tag{5} \]
where $f(\cdot)$ denotes the output of the next-token prediction layer and $V$ is the vocabulary size. We then use a filter to select the outputs associated with the special tokens (the proxy-label tokens):
\[ f(\cdot)' = ((\mathcal{H}W)'_1, (\mathcal{H}W)'_2, \dots, (\mathcal{H}W)'_m) \tag{6} \]
where $f(\cdot)'$ denotes the output of the filter and $m$ is the number of special tokens. After passing through the softmax layer, we obtain the probability distribution over the proxy-label tokens:
\[ P' = (p_1, p_2, \dots, p_m) = \left( \frac{\exp((\mathcal{H}W)'_1)}{\sum_{j=1}^{m} \exp((\mathcal{H}W)'_j)},\ \dots,\ \frac{\exp((\mathcal{H}W)'_m)}{\sum_{j=1}^{m} \exp((\mathcal{H}W)'_j)} \right) \tag{7} \]
where $j \in S = \{1, \dots, m\}$ indexes the special tokens and $P'$ is the probability distribution over the proxy-label tokens. Finally, we convert $P'$ into two predictive outcomes:
\[ \hat{y} = \begin{cases} 0 &amp; \text{if } \arg\max_{j \in S} P'_j = s \\ 1 &amp; \text{otherwise} \end{cases} \tag{8} \]
where $s$ ranges over the proxy-label tokens of the AI models; $\hat{y} = 1$ indicates text authored by a human (the human proxy token "&lt;extra_id_0&gt;" attains the highest probability), and $\hat{y} = 0$ indicates text rewritten by a machine.</p>
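        <p>Equations (5)-(8) correspond to slicing the proxy-label logits out of the full distribution, renormalizing, and deciding on the argmax. A sketch follows, assuming the human proxy token "&lt;extra_id_0&gt;" is the first entry of the special-token id list.</p>
        <preformat>
# Sketch of Eqs. (5)-(8): filter the m proxy-label logits, softmax, decide.
import torch

def method2_decision(logits, special_ids):
    """logits: (vocab_size,) output of the next-token prediction layer;
    special_ids: ids of the proxy-label tokens, human token first."""
    filtered = logits[special_ids]             # f(.)': (m,) proxy-label logits
    P_prime = torch.softmax(filtered, dim=-1)  # Eq. (7): distribution over labels
    return 1 if int(P_prime.argmax()) == 0 else 0  # Eq. (8): 1 = human, 0 = machine
        </preformat>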
      </sec>
    </sec>
    <sec id="sec-3">
<title>3. Experiments and Results</title>
      <sec id="sec-3-1">
<title>3.1. Experiment Settings</title>
        <p>
In this work, we utilize the longT5 model for classification, which consists of 12 transformer layers with a hidden size of 768. The next-token prediction layer is randomly initialized before training. For method 1, the training parameters are 10 epochs, a batch size of 64, and a learning rate of 5e-4; for method 2, 15 epochs, a batch size of 16, and a learning rate of 8e-4. The maximum token length for both methods is set to 2048. All experiments are conducted on an NVIDIA A800 GPU with 80 GB of memory.
        </p>
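        <p>For reference, the stated hyperparameters can be summarized as a configuration sketch; the optimizer and learning-rate schedule are not specified in this paper, so only the numbers below are taken from it.</p>
        <preformat>
# Training settings as reported; optimizer/scheduler choices are unspecified.
CONFIG = {
    "method1": {"epochs": 10, "batch_size": 64, "learning_rate": 5e-4},
    "method2": {"epochs": 15, "batch_size": 16, "learning_rate": 8e-4},
    "max_length": 2048,         # maximum token length, both methods
    "backbone": "longT5-base",  # 12 transformer layers, hidden size 768
    "hardware": "NVIDIA A800 (80 GB)",
}
        </preformat>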
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results</title>
<p>We conducted two experiments using token prediction as implicit classification, one for method 1 and one for method 2. After training with each method, the resulting model weights were submitted to the TIRA platform [8] to obtain scores. Tables 1 and 2 display our test-set results on the TIRA platform.</p>
        <p>Table 1 shows the summarized results averaged (arithmetic mean) over 10 variants of the test dataset.
Each dataset variant applies one potential technique to measure the robustness of authorship verification
approaches, e.g., switching the text encoding, translating the text, switching the domain, manual
obfuscation by humans, etc.</p>
<p>Table 2 shows the results, initially pre-filled with the official baselines provided by the PAN organizers and summary statistics of all submissions to the task (i.e., the maximum, median, minimum, and the 95-th, 75-th, and 25-th percentiles over all submissions).</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Conclusion</title>
<p>In this paper, we have completed the task set by PAN, employing the next-token prediction method to tackle the Generative AI Authorship Verification task. Instead of using fine-tuned transformer-based classification techniques, we utilize the next-token prediction method to narrow the gap between the downstream task and the pre-training task. Finally, on the test set, our performance metrics at the Minimum, 25-th Quantile, Median, 75-th Quantile, and Max were 0.527, 0.896, 0.922, 0.926, and 0.947, respectively. These results certify the effectiveness of our proposed method on the Generative AI Authorship Verification task.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Limitations</title>
      <p>Firstly, the method proposed in this paper does not use any prompts for the LLM-generated text detection task. Prompts can better leverage the internal knowledge of language models, so in future work we plan to incorporate prompts into this task.</p>
      <p>Additionally, transforming the task into binary AI detection, rather than judging which AI authored the text, is another way to accomplish AI detection. However, this approach can easily lead to data-imbalance issues, where the amount of human-authored data is not equal to that of AI-generated data. To address this, data augmentation techniques could be employed to increase the quantity of human-authored data.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by the National Social Science Foundation of China (22BTQ101).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <article-title>Token prediction as implicit classification to identify llm-generated text</article-title>
          ,
          <source>arXiv preprint arXiv:2311.08723</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Chen,
          <article-title>Adaptive ensembles of fine-tuned transformers for llm-generated text detection</article-title>
          ,
          <source>arXiv preprint arXiv:2403.13335</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          , G. Neubig,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ainslie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Uthus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontanon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>LongT5: Efficient text-to-text transformer for long sequences</article-title>
          ,
          <source>arXiv preprint arXiv:2112.07916</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          , in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. Di Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Stamatatos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Voight-Kampff Generative AI Authorship Verification Task at PAN 2024</article-title>
          , in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          , CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>