<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Voight-Kampff Generative AI Authorship Verification Based on T5</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhaojian Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fanzhi Zeng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yan Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiangyu Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuexia Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>This paper proposes a method for fine-tuning the pre-trained language model Text-to-Text Transfer Transformer (T5) for Generative AI Authorship Verification. During the training phase, the input sequence consists of an explicit instruction followed by the training sample, while the output sequence represents the classification result in the form "positive &lt;/s&gt;" or "negative &lt;/s&gt;". During inference, the model's vocabulary is restricted to "positive" and "negative", and the word with the higher probability is selected as the classification result. On the test set, our method scored 0.138, 0.529, 0.744, 0.874, and 0.877 for the minimum, 25th percentile, median, 75th percentile, and maximum, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative AI Authorship Verification</kwd>
        <kwd>Pre-trained Language Model</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Machine-generated text detection is an active research area, in which three distinct approaches are primarily employed to differentiate between human-written text and machine-generated text.</p>
      <p>The first approach is traditional statistical methods, which identify anomalies by analyzing statistical characteristics of text samples. A representative example is the Giant Language Model Test Room (GLTR) designed by Gehrmann et al. [6]. This method comprises three tests: Tests 1 and 2 examine whether generated words are sampled from the top of the distribution, while Test 3 checks whether the system is overly confident in its next prediction because it has seen the previously generated context. In a study with human subjects, GLTR raised the accuracy of identifying fake text from 54% to 72% without any pretraining, significantly enhancing human discernment of the genuineness of generated text.</p>
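The rank-based tests above can be illustrated with a short sketch: for each generated token, take its rank in the detector model's next-token distribution and measure what fraction of tokens falls within the top k. This is a minimal illustration of the idea, not the GLTR implementation; the function name and the toy ranks are our own.

```python
def top_k_fraction(ranks, k=10):
    """Fraction of tokens whose rank in the model's predicted next-token
    distribution is within the top k (rank 1 = most likely). A high
    fraction suggests top-sampled, machine-generated text."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical per-token ranks for two texts:
machine_like = [1, 2, 1, 3, 5, 2]      # tokens mostly drawn from the head
human_like = [1, 40, 3, 250, 7, 90]    # frequent low-probability choices

print(top_k_fraction(machine_like))  # 1.0
print(top_k_fraction(human_like))    # 0.5
```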
      <p>The second approach is unsupervised learning methods, particularly zero-shot classification. This
method utilizes pre-trained large language models (LLMs) to detect their own generated text or text
generated by similar models. Solaiman et al. [7] proposed a baseline method that makes classification
decisions by evaluating log-probabilities and corresponding thresholds. However, compared to statistical
methods, the performance of zero-shot classification methods is typically inferior.</p>
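The log-probability baseline can be sketched as a simple thresholding rule: average the per-token log-probabilities assigned by the language model and classify text that is "too probable" as machine-generated. This is an illustration of the idea, not the code of Solaiman et al.; the function name, threshold value, and log-probabilities are hypothetical.

```python
def mean_logprob_classify(token_logprobs, threshold=-2.5):
    """Classify text as machine-generated when its average per-token
    log-probability under the language model exceeds a threshold
    (machine text tends to be more probable under the model)."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return "machine" if avg > threshold else "human"

# Hypothetical per-token log-probabilities for two texts:
print(mean_logprob_classify([-1.0, -2.0, -1.5]))  # machine
print(mean_logprob_classify([-5.0, -6.0, -4.0]))  # human
```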
      <p>The third approach is supervised learning methods, which fine-tune existing language models to create a text detector. For instance, Zellers et al. [8] fine-tuned linear layers on the hidden states of the GROVER encoder to distinguish whether input text originates from the GROVER model or from a human. However, compared to the first two approaches, supervised learning methods require a significant amount of labeled data for model training, making the training process more time-consuming.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <sec id="sec-3-1">
        <title>3.1. Data Source</title>
        <p>The training data set of the Generative AI Authorship Verification task is a bootstrap dataset of real and fake news articles covering 2021 U.S. news headlines. It consists of JSON files written by 13 different machine authors and 1 human author. Each file contains a list of articles on the same topics; in all files, the article IDs and row order are identical, so the same row always corresponds to the same topic, but from different “authors”. Each file covers 24 topics and contains 1087 articles. Considering the token length limitations of large language models, we conducted a token count on these data. The total number of articles and the token length range for each author are shown in Table 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Preprocessing</title>
        <p>For the provided training set, we initially preprocess texts authored by machines and humans. We
remove all empty texts, replace all full-width characters with half-width characters, and remove spaces
within the texts. Subsequently, we merge texts authored by machines and humans. Specifically, we
create a text tuple ("pair": ["text", "label"]), where "text" represents the content of the article, and "label"
indicates whether it is authored by a human (1 for positive, 0 for negative). We use 80% of the training
dataset for training and 20% for validation, with 11,393 samples for training and 3,797 samples for
validation.</p>
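The preprocessing steps above can be sketched as follows. This is a simplified sketch: NFKC normalization is used here as one way to map full-width characters to half-width, "removing spaces within the texts" is interpreted as collapsing runs of whitespace, and the helper names are our own.

```python
import unicodedata

def preprocess(text):
    # Map full-width characters to half-width (NFKC normalization
    # covers full-width forms and the ideographic space).
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and strip leading/trailing spaces.
    return " ".join(text.split())

def make_pair(text, human_authored):
    # "pair": [text, label]; label 1 = human (positive), 0 = machine (negative)
    return {"pair": [preprocess(text), 1 if human_authored else 0]}

def split_dataset(pairs, train_frac=0.8):
    # 80% training / 20% validation split.
    n_train = int(len(pairs) * train_frac)
    return pairs[:n_train], pairs[n_train:]

# Empty texts are removed before pairing:
texts = ["Ｈｅｌｌｏ　ｗｏｒｌｄ", "", "plain  text"]
pairs = [make_pair(t, True) for t in texts if t.strip()]
```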
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Method</title>
        <p>Our method fine-tunes the T5 model for the text classification task. This approach transforms the text
classification problem into a sequence-to-sequence (seq-to-seq) problem, enabling the model to handle
and understand text data more flexibly.</p>
        <p>During the training phase, we utilized the training dataset provided by PAN 2024 and conducted
necessary preprocessing to ensure the data was suitable for model training. Since the T5 model was
exposed to numerous tasks with explicit instructions or prompts during its pre-training phase, adding
a prompt can fully leverage the model’s pre-training knowledge, thereby improving its performance
on specific tasks. Consequently, the input sequence consists of two parts: one is explicit instructions
(for example: "Distinguish whether the following text is written by a human"), and the other is the
text sample to be classified. The output sequence represents the classification result in the form of
"positive &lt;/s&gt;" or "negative &lt;/s&gt;", where "&lt;/s&gt;" is a sequence-ending token. To adapt the model to this
text classification task, we replaced the head of the T5 model with a randomly initialized head.</p>
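The construction of the input and output sequences can be sketched as below. The prompt wording follows the example instruction given above, and the helper name is hypothetical.

```python
# Instruction prefix, following the example prompt in the text.
PROMPT = "Distinguish whether the following text is written by a human: "

def build_example(text, human_authored):
    """Build the (input, target) sequence pair for T5 fine-tuning."""
    source = PROMPT + text
    # "</s>" is T5's sequence-ending token.
    target = ("positive" if human_authored else "negative") + " </s>"
    return source, target

src, tgt = build_example("Some article body.", human_authored=False)
print(tgt)  # negative </s>
```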
        <p>During the inference phase, we restricted the model’s vocabulary to only include the words “positive”
and “negative”. The model predicts whether the input text is human-written or machine-generated
based on the probability distribution of these two words. For two disputed texts in the test dataset,
the model first predicts each text individually, outputting either “positive” or “negative”. Subsequently,
we compare the predicted labels of these two texts. If the predictions differ, a clear conclusion can be
drawn. If the predictions are the same, we compare the probability values of the predictions and choose
the result with the higher confidence. Additionally, we introduced a special &lt;PAD&gt; token in the T5
model’s decoder to help maintain consistent output formats when handling input sequences of different
lengths. The detailed design of the entire network architecture is illustrated in Figure 1.</p>
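A minimal sketch of this decision procedure: the softmax is taken over only the logits of "positive" and "negative", and for a disputed pair the text with the more confident "positive" prediction is judged human-written. The function names are our own; in the real system the logits come from the fine-tuned T5 decoder.

```python
import math

def p_positive(logit_pos, logit_neg):
    """Softmax restricted to the two allowed words; returns P("positive")."""
    m = max(logit_pos, logit_neg)
    ep = math.exp(logit_pos - m)
    en = math.exp(logit_neg - m)
    return ep / (ep + en)

def pick_human(p1, p2):
    """Return the index (0 or 1) of the disputed text judged human-written,
    given P("positive") for each text."""
    label1, label2 = p1 >= 0.5, p2 >= 0.5
    if label1 != label2:
        return 0 if label1 else 1      # predictions differ: clear conclusion
    # Same label: the prediction with higher "positive" probability wins
    # (for two "negative" labels, the less confidently negative text is chosen).
    return 0 if p1 >= p2 else 1
```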
        <p>[Figure 1: Network architecture. Input embeddings (E1 … EN, E&lt;s&gt;, E&lt;PAD&gt;) pass through 6 T5 encoder blocks (self-attention and feed-forward MLP) and 6 T5 decoder blocks (masked multi-head attention and feed-forward); the next-word probability distribution is restricted to “Positive” and “Negative”, yielding P(Human) and P(Machine).]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment setup</title>
        <p>In this work, we adopt T5 as the pre-trained model and implement it with PyTorch. Our hyperparameters are set as follows: the batch size is 512, the loss function is cross-entropy, the initial learning rate is 5e-4, and the model is trained for 5 epochs. Each training run is optimized with AdamW. Our experiments were conducted on an NVIDIA A800 server.</p>
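Under the hyperparameters above, the optimization setup can be sketched in PyTorch as follows. The Linear module is a stand-in for the fine-tuned T5 network (not reproduced here), and the random tensors are placeholder batches.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in module: the actual network is the fine-tuned T5.
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
loss_fn = nn.CrossEntropyLoss()

BATCH_SIZE, EPOCHS = 512, 5
for epoch in range(EPOCHS):
    x = torch.randn(BATCH_SIZE, 16)          # placeholder input batch
    y = torch.randint(0, 2, (BATCH_SIZE,))   # placeholder labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)              # cross-entropy loss
    loss.backward()
    optimizer.step()
```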
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation</title>
        <p>To assess the effectiveness of our proposed model, we utilized the evaluation tool provided by PAN, which includes the following metrics:</p>
        <p>ROC-AUC: ROC-AUC is a comprehensive evaluation of the balance between the true positive rate
and the false positive rate.</p>
        <p>Brier: The complement of the Brier score (mean squared loss).</p>
        <p>C@1: A modified accuracy score that assigns non-answers (score = 0.5) the average accuracy of the
remaining cases.</p>
        <p>F1: F1 score is the harmonic mean of precision and recall, combining both metrics into a single value.</p>
        <p>F0.5u: A modified F0.5 measure (precision-weighted F measure) that treats non-answers (score = 0.5)
as false negatives.</p>
        <p>Mean: The arithmetic mean of all the metrics above.</p>
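The two less common metrics above can be sketched as follows. This follows our reading of the PAN definitions (a non-answer is a prediction of exactly 0.5) and is not the official evaluation code.

```python
def c_at_1(y_true, y_pred):
    """c@1: non-answers (prediction exactly 0.5) are credited with the
    accuracy achieved on the answered cases."""
    n = len(y_true)
    na = sum(1 for p in y_pred if p == 0.5)
    nc = sum(1 for t, p in zip(y_true, y_pred)
             if p != 0.5 and (p > 0.5) == (t == 1))
    return (nc + na * nc / n) / n

def f05u(y_true, y_pred):
    """F0.5u: precision-weighted F measure with non-answers counted as
    false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p > 0.5)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p > 0.5)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p < 0.5)
    na = sum(1 for p in y_pred if p == 0.5)
    return 1.25 * tp / (1.25 * tp + 0.25 * (fn + na) + fp)
```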
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>We evaluated the performance of our model and baselines (Binoculars [9], Fast-DetectGPT (Mistral) [10], PPMd [11, 12], Unmasking [13, 14], and Fast-DetectGPT [10]) on the official test set provided in PAN 2024.</p>
        <p>Table 2 shows the performance of our method across various metrics. Our method surpasses Unmasking and Fast-DetectGPT on ROC-AUC, C@1, F0.5u, and the mean, but there is still a clear gap to Binoculars.</p>
        <p>Table 3 further shows the average accuracy of our model on different dataset variants, in particular on the test sets of 9 variants. Our method surpasses Fast-DetectGPT (Mistral) on the minimum, surpasses Unmasking and Fast-DetectGPT on the median, surpasses PPMd, Unmasking, and Fast-DetectGPT on the 75th quantile, and surpasses PPMd and Unmasking on the maximum.</p>
        <p>Compared with the quantile results of other participants, our model is only close to or slightly above the 25th-quantile system on most indicators, while exceeding the minimum on all indicators. This indicates that our method performed relatively poorly on the test set and still has a significant gap compared to the current state-of-the-art methods.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This article elaborates on our research in the field of Voight-Kampff Generative AI Authorship Verification in 2024. In this study, we fine-tuned a T5 pre-trained model to enhance the detection of AI-generated text. The experimental results demonstrate that this method effectively enhances text detection capability, but there still exists a certain gap compared to the state-of-the-art methods. In the future, we plan to continue optimizing and refining this method to achieve higher precision and efficiency. Additionally, we will explore the potential applications of this method in a wider range of natural language processing tasks, aiming to expand its scope of application and further increase its practical value.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 61972091) and the Natural Science Foundation of Guangdong Province of China (No. 2022A1515010101, No. 2021A1515012639).</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] Y. Chen, H. Kang, V. Zhai, L. Li, R. Singh, B. Raj, GPT-Sentinel: Distinguishing human and ChatGPT generated content, arXiv preprint arXiv:2305.07969 (2023).</p>
      <p>[5] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/978-3-031-28241-6_20.</p>
      <p>[6] S. Gehrmann, H. Strobelt, A. M. Rush, GLTR: Statistical detection and visualization of generated text, arXiv preprint arXiv:1906.04043 (2019).</p>
      <p>[7] I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, G. Krueger, J. W. Kim, S. Kreps, et al., Release strategies and the social impacts of language models, arXiv preprint arXiv:1908.09203 (2019).</p>
      <p>[8] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, Y. Choi, Defending against neural fake news, Advances in Neural Information Processing Systems 32 (2019).</p>
      <p>[9] A. Hans, A. Schwarzschild, V. Cherepanova, H. Kazemi, A. Saha, M. Goldblum, J. Geiping, T. Goldstein, Spotting LLMs with Binoculars: Zero-shot detection of machine-generated text, arXiv preprint arXiv:2401.12070 (2024).</p>
      <p>[10] G. Bao, Y. Zhao, Z. Teng, L. Yang, Y. Zhang, Fast-DetectGPT: Efficient zero-shot detection of machine-generated text via conditional probability curvature, arXiv preprint arXiv:2310.05130 (2023).</p>
      <p>[11] O. Halvani, C. Winter, L. Graner, On the usefulness of compression models for authorship verification, in: Proceedings of the 12th International Conference on Availability, Reliability and Security, 2017, pp. 1–10.</p>
      <p>[12] D. Sculley, C. E. Brodley, Compression and machine learning: A new perspective on feature space vectors, in: Data Compression Conference (DCC'06), IEEE, 2006, pp. 332–341.</p>
      <p>[13] M. Koppel, J. Schler, Authorship verification as a one-class classification problem, in: Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 62.</p>
      <p>[14] J. Bevendorff, B. Stein, M. Hagen, M. Potthast, Generalizing unmasking for short texts, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 654–659.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Stamatatos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Voight-Kampff Generative AI Authorship Verification Task at PAN 2024</article-title>
          , in: G.
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          21 (
          <year>2020</year>
          ) 1–67.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>