<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Author Authentication of Generative AI Based on BERT by Regularization Method</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Baijian Huang</string-name>
          <email>huangbaijianh@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Changle Zhong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Yan</string-name>
          <email>yankai@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Han</string-name>
          <email>hanyong2005@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>The rapid development and widespread use of large language models (LLMs) have made distinguishing between human-written and machine-generated text increasingly challenging. This paper proposes a novel authorship verification method for generative AI based on a BERT model to address this issue. The model is trained on concatenated input texts. By using the Regularized Dropout (R-Drop) method to constrain the output and by analyzing similarities and differences between various samples, the model learns internal data characteristics and identifies unique features of different authors' writing styles. Experimental results of a BERT model trained with this regularization technique show an ROC-AUC measure of 0.942, ranking 13th out of 30 submissions and beating all baselines. The experimental results show that the proposed method can effectively distinguish between human-generated and machine-generated text in different test scenarios, providing a robust solution for authorship verification in the context of generative AI.</p>
      </abstract>
      <kwd-group>
        <kwd>Authorship Verification</kwd>
        <kwd>BERT</kwd>
        <kwd>Regularized Dropout</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The author verification task aims to determine which of two texts was written by a human by
analyzing various linguistic features and stylistic elements of the text. Effective author authentication
techniques enable the identification of human-written articles in fields such as literature, law, and
journalism. This process is crucial for ensuring texts’ authenticity, enhancing textual content’s quality
and credibility, combating plagiarism, and maintaining academic integrity, among other important
objectives.</p>
      <p>
        PAN at CLEF 2024 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduces a task where participants are given pairs of texts on the same
topic—one authored by a human and the other by a machine—and are challenged to identify the
human-written text. The Generative AI Authorship Verification Task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], organized in collaboration with the
Voight-Kampff task of the ELOQUENT lab, adopts a builder-breaker format: PAN participants develop
systems to differentiate between human-written and AI-generated texts, while ELOQUENT participants
will explore new text generation and obfuscation techniques to evade detection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our approach
consists of two key steps. First, for each topic in the PAN 2024 dataset, we form pairs of texts in which
the human-written text is linked to each of the 13 machine-generated texts on the same topic. Each
combined pair (text1 and text2) is truncated to ensure that its total length is within 512 tokens. In
addition to the competition dataset, we also introduce an external Kaggle dataset, LLM - Detect AI
Generated Text<sup>1</sup>, to improve the generalization of our model. The training set of this Kaggle
competition consists of essays written for two of the seven prompts. Almost all of the essays in the
training set were written by students, and only a few machine-generated essays are included; it does
not distinguish between topics, so we pair its human texts with randomly chosen machine-generated
text. Finally, the model is trained with the pair (t1, t2) as the sample. After fine-tuning the BERT model
from the transformers library as our base model, we verify the authorship of generative AI through its
powerful pre-trained contextual embeddings.
      </p>
      <p><sup>1</sup>This dataset can be found at https://www.kaggle.com/competitions/llm-detect-ai-generated-text/data</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
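      <p>As an illustration, the following is a minimal sketch of this pair construction and truncation step. The field name "text", the file layout, and the random seed are assumptions for illustration; the exact PAN 2024 JSONL schema is not reproduced here.</p>
      <preformat>
# Hypothetical sketch of pair construction and 512-token truncation.
import json
import random

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
rng = random.Random(0)

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

human = load_jsonl("human.jsonl")
machine = load_jsonl("gpt-4-turbo-preview.jsonl")

pairs = []
for h, m in zip(human, machine):  # same line = same topic
    human_first = rng.random() &lt; 0.5
    t1, t2 = (h["text"], m["text"]) if human_first else (m["text"], h["text"])
    label = 0 if human_first else 1  # 0: text1 is human-written, 1: text2 is
    # BERT-style pair encoding: [CLS] t1 [SEP] t2 [SEP], truncated to 512 tokens
    enc = tokenizer(t1, t2, truncation=True, max_length=512)
    pairs.append((enc, label))
      </preformat>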
      <p>
        To solve and mitigate the overfitting problem, we also train the BERT model with the R-Drop
method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. R-Drop is a regularization technique designed to reduce variance and improve
the robustness of BERT models by ensuring that the model’s predictions remain consistent across
multiple dropout masks, which enhances classification ability. Kullback-Leibler (KL) divergence is used to
enforce output consistency, and dropout is used to create slightly different model paths.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In the realm of AI text detection, numerous methodologies have been proposed. Prominent examples
include GROVER[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and GLTR[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. GROVER generates news snippets and incorporates a fine-tuned
generation model for snippet detection. Conversely, GLTR renders machine-generated text discernible
by computing each word’s histogram of logarithmic likelihood values. Furthermore, DetectGPT[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
generates text variants by perturbing the model and evaluating the probabilistic curvature of these
variants. Recently, Fast-DetectGPT[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] has enhanced this approach by estimating conditional probability
curvature via a single model forward pass, substantially improving detection efficiency and
accuracy. These investigations underscore the significance and potential of statistical features and
language models for machine-generated text detection in PAN’s generative AI authorship
verification task. Conducting detailed comparative experiments requires substantial computing resources
and time; since our resources and time were limited, we focus on our own methods and results. Inspired by
these methods, we combine regularization with data preprocessing to improve the authorship verification
task. The subsequent experimental results also demonstrate the effectiveness of our method: on several
test datasets the score reaches 1.0, and the mean score is 0.942.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Data Imbalance</title>
        <p>In the field of AI text detection, data imbalance is a common and important challenge. Data imbalance
means that the number of samples in some categories is much smaller than that in other categories, which
causes the model to favor majority-class samples and ignore minority-class samples during training,
thus degrading detection performance. There are many ways to deal with data imbalance. Traditional
methods, such as oversampling and cost-sensitive learning, are still effective, and newer methods, such as
deep learning and transfer learning, show better prospects. Combining and optimizing these methods
can help improve the detection performance of the model on imbalanced datasets and promote the
further development of text detection tasks.</p>
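        <p>As a small illustration of the traditional approach (not code from the paper itself), a random oversampling step for a binary text classification dataset could look like the following sketch:</p>
        <preformat>
# Hypothetical sketch: duplicate minority-class samples until classes balance.
import random

def random_oversample(texts, labels, seed=0):
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for t, y in zip(texts, labels):
        by_class[y].append(t)
    minority = min(by_class, key=lambda y: len(by_class[y]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    by_class[minority] += rng.choices(by_class[minority], k=deficit)
    out_texts = by_class[0] + by_class[1]
    out_labels = [0] * len(by_class[0]) + [1] * len(by_class[1])
    return out_texts, out_labels
        </preformat>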
      </sec>
      <sec id="sec-2-2">
        <title>2.2. R-Drop manner</title>
        <p>
          The concept of R-Drop[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] entails each data sample traversing the same model twice with dropout enabled.
Subsequently, a bidirectional KL divergence is used to constrain the two outputs, maximizing their
consistency. Owing to the stochastic nature of dropout, the network paths traversed by input x can be
approximated as two subtly different models. This process is illustrated in Figure 1.
        </p>
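        <p>The following is a minimal sketch of this bidirectional KL constraint, assuming two logit tensors produced by two stochastic forward passes over the same batch (names and signatures are illustrative):</p>
        <preformat>
# Sketch of the symmetric KL term used by R-Drop (illustrative).
import torch
import torch.nn.functional as F

def symmetric_kl(logits1: torch.Tensor, logits2: torch.Tensor) -> torch.Tensor:
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    # F.kl_div(log_q, p) computes KL(p || q); average the two directions.
    kl_12 = F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")  # KL(p1 || p2)
    kl_21 = F.kl_div(log_p1, log_p2.exp(), reduction="batchmean")  # KL(p2 || p1)
    return 0.5 * (kl_12 + kl_21)
        </preformat>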
        <p>One solution to balance the detection of semantic and non-semantic transfer is to calculate the
distance score in the feature space of both the pre-trained and fine-tuned models. This approach
accounts for the detection of both types of transfer cases.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Model</title>
      <p>We improve the performance of the model by training a BERT model fine-tuned with the R-Drop
method.</p>
      <p>[Figure 1: Illustration of the R-Drop manner: the same input x passes twice through the N-layer
transformer (self-attention, feed-forward) with different units dropped, yielding two softmax outputs.]</p>
        <sec id="sec-3-3-1">
          <title>3.1. Model architecture</title>
          <p>We introduce the R-Drop method into the training process. First, the input text is encoded by
the BERT encoding layers to generate representation vectors. These vectors are passed through the
dropout layer to prevent overfitting and are then mapped by a linear classification layer
into the classification label space, generating multiple outputs. A regularization term is added by
calculating the KL divergence between each pair of outputs. Finally, the loss function combines the KL
divergence with the classification loss and updates the model parameters through backpropagation,
completing the whole training process. The overall architecture of the model is shown in Figure 2.</p>
          <p>R-Drop imposes consistency constraints on the output layer of the model, which may indirectly affect
the weight updates and learning process of every layer within the model (including the dropout layer).
Specifically, since R-Drop requires the model to produce similar outputs under different dropout settings,
it prompts the model to learn more robust and universal feature representations during training,
thereby affecting the weights and outputs of the input embedding, multi-head attention, feed-forward,
and other layers. In short, the R-Drop method acts mainly on the BERT model’s output layer, improving the
model’s generalization ability and robustness by imposing consistency constraints while indirectly
influencing the learning of the inner layers.</p>
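          <p>A minimal sketch of such a classifier head follows, assuming a standard BERT pooled output; this is an illustration, not the authors’ released code:</p>
          <preformat>
# Hypothetical sketch: BERT encoder + dropout + linear classification layer.
import torch.nn as nn
from transformers import BertModel

class BertPairClassifier(nn.Module):
    def __init__(self, num_labels: int = 2, dropout: float = 0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        pooled = self.dropout(out.pooler_output)  # [CLS] pooled representation
        return self.classifier(pooled)            # logits over the label space
          </preformat>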
        </sec>
        <sec id="sec-3-3-2">
          <title>3.2. Loss function</title>
          <p>The loss function is as follows. The training data is $(x_i, y_i)$, where $x_i$ represents the input feature and $y_i$
represents the label or target value corresponding to $x_i$; the model uses $(x_i, y_i)$ in the training data
to learn how to predict the target value $y_i$ from the input feature $x_i$. Denoting the model by $P^w(y_i \mid x_i)$, the
cross entropy of each sample is:
$$\mathcal{L}_i^{CE} = -\log P^w(y_i \mid x_i) \qquad (1)$$</p>
          <p>In the case of "dropout twice," we can assume that the sample has passed through two slightly diferent
models, denoted as:
consistent as possible:
1 = − log 
(1)(|) − log 
(2)(|)
The other part of the loss function is the KL divergence, which aims to make the two outputs as
2 =
2
1 [︁KL(
(1)(|) ‖ 

(2)(|)) + KL(
(2)(|) ‖ 
(1)(|))

]︁</p>
          <p>The total loss function is the weighted sum of the cross-entropy loss and the KL-divergence loss,
where $\alpha$ is a hyperparameter that balances the importance of the two parts of the loss. The formula of
the total loss function is as follows:
$$\mathcal{L}_i = \mathcal{L}_i^{NLL} + \alpha \cdot \mathcal{L}_i^{KL} \qquad (4)$$</p>
          <p>We add the R-Drop method before the softmax function of the BERT model to constrain the output.
The experimental results also show that our method is effective.</p>
          <p>[Figure 2: Overall model architecture. Input text → BERT encoder → dropout layer → linear layer →
outputs 1..n → compute KL divergence → loss function → backpropagation → parameter update.]</p>
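          <p>For concreteness, a minimal PyTorch sketch of this loss follows, reusing the classifier head sketched in Section 3.1; this is an illustrative reconstruction, not the authors’ released code:</p>
          <preformat>
# Sketch of one R-Drop training step implementing equations (2)-(4) above.
import torch
import torch.nn.functional as F

def r_drop_loss(model, input_ids, attention_mask, labels, alpha=1.0):
    # Two forward passes; independent dropout masks make them differ slightly.
    logits1 = model(input_ids, attention_mask)
    logits2 = model(input_ids, attention_mask)

    # Equation (2): cross entropy on both passes.
    nll = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)

    # Equation (3): symmetric KL between the two output distributions.
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")
                + F.kl_div(log_p1, log_p2.exp(), reduction="batchmean"))

    # Equation (4): weighted sum of the two parts.
    return nll + alpha * kl
          </preformat>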
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset and Preprocessing</title>
        <p>The training dataset provided by PAN@CLEF 2024 consists of 13 machine-generated datasets and one
human-generated dataset, each containing 1,087 entries. These datasets include news articles,
Wikipedia introductory texts, and fan fiction. Additionally, PAN@CLEF 2024 offers a guide dataset
comprising multiple real and fake news articles that made headlines in the United States in 2021. This
guide dataset is provided in JSONL format, with records separated by line breaks. Each file
contains a list of articles, all maintaining the same article IDs and line order, except for the model-specific
prefix before the first file. This ensures that the same line always corresponds to the same topic but
from a different "author".</p>
        <p>We extracted additional human-generated data from Kaggle to balance the training dataset, creating
a set of 1,681 human data points in the same format as the provided training data.</p>
        <p>The format of the test data differs from that of the training dataset. Each line in the test data
contains a pair of texts, and the IDs are obfuscated. Our task is to predict which text in each pair is
human-written. For each test case in the input file, the ID of the text pair and a confidence score are
output, indicating the likelihood of human versus machine generation. A score of 1
indicates a strong belief that the text is human-written, while a score of 0 indicates that the text is
machine-generated. If the prediction score is less than 0.5, text1 is considered human-written; if it
is greater than 0.5, text2 is human-written. The basic dataset composition is shown in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption><p>Basic composition of the dataset.</p></caption>
          <table>
            <thead>
              <tr><th>File</th><th>Entries</th></tr>
            </thead>
            <tbody>
              <tr><td>human.jsonl</td><td>1087</td></tr>
              <tr><td>search_human.jsonl</td><td>1681</td></tr>
              <tr><td>alpaca-7b.jsonl</td><td>1087</td></tr>
              <tr><td>bigscience-bloomz-7b1.jsonl</td><td>1087</td></tr>
              <tr><td>chavinlo-alpaca-13b.jsonl</td><td>1087</td></tr>
              <tr><td>gemini-pro.jsonl</td><td>1087</td></tr>
              <tr><td>gpt-3.5-turbo-0125.jsonl</td><td>1087</td></tr>
              <tr><td>gpt-4-turbo-preview.jsonl</td><td>1087</td></tr>
              <tr><td>meta-llama-llama-2-7b-chat-hf.jsonl</td><td>1087</td></tr>
              <tr><td>meta-llama-llama-2-70b-chat-hf.jsonl</td><td>1087</td></tr>
              <tr><td>mistralai-mistral-7b-instruct-v0.2.jsonl</td><td>1087</td></tr>
              <tr><td>mistralai-mixtral-8x7b-instruct-v0.1.jsonl</td><td>1087</td></tr>
              <tr><td>qwen-qwen1.5-72b-chat-8bit.jsonl</td><td>1087</td></tr>
              <tr><td>text-bison-002.jsonl</td><td>1087</td></tr>
              <tr><td>vicgalle-gpt2-open-instruct-v1.jsonl</td><td>1087</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In deep learning, preprocessing is important, and more text data means more text features a model can
learn. The original training dataset has 14 files with 1,087 entries each. Unlike the training set, the test
set lacks labels for human- versus machine-generated text. We processed each entry to match the test dataset
format.</p>
        <p>The ratio of the human-generated data to the machine-generated data is 1:13, leading to an
unbalanced dataset. Since every line in all files corresponds to the same topic, we combine
one human-written text and one machine-generated text on the same topic, so one article ID corresponds
to two text entries, and the label indicates which text is human-written.</p>
        <p>We hope that these data can better capture text features and avoid an imbalanced training dataset,
which would result in poor generalization for minority-class samples. By processing the data this way,
we can enhance the model’s ability to learn the characteristics of minority samples more effectively.</p>
        <p>Finally, the dataset is preprocessed, and the collected human data are randomly paired with
machine-generated data to form text pairs, thereby expanding the dataset for model training. In total, 14,135
training pairs are obtained, and the ratio of positive to negative samples (whether text1 or text2 is the
human-written one) is kept at about 1:1. The dataset details are shown in Table 2. The training, validation,
and test sets are randomly split in a 6:3:1 ratio.</p>
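        <p>A minimal sketch of the random 6:3:1 split follows; the fixed seed is an assumption for reproducibility:</p>
        <preformat>
# Hypothetical sketch: shuffle pairs and split them 6:3:1.
import random

def split_6_3_1(pairs, seed=42):
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.6 * n), int(0.3 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
        </preformat>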
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metrics</title>
        <p>
          To evaluate the proposed models, we used the official evaluation tool provided by TIRA [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The evaluation
metrics are as follows (a short computation sketch follows the list):
• ROC-AUC: The area under the ROC (Receiver Operating Characteristic) curve.
• Brier: The complement of the Brier score (mean squared loss).
• C@1: A modified accuracy score that assigns non-answers (score = 0.5) the average accuracy of
the remaining cases.
• F1: The harmonic mean of precision and recall.
• F0.5u: A modified F0.5 measure (precision-weighted F measure) that treats non-answers (score =
0.5) as false negatives.
        </p>
        <p>• Mean: The arithmetic mean of all the metrics above.</p>
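        <p>As a minimal sketch, two of these metrics could be computed with scikit-learn as below; the PAN-specific C@1 and F0.5u variants are part of the official evaluator and are omitted here (the values are illustrative):</p>
        <preformat>
# Hypothetical sketch of two metrics via scikit-learn.
from sklearn.metrics import brier_score_loss, roc_auc_score

y_true = [1, 0, 1, 1, 0]             # 1 = text2 is the human-written one
y_score = [0.9, 0.2, 0.8, 0.6, 0.4]  # model confidence scores

roc_auc = roc_auc_score(y_true, y_score)
brier = 1.0 - brier_score_loss(y_true, y_score)  # complement of the Brier score
print(f"ROC-AUC: {roc_auc:.3f}  Brier: {brier:.3f}")
        </preformat>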
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experiment setting</title>
        <p>In this paper, the BERT-base model is selected, with the following parameters: L = 12, i.e., twelve
transformer encoder layers; H = 768, i.e., 768 hidden units per layer; A = 12, i.e., twelve self-attention
heads per layer; and about 110 million parameters in total.</p>
        <p>The BERT-base model is selected as the pre-trained model, and the BERT and fully connected network
classification models are constructed using PyTorch. Our hyperparameters are set as follows: the batch
size is 32, the maximum sequence length is 512, the initial learning rate is set to 2e-5, and the model is
trained for 10 epochs. Optimization is performed using AdamW, and the best model weights are saved
based on the accuracy on the test set at each epoch.</p>
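        <p>A minimal sketch of this setup follows, reusing the classifier and loss sketched in Section 3; the dataset object is assumed to come from the preprocessing above:</p>
        <preformat>
# Hypothetical sketch: AdamW fine-tuning loop with the stated hyperparameters.
from torch.optim import AdamW
from torch.utils.data import DataLoader

model = BertPairClassifier()                 # from the Section 3.1 sketch
optimizer = AdamW(model.parameters(), lr=2e-5)
loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

model.train()
for epoch in range(10):
    for batch in loader:
        optimizer.zero_grad()
        loss = r_drop_loss(model, batch["input_ids"],
                           batch["attention_mask"], batch["labels"], alpha=1.0)
        loss.backward()
        optimizer.step()
        </preformat>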
        <p>After setting the above hyperparameters, we also experimented with training on single texts. Specifically,
we added a "label" field to the official dataset, using "0" or "1" to indicate whether the text was
human-written, and trained the model this way. However, the validation performance was not ideal.</p>
        <p>Late-bit is our run trained without R-Drop. Bitter-metaphor is our run that applies the
R-Drop regularization method on top of the previous approach. Nutty-combat integrates four
models trained on single texts: the four trained models predict the test data, and we select the most
frequent prediction as the final value, as sketched below.</p>
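        <p>A minimal sketch of this majority-vote fusion, assuming four per-model lists of 0/1 predictions and a 0.5 non-answer on even splits (an assumption consistent with the discussion in Section 4.4):</p>
        <preformat>
# Hypothetical sketch of the vote fusion used for the nutty-combat run.
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model: one list of 0/1 predictions per model."""
    fused = []
    for votes in zip(*predictions_per_model):
        value, freq = Counter(votes).most_common(1)[0]
        # On an even split (e.g. 2-2 among four models), emit a 0.5 non-answer.
        fused.append(0.5 if freq == len(votes) // 2 else value)
    return fused
        </preformat>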
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Results</title>
        <p>We reassembled the training dataset provided by PAN24 and evaluated the model’s performance
on this reassembled dataset. Additionally, we tested our model on the PAN24 generative AI author
authentication test dataset. The test results are presented in Table 3.</p>
        <p>Comparing the late-bit and nutty-combat methods, Table 3 shows that the model fusion method is
not as good as the single model on any evaluation metric. This may be due to using a 0.5 threshold
for predictions, which led to performance degradation: specifically, when the models’ votes are evenly
split, a score of 0.5 is assigned, resulting in diminished performance on the test data.</p>
        <p>When comparing bitter-metaphor and late-bit, it is evident that bitter-metaphor achieves better test
results across all metrics. This demonstrates the effectiveness of the R-Drop method for the AI authorship
verification task.</p>
        <p>It can be seen from Tables 3 and 4 that the scores of our model on most datasets are higher than the
baseline. The median is greater than 0.936, which generally exceeds the baseline median. Analysis of the
results shows that the results on the "gpt-4-turbo-preview-german" and "text-bison-002-german" datasets
are unsatisfactory, while the scores on the other datasets are close to 1. This may be because the different
language structure of German leads to model judgment errors.</p>
        <p>[Table 3: evaluation scores per test dataset (ROC-AUC, Brier, C@1, F1, F0.5u, and Mean); most
datasets score at or near 1.0, while one scores around 0.52-0.55.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this article, we describe our approach to author authentication for PAN 2024. We reorganize the
training dataset to match the format of the test dataset, thereby creating a new, larger training dataset.
By augmenting the data in this way, we aim to improve our model’s ability to distinguish between text
written by human authors and text generated by machines. Additionally, we introduce a regularization
method that learns the intrinsic characteristics of the data by comparing the similarities and differences
between different samples. This approach helps the model learn the unique characteristics of different
authors’ writing styles.</p>
      <p>In future work, we hope to explore various model fusion techniques to combine models with strong
predictive performance and good generalization ability. Other methods for improving generalization are
also worth studying, and we hope to train better models to identify authors in the future.</p>
      <p>This work is supported by the Natural Science Foundation of Guangdong Province, China
(No. 2022A1515011544).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] J. Bevendorff, M. Wiegmann, J. Karlgren, et al., Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN and ELOQUENT 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Bevendorff, X. B. Casals, B. Chulvi, et al., Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN 2024, in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Fröbe, M. Wiegmann, N. Kolyada, et al., Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] L. Wu, J. Li, Y. Wang, Q. Meng, T. Qin, W. Chen, M. Zhang, T.-Y. Liu, et al., R-Drop: Regularized dropout for neural networks, Advances in Neural Information Processing Systems 34 (2021) 10890-10905.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] O. Nikula, Linguistic feature analysis of real and fake news: Human-written vs. GROVER-written (2023).</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Gehrmann, H. Strobelt, A. M. Rush, GLTR: Statistical detection and visualization of generated text, arXiv preprint arXiv:1906.04043 (2019).</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, C. Finn, DetectGPT: Zero-shot machine-generated text detection using probability curvature, in: International Conference on Machine Learning, PMLR, 2023, pp. 24950-24962.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Bao, Y. Zhao, Z. Teng, L. Yang, Y. Zhang, Fast-DetectGPT: Efficient zero-shot detection of machine-generated text via conditional probability curvature, arXiv preprint arXiv:2310.05130 (2023).</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>