<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>GPT-2 versus GPT-3 and Bloom: LLMs for LLMs Generative Text Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fernando Aguilar-Canto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Cardoso-Moreno</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diana Jiménez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiram Calvo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Cognitive Sciences Laboratory, Center for Computing Research, Instituto Politécnico Nacional</institution>
          ,
          <addr-line>Mexico City 07700</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>With the advent and proliferation of advanced Large Language Models (LLMs) such as BLOOM, the GPT series, and ChatGPT, there is a growing concern regarding the potential misuse of this technology. Consequently, it has become imperative to develop machine learning techniques that can discern whether a given text has been generated by an LLM or authored by a human. In this paper, we present our approach in the AuTexTification shared task, where we fine-tuned BERT-based models and GPT-2 Small. Remarkably, GPT-2 Small achieved the highest F1-macro score in the validation set, prompting us to evaluate its performance on the testing set. We achieved an F1-macro score of 0.74134, securing the third position in the benchmark. Furthermore, we extended our fine-tuning efforts to the model attribution subtask, yielding an F1-macro score of 0.52282.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative Text Detection</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Model Attribution</kwd>
        <kwd>AuTexTification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, numerous companies and research centers have introduced a variety of Large
Language Models (LLMs) to the field, including the Generative Pre-trained Transformer (GPT)
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], GPT-2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], ChatGPT, GPT-3 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Pathways Language Model (PaLM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], GPT-4 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], BLOOM
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and others. While the development of LLMs has achieved remarkable success, it has
also raised ethical and public concerns regarding their potential misuse in generating and
disseminating fake news or inaccurate information [
        <xref ref-type="bibr" rid="ref3 ref7 ref8">7, 3, 8</xref>
        ], spam [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], propaganda [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and
even facilitating academic cheating [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In light of these concerns, it becomes crucial to develop effective technologies for the detection
and classification of text generated by LLMs, to mitigate the consequences associated with the
aforementioned issues. However, the task of classifying text generated by LLMs, such as
GPT-3, has proven to be no better than random when performed by trained human evaluators [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
This highlights the need to leverage LLMs themselves to aid in the classification of generated
text.
      </p>
      <p>
        In this paper, we evaluate the performance of multiple LLMs in differentiating machine-generated
text from human-written text, as well as in the area of model attribution for LLMs. To accomplish
this, we utilize the benchmark proposed by Sarvazyan et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], named AuTexTification
(Automated Text Identification), which was presented in the context of the IberLEF 2023 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
The benchmark includes two subtasks: the first involves identifying whether a given text is
human or machine-generated, while the second requires performing model attribution.
      </p>
      <p>This paper is structured as follows: Section 2 provides an overview of the related works in
the field. Section 3 introduces the methods and models implemented in our study. Section 4
presents the empirical results obtained from our experiments. In Section 5, we discuss and
analyze the results in detail. Finally, Section 6 offers a conclusion summarizing our observations
and suggestions for potential future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        High-quality text generation is a recent development, and consequently relatively few
classifiers have been designed specifically to detect the output of recent language models. However, it
is important to note that text generation itself is not a novel task, as evidenced by earlier studies
such as [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. Despite the historical context, recent Large Language Models (LLMs) present
unique challenges in terms of classification, as highlighted by studies like [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. These studies
have demonstrated the difficulties in accurately classifying text generated by contemporary
LLMs. In this literature review, we discuss the related work on recent LLMs:
GPT-2, GPT-3, ChatGPT, and other contemporary models. Most papers focus on identifying
machine-generated text, while model attribution has barely been studied [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
2.1. GPT-2
Different Machine/Deep Learning methods have been proposed for identifying
machine-generated texts [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. When it comes to text generated by GPT-2, Solaiman et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] utilized
classical machine learning classifiers such as logistic regression. Additionally, Gehrmann et al.
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] introduced Giant Language model Test Room (GLTR), a set of baseline statistical methods for
analyzing statistical differences between human and machine-generated text. Their observations
revealed that humans tend to use rare words more frequently compared to GPT-2-generated
text.
      </p>
      <p>
        LLMs have also been employed to detect text generated by other language models. In the
case of GPT-2, Solaiman et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] utilized zero-shot classification with GPT-2 or GROVER [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
but did not achieve better comparative results than classical approaches. Fine-tuned LLMs
have demonstrated better performance in this task. For GPT-2, Ippolito et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] proposed the
usage of human evaluators and the BERT model [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. In contrast, Zellers et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] found that
their model, GROVER, was more appropriate for this task compared to BERT and detectors like
fastText [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Other authors, such as Uchendu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], also utilized GROVER.
      </p>
      <p>
        Another commonly used LLM for detecting generated content is RoBERTa [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Solaiman et
al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] achieved approximately 95% accuracy in detecting web pages generated by GPT-2 XL
using RoBERTa. In this study, they also employed GPT-2 models to identify content generated
by other GPT-2 models of varying sizes. The findings indicated that a larger GPT-2 model
can identify content generated by a smaller model, but the reverse relationship does not hold.
Overall, the study concluded that a RoBERTa model of the same capacity as GPT-2 is better
suited for the task at hand. In the case of machine-generated tweets, Fagni et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] found that
RoBERTa outperformed classical classifiers, Convolutional Neural Networks (CNNs), Recurrent
Networks, and other LLMs such as BERT, DistilBERT [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and XLNet [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Similar results were
achieved by Tourille et al. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and Kumarage et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] in the same context. RoBERTa is also
utilized in the task of detecting technically generated text [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>
        For languages other than English, there are few works. For instance, Harrag et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] utilized
AraBERT [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] for classifying Arabic GPT-2 machine-generated texts versus human-written
texts.
2.2. GPT-3
In the case of GPT-3, researchers have explored classical approaches, including feature-based
identification methods [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. One notable method proposed for identifying text generated by
GPT-3 is DetectGPT [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], which utilizes a statistical criterion. In certain tasks, this criterion has
shown superior performance compared to large models like RoBERTa.
      </p>
      <p>
        Uchendu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced TuringBench, a benchmark that encompasses various LLMs
such as GROVER, GPT-2, GLTR, BERT, and RoBERTa, for the purpose of classifying
human-written versus machine-generated text. Regarding GPT-3, the GPT-2 classifier achieved an
F1-score of only 0.5293, while the best-performing model (BERT) attained a higher F1-score of
0.7944.
      </p>
      <sec id="sec-2-1">
        <title>2.3. Beyond GPT-3: ChatGPT and GPT-4</title>
        <p>
          Other recent LLMs, including GPT-3.5-turbo and ChatGPT, have also been subject to study.
Various classifiers have been proposed, ranging from Random Forest with stylometric features
[
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], XGBoost with feature extraction [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], RoBERTa [
          <xref ref-type="bibr" rid="ref37 ref38 ref39 ref40">37, 38, 39, 40</xref>
          ], DistilBERT [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], to
OPT125M [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ].
        </p>
        <p>
          Combining different LLMs has also been explored. Wang et al. [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] employed RoBERTa,
GLTR, and selected features to identify mixed content generated by ChatGPT, GPT-3, BLOOM,
and others. Li et al. [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] used GLTR, fastText, LongFormer [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ], and DetectGPT to detect text
produced by LLMs such as GPT-3.5, OPT [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ], and LLaMA [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ].
        </p>
        <p>
          However, despite the efforts made to detect content generated by modern LLMs like ChatGPT,
Pegoraro et al. [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] conducted an analysis revealing that none of the modern proposals are
capable of accurately identifying text generated by ChatGPT. In the case of benchmarks like
MGTBench [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ], the ChatGPT detector proposed by Guo et al. [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] demonstrated superior
performance compared to other methods evaluated, although it can still be evaded with minor
perturbations.
        </p>
        <p>
          Few models have considered GPT-4 identification. Works by Zaitsu and Jin [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] and Zhang et
al. [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ] have included GPT-4 in their investigations.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        In our study, we compared the following LLMs for the classification of machine-generated
versus human-written texts:
1. BERT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
2. RoBERTa [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
3. XLM-RoBERTa [
        <xref ref-type="bibr" rid="ref49">49</xref>
        ]
4. DeBERTa [
        <xref ref-type="bibr" rid="ref50">50</xref>
        ]
5. GPT-2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
      </p>
      <p>The selection of BERT-based models was based on their performance in similar tasks, as
mentioned in the literature review of the previous section. As for GPT-2, we chose this model
because it is one of the largest openly available models.</p>
      <p>
        All models were fine-tuned on the training partition of the provided dataset using the
HuggingFace framework. For the GPT-2 models, a dense layer was added to the model’s output
for classification purposes. Hyperparameter optimization was not performed; we used the
default hyperparameters provided by HuggingFace for each model. The evaluation metrics we
report are F1-macro, F1-micro, F1-weighted, and accuracy. In this task, the F1-macro metric is
considered the most important [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
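      <p>The reported evaluation metrics can be illustrated with a small pure-Python sketch (a minimal illustration of the metric definitions; the actual experiments would use a standard library implementation):</p>

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1 combined into macro / micro / weighted averages, plus accuracy."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)          # unweighted class mean
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    micro = accuracy  # for single-label classification, micro-F1 equals accuracy
    return {"macro": macro, "micro": micro, "weighted": weighted, "accuracy": accuracy}

# hypothetical labels: "h" = human, "m" = machine-generated
print(f1_scores(["h", "h", "m", "m"], ["h", "m", "m", "m"]))
```

      <p>The F1-macro average gives every class equal weight regardless of its support, which is why it is the preferred metric for this near-balanced but safety-sensitive task.</p>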
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>
          The AuTexTification dataset [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ] comprises a collection of machine-generated texts produced
by the following models:
        </p>
        <p>A : BLOOM-1b7
B : BLOOM-3b
C : BLOOM-7b1
D : GPT-3 Babbage
E : GPT-3 Curie
F : GPT-3 DaVinci-003</p>
        <p>For the first subtask, the training set consists of 33,845 text samples labeled as either
machine-generated or human. The training set for the second subtask comprises 22,416 machine-generated
text samples, each labeled with one of the letters A-F corresponding to the model that generated
the text. The dataset exhibits low class imbalance, with an entropy ratio (H / log n) of
approximately 0.99996 for subtask 1 and 0.99975 for subtask 2, where n represents the number of classes
and H denotes the Shannon entropy. The texts in both subtasks are short, with a maximum of
98 words for subtask 1 and 97 words for subtask 2. The fixed testing set for subtask 1 consists of
21,832 samples, while the testing set for subtask 2 comprises 5,605 samples. It is worth noting
that the AuTexTification dataset provides separate datasets for English and Spanish,
and in this study we focused solely on the English subtasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the quantitative results reported for this submission (CIC-IPN-CsCog group) in the
AuTexTification benchmark are based on the F1-macro score. Confidence intervals for these
scores are also provided in the paper. However, other metrics, such as accuracy, F1-micro, and
others, were not reported in the paper, and therefore, we included them in our analysis.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Subtask 1: Human or Generated</title>
        <p>The first subtask of the AuTexTification dataset involves distinguishing between texts generated
by LLMs and those written by humans. To evaluate the performance of the models, we randomly
split 20% of the original training set as a validation set. We chose to train most models for 3
epochs, except for BERT-base-cased where we used 6 epochs to explore different results. Due to
the computational cost associated with the GPT-2 model, we trained it for a single epoch. All
models achieved similar metrics, with F1-scores and accuracy around 90%. However, GPT-2
outperformed the other models in various metrics, including F1-macro, F1-micro, F1-weighted,
and accuracy. The detailed numerical results on the validation set are presented in Table 1.
Furthermore, the confusion matrix of the best-performing model, GPT-2, is visualized in Figure
1a.</p>
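        <p>As a reminder of how the per-class quantities relate, precision, recall, and F1 for the positive ("generated") class can be read directly off a confusion matrix; the counts in this sketch are invented for illustration, not the paper's actual results:</p>

```python
def prf_from_confusion(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive class of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)   # of texts flagged as generated, how many truly were
    recall = tp / (tp + fn)      # of truly generated texts, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts showing high recall with low precision:
p, r, f1 = prf_from_confusion(tp=900, fp=400, fn=100, tn=600)
print(p, r, f1)
```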
        <p>
          Subsequently, we proceeded to evaluate the performance of the GPT-2 model on the provided
testing set from the AuTexTification dataset [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. The main results of both subtasks are
summarized in Table 2. Additionally, Figure 1b presents the confusion matrix of the GPT-2
model specifically for subtask 1 in the testing set.
        </p>
        <p>
          In subtask 1, the GPT-2 model achieved an F1-macro score of 0.74134, with a confidence
interval of (73.53, 74.72). It is worth noting that our proposed model secured the third position
in the original AuTexTification benchmark as described in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Upon examining the results,
we observe that the fine-tuned GPT-2 model exhibits a relatively high recall but a low precision
in the testing set, leading to a considerable number of false positives.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Subtask 2: Model attribution</title>
        <p>In the second subtask of the AuTexTification benchmark, we utilized the GPT-2 model for
classifying text generated by specific models, known as model attribution. It is important to
note that for this subtask, we fine-tuned the GPT-2 model independently without transferring
any knowledge from the first subtask, according to the benchmark rules. The main results
obtained on the validation set are provided in Table 3. Additionally, Figure 2a illustrates the
confusion matrix generated by the GPT-2 model for the second subtask in the validation set.</p>
        <p>In contrast to the first subtask, the results obtained in the testing set for the second subtask
were quite similar to those on the validation set. Table 2 presents the numerical results in the testing set, while Figure 2b
depicts the confusion matrix for the same partition.</p>
        <p>
          The GPT-2 model demonstrated the ability to accurately differentiate between models A
(BLOOM-1b7), B-C (BLOOM-3b - BLOOM-7b1), D-E (GPT-3 Babbage - GPT-3 Curie), and F
(GPT-3 DaVinci-003). However, most of the errors occurred in confusions between models B-C
and D-E. Overall, the distinction between BLOOM and GPT-3 models was achieved correctly. It
is worth noting that as reported in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], this particular implementation of GPT-2 did not rank
among the top models for this specific subtask, securing the 26th position. Nevertheless, it is
possible to trace a hierarchical clustering of the models (Figure 3).
        </p>
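        <p>The hierarchical clustering mentioned above can be sketched with a small average-linkage routine; the distance table below is purely illustrative (chosen so that frequently confused model pairs are close), not derived from the paper's actual confusion matrices:</p>

```python
import itertools

def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage; returns the merge order."""
    clusters = [frozenset([l]) for l in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(clusters, 2):
            # average pairwise distance between the two candidate clusters
            d = sum(dist[frozenset([x, y])] for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters.remove(a); clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a | b), round(d, 3)))
    return merges

labels = list("ABCDEF")
# Illustrative distances: B-C and D-E are often confused (small distance),
# while BLOOM (A, B, C) and GPT-3 (D, E, F) models are well separated (large).
base = {("B", "C"): 0.1, ("D", "E"): 0.1, ("A", "B"): 0.3, ("A", "C"): 0.3,
        ("D", "F"): 0.4, ("E", "F"): 0.4}
dist = {}
for x, y in itertools.combinations(labels, 2):
    dist[frozenset([x, y])] = base.get((x, y), base.get((y, x), 0.9))

for step in average_linkage(dist, labels):
    print(step)
```

        <p>With these toy distances, the merge order reproduces the structure described in the text: B-C and D-E merge first, BLOOM and GPT-3 families form separate branches, and the two families join only at the final step.</p>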
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        The results obtained in this article suggest that the GPT-2 model is suitable for identifying
machine-generated content produced by GPT-3 and BLOOM models. However, it may not be the
most effective approach for model attribution. Surprisingly, the smaller GPT-2 model was able
to classify content generated by larger models, challenging the findings of the previous study of
Solaiman et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], where smaller GPT-2 models struggled to identify content generated by
larger GPT-2 models.
      </p>
      <p>
        Another point of contention is the effectiveness of BERT-based classifiers compared to GPT
models, as discussed in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In this study, GPT-2 outperformed BERT, RoBERTa, XLM-RoBERTa,
and DeBERTa in the task of identifying machine-generated content. However, it should be
noted that the comparison with RoBERTa was only made when the number of parameters in the
GPT-2 model was similar, and a direct comparison between the two models was not performed.
      </p>
      <p>
        It is important to mention that the AuTexTification dataset used in this study consists of
samples with fewer than 100 words, which is different from other approaches such as Yan et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] where the classifier input is an essay. Therefore, this approach focuses more on short pieces
of text rather than complete texts.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this paper, we propose the use of fine-tuned GPT-2 models both for identifying
machine-generated content produced by Large Language Models (LLMs) and for the model attribution subtask. Given the
widespread use of modern LLMs, both subtasks are crucial for monitoring and regulating the
potential misuse of this technology.</p>
      <p>To accomplish this, we compared the performance of BERT, RoBERTa, XLM-RoBERTa,
DeBERTa, and GPT-2 models in the first subtask. Interestingly, the fine-tuned GPT-2 small model
outperformed the other models, achieving a high F1-macro score of 0.91668 on the validation
set. In the testing set, this model achieved an F1-macro score of 0.74134.</p>
      <p>Furthermore, we applied the GPT-2 model to the second subtask, which involves model
attribution. In this task, the GPT-2 model obtained an F1-macro score of 0.51988.</p>
      <p>These results demonstrate the effectiveness of fine-tuned GPT-2 models for both identifying
machine-generated content and performing model attribution. The GPT-2 small
model showed promising performance, highlighting its potential for monitoring and regulating
the use of modern LLMs.</p>
      <sec id="sec-6-1">
        <title>6.1. Further research</title>
        <p>While this study focused on using the GPT-2 small model, there are other LLMs that can be
considered for the same task, such as GPT-2 large, GPT-3, ChatGPT, and GPT-4. Future research
can explore the performance of these models in identifying machine-generated content and
model attribution. In addition, the gap between the validation-set and testing-set F1-scores in
subtask 1 remains to be explained.</p>
        <p>Furthermore, this study does not provide an explicit explanation for why GPT-2 small
outperformed the BERT-based models. Exploring the explainability of these results would be an
interesting avenue for further research. Understanding the factors that contribute to the internal
decisions of the tested models can provide insights into how humans can be better prepared to
distinguish between bot-generated and human-generated content.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors wish to thank the support of the Instituto Politécnico Nacional (COFAA, SIP-IPN,
Grant SIP 20230140) and the Mexican Government (CONACYT, SNI).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <source>Improving Language Understanding by Generative Pre-Training</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <source>Language Models are Unsupervised Multitask Learners, OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <source>Language Models are Few-Shot Learners, Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          , et al.,
          <source>PaLM: Scaling Language Modeling with Pathways</source>
          ,
          <source>arXiv preprint arXiv:2204.02311</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] OpenAI,
          <source>GPT-4 Technical Report, arXiv preprint arXiv:2303.08774</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Akiki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pavlick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ilić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hesslow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Castagné</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Luccioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yvon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gallé</surname>
          </string-name>
          , et al.,
          <article-title>BLOOM: A 176B-Parameter Open-Access Multilingual Language Model</article-title>
          ,
          <source>arXiv preprint arXiv:2211.05100</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zellers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roesner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Defending Against Neural Fake News</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Uchendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Authorship Attribution for Neural Text Generation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8384</fpage>
          -
          <lpage>8395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <article-title>Deepfake Bot Submissions to Federal Public Comment Websites Cannot Be Distinguished from Human Submissions</article-title>
          ,
          <source>Technology Science</source>
          <volume>2019121801</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Weidinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mellor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rauh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uesato</surname>
          </string-name>
          , P.-S. Huang, M. Cheng, M. Glaese,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasirzadeh</surname>
          </string-name>
          , et al.,
          <article-title>Ethical and social risks of harm from language models</article-title>
          ,
          <source>arXiv preprint arXiv:2112.04359</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fauss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          , W. Cui,
          <article-title>Detection of AI-generated Essays in Writing Assessment</article-title>
          ,
          <source>Psychological Testing and Assessment Modeling</source>
          <volume>65</volume>
          (
          <year>2023</year>
          )
          <fpage>125</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>August</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Haduong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>All that's 'human' is not gold: Evaluating human evaluation of generated text</article-title>
          ,
          <source>arXiv preprint arXiv:2107.00061</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>A. M. Sarvazyan</surname>
            ,
            <given-names>J. Á.</given-names>
          </string-name>
          <string-name>
            <surname>González</surname>
            ,
            <given-names>M. Franco</given-names>
          </string-name>
          <string-name>
            <surname>Salvador</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Chulvi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural</source>
          , Jaén, Spain,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <source>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages, Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Matthiessen</surname>
          </string-name>
          ,
          <article-title>Nigel: A Systemic Grammar for Text Generation</article-title>
          ,
          <source>Technical Report, University of Southern California, Marina del Rey, Information Sciences Institute</source>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <source>Text Generation</source>
          , Cambridge University Press,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Merkhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaudhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Manville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gante</surname>
          </string-name>
          ,
          <article-title>Machine Learning Model Attribution Challenge</article-title>
          ,
          <source>arXiv preprint arXiv:2302.06716</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jawahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdul-Mageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V. S.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          ,
          <article-title>Automatic Detection of Machine Generated Text: A Critical Survey</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2296</fpage>
          -
          <lpage>2309</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>I.</given-names>
            <surname>Solaiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brundage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kreps</surname>
          </string-name>
          , et al.,
          <article-title>Release Strategies and the Social Impacts of Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:1908.09203</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Strobelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>GLTR: Statistical Detection and Visualization of Generated Text</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Duckworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <article-title>Automatic detection of generated text is easiest when humans are fooled</article-title>
          ,
          <source>arXiv preprint arXiv:1911.00650</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , V. Stoyanov,
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          ,
          <source>arXiv preprint arXiv:1907.11692</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Falchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gambini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martella</surname>
          </string-name>
          , M. Tesconi,
          <article-title>TweepFake: About detecting deepfake tweets</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          e0251415.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , T. Wolf,
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          ,
          <source>arXiv preprint arXiv:1910.01108</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , J. Carbonell,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tourille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <article-title>Automatic Detection of Bot-generated Tweets</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Multimedia AI against Disinformation</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumarage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Trapeznikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruston</surname>
          </string-name>
          , H. Liu,
          <article-title>Stylometric Detection of AI-Generated Text in Twitter Timelines</article-title>
          ,
          <source>arXiv preprint arXiv:2303.03697</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shamsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <article-title>Cross-Domain Detection of GPT-2-Generated Technical Text</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1213</fpage>
          -
          <lpage>1233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>F.</given-names>
            <surname>Harrag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dabbah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Darwish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdelali</surname>
          </string-name>
          ,
          <article-title>BERT Transformer Model for Detecting Arabic GPT2 Auto-Generated Tweets</article-title>
          ,
          <source>in: Proceedings of the Fifth Arabic Natural Language Processing Workshop</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>W.</given-names>
            <surname>Antoun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Baly</surname>
          </string-name>
          , H. Hajj,
          <article-title>AraBERT: Transformer-based Model for Arabic Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fröhling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          e443.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khazatsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , C. Finn,
          <article-title>DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature</article-title>
          ,
          <source>arXiv preprint arXiv:2301.11305</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaitsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis</article-title>
          ,
          <source>arXiv preprint arXiv:2304.05534</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shijaku</surname>
          </string-name>
          , E. Canhasi,
          <article-title>ChatGPT generated text detection</article-title>
          ,
          <source>ResearchGate</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>B.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection</article-title>
          ,
          <source>arXiv preprint arXiv:2301.07597</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yi</surname>
          </string-name>
          , Q. Cheng, Y. Huang,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          , X. Liu,
          <article-title>AI vs. Human - Differentiation Analysis of Scientific Content Generation</article-title>
          ,
          <source>arXiv preprint arXiv:1911.00650</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mireshghallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mattern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shokri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berg-Kirkpatrick</surname>
          </string-name>
          ,
          <article-title>Smaller Language Models are Better Black-box Machine-Generated Text Detectors</article-title>
          ,
          <source>arXiv preprint arXiv:2305.09859</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <article-title>G3Detector: General GPT-Generated Text Detector</article-title>
          ,
          <source>arXiv preprint arXiv:2305.12680</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mitrović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Andreoletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ayoub</surname>
          </string-name>
          ,
          <article-title>ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text</article-title>
          ,
          <source>arXiv preprint arXiv:2301.13852</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsvigun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Afzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mahmoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Aji</surname>
          </string-name>
          , et al.,
          <article-title>M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection</article-title>
          ,
          <source>arXiv preprint arXiv:2305.14902</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Deepfake Text Detection in the Wild</article-title>
          ,
          <source>arXiv preprint arXiv:2305.13242</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>Longformer: The Long-Document Transformer</article-title>
          ,
          <source>arXiv preprint arXiv:2004.05150</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Artetxe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dewan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. V.</given-names>
            <surname>Lin</surname>
          </string-name>
          , et al.,
          <article-title>OPT: Open Pre-trained Transformer Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:2205.01068</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          , et al.,
          <article-title>LLaMA: Open and Efficient Foundation Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:2302.13971</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pegoraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fereidooni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-R.</given-names>
            <surname>Sadeghi</surname>
          </string-name>
          ,
          <article-title>To ChatGPT, or not to ChatGPT: That is the question!</article-title>
          ,
          <source>arXiv preprint arXiv:2304.01487</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>MGTBench: Benchmarking Machine-Generated Text Detection</article-title>
          ,
          <source>arXiv preprint arXiv:2303.14822</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised Cross-lingual Representation Learning at Scale</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTa: Decoding-enhanced BERT with Disentangled Attention</article-title>
          ,
          <source>arXiv preprint arXiv:2006.03654</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarvazyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Ángel</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <source>AuTexTification Dataset (Full data)</source>
          ,
          <year>2023</year>
          . URL: https://doi.org/10.5281/zenodo.7956207. doi:10.5281/zenodo.7956207.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>