<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team ap-team at PAN: LLM Adapters for Various Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Galina Boeva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>German Gritsai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Grabovoy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advacheck OÜ</institution>
          ,
          <addr-line>Tallinn</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université Grenoble Alpes (UGA)</institution>
          ,
          <addr-line>Grenoble</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Recent breakthroughs in text generation raise the quality of generated text with each new model. At the same time, the misuse of generated text has become a pressing problem: spreading false information, spamming, and generating scientific articles and other texts are all issues that have arisen from this surge. Binary text classification methods have been proposed to control the situation. This research presents an approach based on aggregating QLoRA adapters that are trained on multiple distributions corresponding to families of generative models. Our method, LAVA (LLM Adapters for Various dAtasets), achieves results comparable to the primary baseline provided by the PAN organizers. The proposed method yields an efficient and fast detector with high performance on the target metrics, since the adapters for the language model can be trained in parallel. It makes the detection process straightforward and flexible: an adapter can be tailored to a newly appearing distribution and added to the existing set. Furthermore, each adapter learns dependencies separately from the others, after which the outputs are aggregated.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>large language models</kwd>
        <kwd>machine-generated text</kwd>
        <kwd>lora adapter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent advances in text generation include the development of machine and deep learning approaches
and models. The main direction of growth is the modernization of existing techniques based on
Transformers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], since they are able to identify dependencies within a sequence. Building a meaningful
model depends not only on the architecture itself but also on many other factors.
Well-known systems such as ChatGPT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Google Bard [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Jasper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], YaGPT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
GigaChat [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], have been trained on vast amounts of data to learn the dependencies of tokens from
related fields. These systems are widely used and assist people in writing code, generating text,
answering questions, and a host of other tasks.
      </p>
      <p>
        Furthermore, because they are trained on large amounts of data, these models have significant generalization ability and can identify
various patterns in the data. Such knowledge could greatly help in solving the problem of classifying
generated texts [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. However, training large models each time on huge datasets is not necessarily
feasible due to resource and time constraints. Therefore, in this study, we propose an approach with
training several QLoRA adapters [
        <xref ref-type="bibr" rid="ref10">10, 11</xref>
        ] for multiple data types, and we submitted our solution to
the Voight-Kampff Generative AI Authorship Verification competition by PAN [12, 13, 14]. Most of the
previously described methods use a classifier that is trained on a single dataset. However, if
we want to capture as many features of different language models as possible, we need to extend the
dataset with examples of each, and at some point this becomes inefficient. Training a single model on numerous examples
triggers the “forgetting” effect inherent in Transformer-based models
due to their limited number of parameters. To solve this problem, we divided the training examples
into families and collected a separate dataset for each. One QLoRA adapter is trained on each
such dataset, and its output is then combined with the others when aggregating the results. By
training the adapters in this way, we aim to learn the distributions of each model family and the relevant
topics.
      </p>
      <p>Contribution
• The evolution of artificial text detection methods is analyzed;
• Multiple datasets from different families of generation models are collected, each of which contains
texts written by machines and humans on the topics specified in the competition description;
• A basic approach is implemented and trained on the dataset we gathered;
• An approach is proposed that applies QLoRA adapters to a pre-trained model on newly
collected data corresponding to different families of generated texts.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        The field of artificial text detection is developing rapidly. This section presents the
main approaches currently used for detection, as well as basic models with significant features.
Basic approaches for detection. The paper [15] assesses various syntactic,
semantic and empirical characteristics in order to improve traditional language models. A substantial
part of that work was the compilation of artificial texts based on trigrams, with AdaBoost [16]
used as the classification model. The authors of [17] consider the problem of recognizing fake
news using linguistic features. Fake content has a number of characteristics at the syntactic,
lexical and semantic levels, which in turn yield a number of features; for example, counts
of punctuation types and the readability index were used. More sophisticated methods are
used to capture dynamic and local characteristics in a sequence of tokens. One such article [18]
combines a convolutional neural network [19] and a bidirectional LSTM [20] to obtain representations
to which an MLP is applied for final classification. This vector representation becomes more meaningful
by concatenating different pairs of speaker attributes (the dataset consists of political statements).
The attributes are the features provided in the LIAR dataset [21]. The merging took place
after each individual attribute of the speaker’s profile was passed through a separate layer.
Attention-based approaches. Recently, the main direction of generation development has been
the use of attention-based models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These approaches are good at identifying dependencies
within a sequence of tokens. When classifying texts, an essential component is the creation of vector
representations that fully describe the input sequence. To create such representations,
BERT-like [22] models are often used, as they capture the context well. In the article [23], the authors
point out the problem of capturing global information about the vocabulary, so they propose combining
BERT with a vocabulary graph convolutional network, which helps to capture both local and
global information. There are also various approaches to decoding tokens and sampling
methods. The authors of [24] compare three popular sampling-based decoding strategies:
untruncated random sampling, nucleus sampling and top-k sampling. That
study examines the ability of discriminators to correctly distinguish machine-generated texts from
human-written ones.
      </p>
      <p>Limitations of attention-based models. A well-known limitation of Transformers is their inability
to work with long sequences. This raises the question of how detection quality depends
on the length of the input. The article [25] compares the quality of discriminators
trained with the same parameters but different context lengths, in order to identify
the importance of context length and its influence on detection quality. The optimal
length may vary across languages, but there is a certain length that allows
the classification quality to be maintained. In addition, classifiers based on the Transformer architecture
are especially vulnerable to adversarial attacks [26], in which the input data of a machine learning
model is maliciously manipulated to force it to make incorrect predictions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <sec id="sec-3-1">
        <title>3.1. Problem statement</title>
        <p>Let $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of elements in the dataset $D$. Each pair $(x_i, y_i)$ consists of
a text $x_i$, i.e. a sequence of tokens, and its corresponding label $y_i$. This label indicates whether the
text has been artificially generated or not. Let $x_i = \{t_1, \dots, t_{n_i}\}$ and $t_j \in W$, where $W$ describes all
possible tokens corresponding to the language and $n_i$ is the number of tokens in one text; this parameter
can be different for each text.</p>
        <p>Let us set the model $f : X \to Y$, where $X$ is the set of texts and $Y = \{0, 1\}$. Then the task is to find the binary classifier that
minimizes the empirical risk on the dataset:</p>
        <p>$$\hat{f} = \arg\min_{f} \sum_{(x, y) \in D} [f(x) \neq y].$$</p>
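        <p>The empirical risk in this problem statement is simply a count of misclassified texts. The following minimal Python sketch illustrates that reading under stated assumptions: the toy dataset, the two toy classifiers and the helper names (empirical_risk, argmin_risk) are hypothetical and are not part of the described system.</p>
        <preformat><![CDATA[
```python
from typing import Callable, List, Tuple

Text = str
Label = int  # 0 = human-written, 1 = machine-generated
Classifier = Callable[[Text], Label]


def empirical_risk(classifier: Classifier, dataset: List[Tuple[Text, Label]]) -> int:
    """Number of pairs (x, y) in the dataset with classifier(x) != y."""
    return sum(1 for text, label in dataset if classifier(text) != label)


def argmin_risk(candidates: List[Classifier], dataset: List[Tuple[Text, Label]]) -> Classifier:
    """Pick the candidate with the smallest empirical risk (first one wins ties)."""
    return min(candidates, key=lambda f: empirical_risk(f, dataset))


# Hypothetical toy data and toy classifiers, for illustration only.
toy_dataset = [
    ("I walked to the shop and forgot my keys again.", 0),
    ("As an AI language model, I cannot browse the internet.", 1),
    ("Honestly the ending of that film made no sense to me.", 0),
    ("As an AI language model, I do not have opinions.", 1),
]


def always_machine(text: Text) -> Label:
    return 1


def keyword_rule(text: Text) -> Label:
    return 1 if "As an AI language model" in text else 0


best = argmin_risk([always_machine, keyword_rule], toy_dataset)
```
]]></preformat>
        <p>In practice the minimization runs over model parameters rather than a short list of candidates, and a differentiable surrogate loss usually replaces the 0-1 count during training.</p>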
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Methods</title>
        <p>3.2.1. Basic approach
For a more complete study of the problem of classifying machine-generated texts, we first considered
several basic approaches. We report the result of one of them: the large language model
Microsoft Phi-2 [27] combined with OneClassSVM [28]. This method relies on features
that describe the characteristics of human-written texts. The features are extracted from the LLM output by
splitting the text into tokens and obtaining logits from the model.</p>
        <p>Here is a list of the significant features:
• median of the entropy vector;
• median of the surprise vector;
• average number of characters per word;
• standard deviation of the surprise vector;
• 5th percentile of the entropy vector;
• 5th percentile of the surprise vector;
• maximum of the surprise vector.</p>
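        <p>The feature list above can be sketched in plain Python as follows. The text does not define the entropy and surprise vectors precisely, so this sketch assumes that the entropy vector holds the entropy of the model's predicted token distribution at each position and that the surprise vector holds the negative log-probability of the observed token. The function names, the linear-interpolation percentile and the use of the population standard deviation are likewise assumptions, and the toy logits are made up.</p>
        <preformat><![CDATA[
```python
import math
import statistics
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def text_features(text: str,
                  logits: Sequence[Sequence[float]],
                  token_ids: Sequence[int]) -> Dict[str, float]:
    """Seven features for one non-empty text.

    logits[i] is the model output for position i, token_ids[i] the observed token there.
    """
    entropy, surprise = [], []
    for row, token in zip(logits, token_ids):
        log_probs = log_softmax(row)
        entropy.append(-sum(math.exp(lp) * lp for lp in log_probs))
        surprise.append(-log_probs[token])
    words = text.split()
    return {
        "entropy_median": statistics.median(entropy),
        "surprise_median": statistics.median(surprise),
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "surprise_std": statistics.pstdev(surprise),  # population std, an assumption
        "entropy_p5": percentile(entropy, 5),
        "surprise_p5": percentile(surprise, 5),
        "surprise_max": max(surprise),
    }


# Hypothetical toy input: 3 positions over a 3-token vocabulary.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
toy_token_ids = [0, 2, 1]
features = text_features("the cat sat", toy_logits, toy_token_ids)
```
]]></preformat>
        <p>With a real language model, the logits would come from a forward pass over the tokenized text, and the resulting feature vectors would be what a one-class model is fitted on.</p>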
        <p>After the features are collected with the help of the language model, they are fed to the OneClassSVM
algorithm, which learns the distribution of such data. At inference time, the model can then
recognize that machine-generated texts do not come from a distribution similar to that of human-written texts.</p>
        <p>The main results for this approach are presented in Table 1. The algorithm was trained
on our collected dataset, which is described in Section 3.2.2, and tested
on the news dataset that the competition organizers provided for research. This
method does not show the best result, but it allows us to evaluate the ability to extract useful
features using large language models. We cannot compare the quality of the organizers'
baseline models with that of our baseline solution, because they were tested on different datasets.</p>
        <p>Note that the LLM+OneClassSVM method was not used as our final solution in the PAN
competition. Here we only tested the hypothesis that statistics obtained from large language models are useful for
artificial text detection.</p>
        <p>The table also shows the model performance for different numbers of features, to reveal how the
quality of artificial text classification changes. For the 3-feature
approach, we considered the median of the entropy vector, the 5th percentile of the entropy vector and the average
number of characters per word. With 4 features (our best result), the median of the surprise vector
was added. For the third model, with 5 features, the standard deviation of the surprise vector was added.</p>
        <p>3.2.2. LAVA
In the context of the competition, the data provided is one of the main components. The PAN organizers
provided only a limited news dataset. Given its small size, we decided to collect our own
dataset for training. We did not generate anything ourselves, but only searched open sources for
datasets on the relevant topics specified in the description of the competition. To apply the idea of training
multiple adapters, we split this set into several:
• Dataset A with texts from the GPT family models (GPT-3, GPT-3.5, GPT-4);
• Dataset B with texts from the Llama and Mistral family models (Llama-2, Llama-2, Mistral-v0.1, Mistral-v0.2);
• Dataset C with a small number of texts from a wider range of models (Vicuna, OPT, BLOOM, Alpaca, Gemini Pro).</p>
        <p>
          For training, we used the idea of adaptation, i.e. reusing the same base model with different adapters for different
problems. We work with an upgraded version of LoRA [29] (Figure 1), namely QLoRA [
          <xref ref-type="bibr" rid="ref11">30</xref>
          ],
which quantizes the weights of the pre-trained LLM down to 4-bit precision.
Our main approach is therefore to train lightweight adapters for the Mistral-v0.2 language model,
where each adapter is trained on a separate dataset containing examples from a different distribution.
Thus, we have three adapters for one language model.
        </p>
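        <p>For orientation, the low-rank update behind LoRA [29] can be written as below. This is a sketch in the notation of the LoRA paper, not notation taken from this text: $W_0$ is a frozen pre-trained weight matrix, $A$ and $B$ are the trainable adapter matrices, $r$ is the adapter rank and $\alpha$ is a scaling hyperparameter. In QLoRA, $W_0$ is additionally stored in 4-bit precision while $A$ and $B$ are kept in higher precision.</p>
        <preformat><![CDATA[
```latex
h = W_0 x + \Delta W\, x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```
]]></preformat>
        <p>Under this view, each of the three adapters would correspond to its own set of pairs $(A, B)$ attached to one shared, frozen base model.</p>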
        <p>The inference stage was conducted using the competition system. Having trained the adapters, we
aggregate them to improve the performance of the resulting model. Combining or merging
the weights of the three adapters did not improve the metrics, but combining the answers of
the individual adapters contributed to high performance on the target metrics. Only if all three adapters
predict class 0 (human-written) for an example do we assign class 0 in the final labeling; otherwise we assign class 1
(machine-generated). This way of aggregating is needed to get the best accuracy in detecting human
texts: if none of the trained adapters tends to select the generated label, the text is more
than likely written by a human.</p>
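        <p>The aggregation rule described above amounts to a logical OR over the adapters' labels. The following minimal Python sketch shows it; the function names and the example predictions are hypothetical, and the adapters themselves are represented only by their already-computed 0/1 outputs.</p>
        <preformat><![CDATA[
```python
from typing import List, Sequence

HUMAN, MACHINE = 0, 1


def aggregate_or(adapter_predictions: Sequence[int]) -> int:
    """Return HUMAN only if every adapter predicts HUMAN; otherwise MACHINE."""
    if not adapter_predictions:
        raise ValueError("at least one adapter prediction is required")
    return MACHINE if any(p == MACHINE for p in adapter_predictions) else HUMAN


def aggregate_batch(per_adapter: Sequence[Sequence[int]]) -> List[int]:
    """per_adapter[k][i] is the label that adapter k assigns to text i."""
    return [aggregate_or(votes) for votes in zip(*per_adapter)]


# Hypothetical predictions of three adapters on four texts.
adapter_a = [0, 0, 1, 0]
adapter_b = [0, 1, 1, 0]
adapter_c = [0, 0, 0, 0]
final_labels = aggregate_batch([adapter_a, adapter_b, adapter_c])

# An adapter for a new model family is just one more row of predictions.
adapter_new = [0, 0, 0, 1]
final_with_new = aggregate_batch([adapter_a, adapter_b, adapter_c, adapter_new])
```
]]></preformat>
        <p>Because the rule takes any number of prediction rows, an additional adapter can be included without changing the aggregation code.</p>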
        <p>
          By training multiple adapters, we give the model high generalization capability, since each
adapter captures the dependencies of its own distribution. We consider the Mistral [
          <xref ref-type="bibr" rid="ref12">31</xref>
          ] model, as it
performs strongly across different tasks [
          <xref ref-type="bibr" rid="ref13">32</xref>
          ]. The idea of lightweight adapters helps us avoid
retraining the model while preserving its knowledge. As new model families become available,
we can simply train a new adapter, add it to the current ones and take it into account when aggregating.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
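      <p>[Table residue: rows for the 95th quantile, 75th quantile, median, 25th quantile and minimum; the extracted values 0.937, 0.967, 0.972, 0.876, 0.795, 0.697 and 0.668 could not be mapped to rows and columns.]</p>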
      <p>Table 3 shows that the proposed LAVA (“mariented-pantone”) approach outperforms almost all baselines
in terms of quality, except for the Binoculars baseline. It is also worth noting that our method is not limited by
resources, since we can train any number of adapters, one for each family of model-generated data.
In addition, since we aggregate the outputs, our approach has more confidence in predicting the class,
depending on the results of the adapters.</p>
      <p>The “ferocious-coot” approach does not aggregate adapters; it uses only
one of the adapters trained in the LAVA method. Its quality turned out to be lower, so aggregation
was used as the final choice. This in turn suggests that using different adapters for different
families of data makes sense for quality gains.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we studied the classification of machine-generated texts within
the PAN competition. We analyzed the area and existing methods
for detecting machine-generated excerpts. A new approach is proposed based on training a group of
QLoRA adapters to identify patterns in the distribution of data, which helps to classify
texts more accurately. Adapters make it possible to easily obtain a high-quality model in different
narrow domains without having to retrain the entire large language model. We tried different
approaches to aggregating predictions and settled on the OR operation: our model outputs class 0 only if
all trained adapters output that class. Comparing the approaches shows that our model
performs above almost all the presented baselines, with a mean value across the competition’s metrics of
0.954. This method is efficient and flexible in terms of running time and resource intensity.</p>
      <p>In future work, we would like to investigate the use of multiple adapters for the artificial text
detection task further. We aim to compare other aggregation methods, as well as to assign weights to each
adapter depending on the importance and popularity of the family for the given task. In addition, this
approach should be tried on other language models, because the method is not tied to a single
model.</p>
      <p>[11] R. Wang, D. Tang, N. Duan, Z. Wei, X. Huang, G. Cao, D. Jiang, M. Zhou, et al., K-adapter: Infusing
knowledge into pre-trained models with adapters, arXiv preprint arXiv:2002.01808 (2020).
[12] A. A. Ayele, N. Babakov, J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag,
M. Fröbe, D. Korenčić, M. Mayerl, D. Moskovskiy, A. Mukherjee, A. Panchenko, M. Potthast,
F. Rangel, N. Rizwan, P. Rosso, F. Schneider, A. Smirnova, E. Stamatatos, E. Stakovskii, B. Stein,
M. Taulé, D. Ustalov, X. Wang, M. Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024:
Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking
Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot,
D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro
(Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of
the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in
Computer Science, Springer, Berlin Heidelberg New York, 2024.
[13] J. Bevendorff, M. Wiegmann, J. Karlgren, L. Dürlich, E. Gogoulou, A. Talman, E. Stamatatos,
M. Potthast, B. Stein, Overview of the “Voight-Kampff” Generative AI Authorship Verification
Task at PAN and ELOQUENT 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera
(Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR
Workshop Proceedings, CEUR-WS.org, 2024.
[14] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/
978-3-031-28241-6_20.
[15] S. Badaskar, S. Agarwal, S. Arora, Identifying real or fake articles: Towards better language
modeling, in: Proceedings of the Third International Joint Conference on Natural Language
Processing: Volume-II, 2008.
[16] Y. Freund, R. Schapire, N. Abe, A short introduction to boosting, Journal-Japanese Society For
Artificial Intelligence 14 (1999) 1612.
[17] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, R. Mihalcea, Automatic detection of fake news, arXiv
preprint arXiv:1708.07104 (2017).
[18] A. Roy, K. Basak, A. Ekbal, P. Bhattacharyya, A deep ensemble framework for fake news detection
and classification, 2018. arXiv:1811.04670.
[19] L. O. Chua, CNN: A paradigm for complexity, volume 31, World Scientific, 1998.
[20] Z. Huang, W. Xu, K. Yu, Bidirectional lstm-crf models for sequence tagging, arXiv preprint
arXiv:1508.01991 (2015).
[21] LIAR, https://datasets.activeloop.ai/docs/ml/datasets/liar-dataset/, 2017.
[22] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers
for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[23] Z. Lu, P. Du, J.-Y. Nie, Vgcn-bert: augmenting bert with graph embedding for text classification,
in: Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020,
Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I 42, Springer, 2020, pp. 369–382.
[24] D. Ippolito, D. Duckworth, C. Callison-Burch, D. Eck, Automatic detection of generated text is
easiest when humans are fooled, arXiv preprint arXiv:1911.00650 (2019).
[25] G. Gritsay, A. Grabovoy, Y. Chekhovich, Automatic detection of machine generated texts: Need
more tokens, in: 2022 Ivannikov Memorial Workshop (IVMEM), IEEE, 2022, pp. 20–26.
[26] M. Behjati, S.-M. Moosavi-Dezfooli, M. S. Baghshah, P. Frossard, Universal adversarial attacks on
text classifiers, in: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), IEEE, 2019, pp. 7345–7349.
[27] Microsoft LLM, https://huggingface.co/microsoft/phi-2, 2023.
[28] OneClassSVM, https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html,
2001.
[29] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank
adaptation of large language models, 2021. arXiv:2106.09685.</p>
    </sec>
    <sec id="sec-6">
      <title>A. Obtained collection sources</title>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Dataset</th><th>Source</th></tr>
          </thead>
          <tbody>
            <tr><td>GPT-wiki-intro-extension [<xref ref-type="bibr" rid="ref14">33</xref>]</td><td>GPT-3</td></tr>
            <tr><td>TinyStories v.1 [<xref ref-type="bibr" rid="ref15">34</xref>]</td><td>GPT-3.5</td></tr>
            <tr><td>ChatGPT-Research-Abstracts</td><td>GPT-3.5</td></tr>
            <tr><td>Ghostbuster [<xref ref-type="bibr" rid="ref16">35</xref>]</td><td>GPT-3.5</td></tr>
            <tr><td>HC-Var [<xref ref-type="bibr" rid="ref17">36</xref>]</td><td>GPT-3.5</td></tr>
            <tr><td>TinyStories v.2</td><td>GPT-4</td></tr>
            <tr><td>ShareGPT</td><td>GPT-4</td></tr>
            <tr><td>ChatGPT-Detection-PR-HPPT [<xref ref-type="bibr" rid="ref18">37</xref>]</td><td>Llama</td></tr>
            <tr><td>SnifferBench</td><td>Llama, Alpaca</td></tr>
            <tr><td>MULTITuDE [<xref ref-type="bibr" rid="ref19">38</xref>]</td><td>Llama, Alpaca, OPT, Vicuna</td></tr>
            <tr><td>Cosmopedia v0.1 [<xref ref-type="bibr" rid="ref20">39</xref>]</td><td>Mistral</td></tr>
            <tr><td>MAGE [<xref ref-type="bibr" rid="ref21">40</xref>]</td><td>BLOOM</td></tr>
            <tr><td>M4 [<xref ref-type="bibr" rid="ref22">41</xref>]</td><td>BLOOM, Cohere</td></tr>
            <tr><td>HANSEN [<xref ref-type="bibr" rid="ref23">42</xref>]</td><td>Vicuna, GPT-3.5</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] ChatGPT by OpenAI</article-title>
          , https://chat.openai.com,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Google</given-names>
            <surname>Bard</surname>
          </string-name>
          , https://google-bard-ai.com/try-bard/,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Jasper</surname>
          </string-name>
          , https://www.jasper.ai/
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>[5] YaGPT by Yandex</article-title>
          , https://yandex.ru/project/alice/yagpt,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>[6] GigaChat by SberDevices</article-title>
          , https://developers.sber.ru/portal/products/gigachat,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Zhang,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , X. Cheng, Y. Zhang, H. Hu,
          <article-title>Argugpt: evaluating, understanding and identifying argumentative essays generated by gpt models</article-title>
          ,
          <source>arXiv preprint arXiv:2304.07666</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yi</surname>
          </string-name>
          , Q. Cheng, Y. Huang,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          , X. Liu,
          <article-title>Ai vs. human-differentiation analysis of scientific content generation</article-title>
          ,
          <source>arXiv preprint arXiv:2301.10416</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gritsay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Chekhovich</surname>
          </string-name>
          ,
          <article-title>Artificially generated text fragments search in academic documents</article-title>
          , in: Doklady Mathematics, volume
          <volume>108</volume>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>S434</fpage>
          -
          <lpage>S442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ding</surname>
          </string-name>
          , L. Cheng, J.-W. Low,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <article-title>On the effectiveness of adapter-based tuning for pretrained language model adaptation</article-title>
          ,
          <source>arXiv preprint arXiv:2106.03164</source>
          (
          <year>2021</year>
          ).
          <source>adaptation of large language models</source>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2106</volume>
          .
          <fpage>09685</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          , L. Zettlemoyer, Qlora: Efficient finetuning of quantized llms,
          <year>2023</year>
          . arXiv:
          <volume>2305</volume>
          .
          <fpage>14314</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Mistral</surname>
            <given-names>AI</given-names>
          </string-name>
          , https://huggingface.co/docs/transformers/main/model_doc/mistral,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [32]
          LMSYS Chatbot Arena Leaderboard
          , https://chat.lmsys.org/?leaderboard,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Aaditya</given-names>
            <surname>Bhat</surname>
          </string-name>
          , GPT-wiki-intro (revision 0e458f5),
          <year>2023</year>
          . URL: https://huggingface.co/datasets/aadityaubhat/GPT-wiki-intro. doi:10.57967/hf/0326.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Tinystories: How small can language models be and still speak coherent english?</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.07759. arXiv:2305.07759.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>V.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fleisig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <article-title>Ghostbuster: Detecting text ghostwritten by large language models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2305.15047. arXiv:2305.15047.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          , A. Liu, H. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>On the generalization of training-based chatgpt detection methods</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.01307.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Is chatgpt involved in texts? measure the polish ratio to detect chatgpt-generated text</article-title>
          ,
          <source>APSIPA Transactions on Signal and Information Processing</source>
          <volume>13</volume>
          (????).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>D.</given-names>
            <surname>Macko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Moro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uchendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yamashita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pikuliak</surname>
          </string-name>
          , I. Srba,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Simko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bielikova</surname>
          </string-name>
          ,
          <article-title>Multitude: Large-scale multilingual machine-generated text detection benchmark</article-title>
          ,
          <source>in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2023</year>
          . URL: http://dx.doi.org/10.18653/v1/2023.emnlp-main.616. doi:10.18653/v1/2023.emnlp-main.616.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ben Allal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lozhkov</surname>
          </string-name>
          , G. Penedo,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          , L. von Werra, Cosmopedia,
          <year>2024</year>
          . URL: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Mage: Machine-generated text detection in the wild</article-title>
          ,
          <year>2024</year>
          . arXiv:2305.13242.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsvigun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. Mohammed</given-names>
            <surname>Afzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mahmoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          , P. Nakov,
          <article-title>M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection</article-title>
          , in: Y. Graham, M. Purver (Eds.),
          <source>Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <source>Association for Computational Linguistics, St. Julian's, Malta</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1369</fpage>
          -
          <lpage>1407</lpage>
          . URL: https://aclanthology.org/2024.eacl-long.83.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>N. I.</given-names>
            <surname>Tripto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uchendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Setzu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Hansen: Human and ai spoken text benchmark for authorship analysis</article-title>
          ,
          <source>arXiv preprint arXiv:2310.16746</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>