<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Authorial Language Models For AI Authorship Verification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weihang Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jack Grieve</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of English Language and Linguistics, University of Birmingham</institution>
          ,
          <addr-line>Edgbaston, Birmingham, B152TT</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>In this paper, we introduce the use of Authorial Language Models (ALMs) for AI Authorship Verification (AIAV). Given two texts, where one is written by a human and one is written by a machine, AIAV is the task of determining which text was written by the machine (or alternatively by the human). Our approach to this task uses a support vector machine to predict which text is machine-written based on perplexity scores from a set of language models, each fine-tuned on texts generated by a particular language model. We submitted our method as Docker-contained software for independent evaluation on the main testing dataset and on its variants, which are obfuscated against detection. On the main dataset, we have been informed that our method achieved a score of approximately 0.979 on all proposed evaluation measures, including ROC-AUC, Brier, C@1, F1, and F0.5, beating all baseline methods. On the variants of the main dataset, we achieved a median score of 0.935, which also beats all baselines. We attribute the success of ALMs in this context to the power of using many fine-tuned authorial language models, which we believe improves the resilience of our approach by maximising the amount of potentially discriminating information drawn from the underlying textual data.</p>
      </abstract>
      <kwd-group>
<kwd>LLM Detection</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Perplexity</kwd>
        <kwd>Authorship Verification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Large Language Models (LLMs) are models, consisting of millions to billions of parameters, that
predict the probability distribution of tokens given their observed context. Most LLMs are based on the
transformer deep learning architecture, which was introduced in 2017 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While LLMs consisting of
millions of parameters (e.g. GPT-2) have been capable of producing texts with human-level fluency for
years, the more recent development of LLMs with billions of parameters (e.g. GPT-3.5, GPT-4, Llama)
has made it possible to generate text via prompting. This now allows almost anyone to easily and
quickly generate machine-written texts of very high quality. Although this type of automated writing
has great potential value for society, there is also considerable concern about its misuse [
        <xref ref-type="bibr" rid="ref7 ref8 ref9 ref10">7, 8, 9, 10</xref>
        ].
While LLM security is a broad topic that requires efforts from across academia and industry to mitigate
the risk of LLM misuse and abuse, LLM detection is clearly an indispensable part of this endeavor.
      </p>
      <p>LLM detection is a family of tasks within authorship analysis that involve identifying
machine-written texts and distinguishing machine writing from human writing. From this broad definition,
several more specific types of LLM detection tasks can be identified [11]. Arguably, at the most basic
level, the problem is to distinguish between pairs of texts, where one text is written by a human and one
text is written by a machine [11]. This task, which is the focus of the PAN@CLEF2024 shared task, is
referred to as AIAV. Several solutions to AIAV have been proposed, including PPMd Compression-based
Cosine, Authorship Unmasking, Binoculars, DetectLLM LRR and NPR, DetectGPT, and Fast-DetectGPT,
which act as the baselines for this shared task [11].</p>
      <p>Perplexity (PPL) and perplexity-related measures have been at the core of many attempts to
automate LLM detection. Perplexity is defined as the exponentiated negative mean log-likelihood of a text
under an LLM, as described in the following formula:</p>
      <p>PPL(X, M) = exp{ -(1/n) · Σ_{i=1..n} log p(x_i | x_{&lt;i}, M) }</p>
      <p>where X = {x_1, x_2, ..., x_n} is the sequence of tokens (e.g., the text), n is the length of the sequence
(i.e. the number of tokens), M is the LLM, and p(x_i | x_{&lt;i}, M) is the predicted probability of the i-th token
given the LLM and the preceding tokens in the sequence. Perplexity measures the predictability of a
text under the LLM, and is commonly used in LLM training as a loss function and evaluation
metric. The higher the perplexity, the less predictable the text is to the LLM.</p>
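      <p>To make the formula concrete, the following is a minimal Python sketch that computes perplexity from a list of per-token probabilities; the probability values and the helper name are our own illustration, not part of the submitted system:</p>

```python
import math

def perplexity(token_probs):
    # PPL = exp of the negative mean log-probability over the n tokens
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# Tokens the model finds likely give low perplexity; surprising tokens give high.
predictable = perplexity([0.9, 0.8, 0.9, 0.85])
surprising = perplexity([0.10, 0.05, 0.08, 0.12])
assert predictable < surprising
```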
      <p>
        Using perplexity for LLM detection is a common approach because LLM-generated texts are generally
assumed to be associated with a lower perplexity than human-written texts for any given LLM. This
approach is especially common in hybrid LLM-detection solutions that are designed to assist humans in
distinguishing human-written texts from LLM-generated ones [
        <xref ref-type="bibr" rid="ref8">8, 14</xref>
        ]. LLM detection via perplexity,
however, also has clear limitations. From a technical standpoint, it relies heavily on using LLMs for
the calculation of perplexity. More fundamentally, it is also entirely possible for human texts to be
associated with relatively low perplexity scores for generic LLMs. In general, such approaches therefore
appear to be overly simplistic. To mitigate the risks of prediction failure, especially false positives,
where texts written by real human authors are flagged as being machine-generated, researchers
have therefore expanded on this approach, for example, by incorporating a pair of pretrained LLMs
rather than a single LLM into their LLM detection systems [15].
      </p>
      <p>Authorial Language Models (ALMs) is a paradigm for authorship analysis that relies on training a
set of fine-tuned authorial models based on the available writing samples for each candidate author [12].
Unlike most previous LLM-based approaches to authorship analysis that use only one LLM [16, 17],
ALMs involves using multiple LLMs, one for each candidate author, to better capture authorial variation
in token predictability. This makes ALMs more resilient to exceptional or extreme cases, because this
approach does not rely on a single LLM, while allowing greater amounts of information to be
extracted from the underlying textual data, as the LLMs are fine-tuned on a candidate-by-candidate
basis. Furthermore, ALMs is also more interpretable, as it can provide token-level predictability metrics
for the questioned document for each candidate author. Because of these advantages, we have found
that ALMs outperforms all other state-of-the-art methods (N-grams NN, BERT, and PPM) for human
authorship attribution on the Blogs50 dataset, while nearly matching the performance of N-grams NN,
which achieves the best results, on the CCAT50 dataset [12]. For this shared task, we have therefore
modified ALMs for AIAV, as we detail in the next section.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>The PAN@CLEF2024 AIAV shared task involves two groups of datasets: the bootstrap group and the
testing group.</p>
      <p>The bootstrap group was open for method development. In the bootstrap group, one dataset contains
1087 texts that were generated by 13 widely-used LLMs ranging from Llama [18] to GPT-4 [19], together
with 1087 texts that were authored by humans. Regardless of the author, texts in the bootstrap dataset
are full or trimmed news articles. In this study, we used the bootstrap datasets for the fine-tuning of the
authorial language models and the training of the support vector machine classifier. We then developed
our method and submitted it to tira.io [13].</p>
      <p>Meanwhile, the testing group was retained by the PAN2024 organizers and was not made available to
participants in the shared task. Rather, these datasets were used to independently test the systems submitted
to tira.io for assessment. Specifically, the PAN2024 organizers tested our system on the testing group of
datasets, which includes one main dataset plus nine dataset variants that are obfuscated against AI
verification methods. For the details of the testing group datasets, the PAN2024 organizers plan to release
the basic facts and compilation details in the overview notebook [11].</p>
    </sec>
    <sec id="sec-4">
      <title>4. System Overview</title>
      <p>ALMs for AIAV is a version of ALMs that is tailored to the needs of the AIAV shared task. ALMs for AIAV
is based on the idea of using perplexity for LLM detection, where human-authored texts are assumed
to have substantially higher perplexity compared to LLM-authored ones. However, this assumption
has exceptions if we only consider perplexity from a single LLM: there are human-authored texts
with relatively low perplexity, and LLM-authored texts with relatively high perplexity, both of which
undermine LLM detection using this approach. Hence, during the development of our method, we
hypothesized that these exceptions could result from a lack of any attempt to represent the styles of
different LLMs, which we believe could affect LLM detection in two ways.</p>
      <p>On the one hand, confounding variables in the training corpus, such as genres, registers, and topics, can
distort perplexity: for example, a human-written text in a register that is over-represented
in the training corpus would tend to be associated with a relatively low perplexity, whereas an
LLM-authored text written in a register that is under-represented in the training corpus would tend to be
associated with a relatively high perplexity. On the other hand, differences in language modeling
and text generation pipelines can also lead to exceptional perplexity values: for example, an LLM that
uses a distinctive pipeline would potentially generate texts that are more unexpected to other
existing LLMs and hence be associated with relatively high perplexity.</p>
      <p>Although these issues cannot be completely eliminated, we believe they can be mitigated
by using not one but many LLMs. By fine-tuning pre-trained LLMs to correspond to each of the
potential LLMs in the detection task, we can build perplexity arrays that take into account the styles of
different LLMs. Meanwhile, based on these perplexity arrays, perplexity values for each of the LLMs can
be compared against one another, which further mitigates the risk of under-representing LLM styles.</p>
      <p>As with ALMs for human authorship attribution, the first step of ALMs for AIAV is the fine-tuning of a
series of pre-trained LLMs that correspond to each of the potential LLM "authors". These fine-tuned
authorial language models are then used to extract a perplexity array for each pair of questioned texts.
The perplexity arrays are then used as feature vectors in a trained Support Vector Machine (SVM)
classifier to decide which of the two texts is most likely written by a human. Finally, the prediction
result is output as the is_human score, as requested by the shared task [11]. is_human ranges
between 0 and 1, where 0 means the first text is considered human-written, and 1 means the second text
is considered human-written. The workflow of ALMs for AIAV is shown as flowcharts in Figure 1,
Figure 2, and Figure 3. The details of each step are described in the following subsections.</p>
      <sec id="sec-4-1">
        <title>4.1. Fine-tuning Authorial Language Models</title>
        <p>The first step of ALMs for AIAV is the fine-tuning of pre-trained LLMs on the texts from each
candidate author. However, in the AIAV shared task, candidates are grouped by whether they are human
(i.e. an LLM group and a human group). Though the number of authors in the human group is unclear, the
number of authors in the LLM group is specified. Therefore, we can take the LLMs listed in the bootstrap
dataset [20] as potential "authors" for the fine-tuning of the authorial language models. We take 80% of
each of the LLM datasets as the training data for LLM fine-tuning, and we retain the remaining 20% for
use in further steps. As most of these potential models are causal language models, we choose GPT-2
base, a canonical causal language model, as the pre-trained model for fine-tuning. We then fine-tune 13
GPT-2 models on the texts from the 13 potential LLM "authors". For each case, we fine-tune each model for
20 epochs with a weight decay of 0.01, an initial learning rate of 0.00002, and gradient accumulation.</p>
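        <p>The per-"author" data preparation described above can be sketched as follows; the helper name and the placeholder texts are our own illustration, and the hyperparameter dictionary simply collects the values reported in this subsection:</p>

```python
# Hyperparameters reported for fine-tuning each authorial GPT-2 model
# (the gradient-accumulation value is not specified in the text above).
FINETUNE_ARGS = {"num_train_epochs": 20, "weight_decay": 0.01, "learning_rate": 2e-5}

def split_author_texts(texts, train_frac=0.8):
    """80% of an LLM 'author's' texts go to fine-tuning;
    the remaining 20% are retained for the later SVM-training step."""
    cut = round(len(texts) * train_frac)
    return texts[:cut], texts[cut:]

train, held_out = split_author_texts([f"text_{i}" for i in range(10)])
assert len(train) == 8 and len(held_out) == 2
```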
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Perplexity Array Extraction</title>
        <p>Although perplexity is defined as the exponentiated negative mean log-likelihood of all tokens in a text, for
efficiency purposes, we calculate perplexity from cross entropy using the formula below:</p>
        <p>PPL(X, M) = exp{ CrossEntropy(X, M) }</p>
        <p>Given an input text T and a fine-tuned authorial GPT-2 model M, we first pass T to the GPT-2
BPE tokenizer to extract a token sequence X. X is then passed to M for language modeling, whose
output is P. Here P reflects the predicted probabilities of all tokens in X, where X itself represents
the ground truth. Therefore, in the next step, we measure the predictability of all tokens in X by
comparing the predictions P against the ground truth X via cross entropy, which we calculate using
torch.nn.functional.cross_entropy from PyTorch. Finally, we obtain the perplexity of T under M by
exponentiating the cross entropy.</p>
        <p>For each input text T, we calculate its perplexity under each of the 13 fine-tuned authorial language
models. We store these perplexity values in a 13×1 array, which we treat as the perplexity array for input
text T. The perplexity array is then used in the next step as a feature array to make a prediction on
each questioned text pair.</p>
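        <p>A sketch of this computation in NumPy (used here as a stand-in for the PyTorch cross-entropy call; the synthetic logits, the helper name, and the commented model list are our own illustrative assumptions):</p>

```python
import numpy as np

def perplexity_from_logits(logits, token_ids):
    """Causal-LM perplexity: the prediction at position i is scored
    against the ground-truth token at position i+1."""
    preds = logits[:-1]                         # model output P, shifted
    targets = np.asarray(token_ids[1:])         # ground truth X, shifted
    log_z = np.log(np.exp(preds).sum(axis=-1))  # log partition per position
    log_probs = preds[np.arange(len(targets)), targets] - log_z
    cross_entropy = -log_probs.mean()
    return np.exp(cross_entropy)                # PPL = exp{CrossEntropy}

# One perplexity value per fine-tuned model yields the 13x1 perplexity array:
# models = [alm_1, ..., alm_13]   # hypothetical fine-tuned GPT-2 models
# ppl_array = np.array([perplexity_from_logits(m(X), X) for m in models])
```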
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Authorship Prediction via Support Vector Machine</title>
        <p>Given an input questioned text pair T1 and T2, we first extract their perplexity arrays from the 13
authorial language models. We then move to authorship prediction based on the two perplexity arrays.
For this stage, we trained an SVM using a reconstituted dataset composed of paired perplexity arrays
from the human data in the bootstrap dataset and the remaining 20% of LLM-generated texts that we
retained during ALM fine-tuning. In this dataset, we paired perplexity arrays following the description
of the shared task, where, for each pair of texts, we guarantee that one text is human-authored and
the other text is LLM-generated. We trained the SVM classifier with a radial basis function kernel, a
regularization parameter of 1.0, and a stopping-criterion tolerance of 0.001. We did not impose a fixed
number of training steps or epochs for the SVM classifier.</p>
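        <p>A minimal sketch of this classification stage with scikit-learn, using synthetic perplexity arrays in place of real model outputs; the feature construction, label convention, and numeric ranges are our own illustration of the paired-array idea, not the shared-task data:</p>

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def synthetic_pair(machine_first):
    """Concatenate the two 13-dim perplexity arrays of a text pair.
    Human texts are assumed here to score high under every ALM, machine texts low."""
    human = rng.normal(8.0, 0.5, 13)
    machine = rng.normal(2.0, 0.5, 13)
    pair = [machine, human] if machine_first else [human, machine]
    return np.concatenate(pair)

# Label follows the is_human convention: 1 -> the second text is human-written.
X = np.array([synthetic_pair(i % 2 == 0) for i in range(200)])
y = np.array([1 if i % 2 == 0 else 0 for i in range(200)])

svm = SVC(kernel="rbf", C=1.0, tol=1e-3)  # settings reported above
svm.fit(X, y)
prediction = svm.predict([synthetic_pair(True)])
```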
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We submitted our ALMs for AIAV as Docker-contained software for benchmarking on the tira.io
group of testing datasets [13]. During testing, our method was labeled as "greasy-chest". Table 1 shows
the results, initially pre-filled with the official baselines provided by the PAN organizers and summary
statistics for all submissions to the shared task (i.e., the maximum, median, minimum, and 95th, 75th,
and 25th percentiles over all submissions to the task). We find that our method beats all existing
baselines for all evaluation measures and performs in the top 25% of all submissions to this shared task.</p>
      <p>In addition, Table 2 shows the summarized results averaged (arithmetic mean) over all 10 variants of
the test dataset. Each dataset variant applies one potential technique to measure the robustness of the
AIAV systems, including but not limited to switching the text encoding, translating the text, switching
the domain, and manual obfuscation by humans. Our method achieves a median score of 0.935 over the
9 variants, which also surpasses all existing baselines and is among the top 25% of all submissions to
this shared task. However, we also notice that our method has a relatively low minimum score over the 9
variants, suggesting that further investigation is needed for the most challenging dataset variant.
Our submission (as team "jaha") ranks 17th out of 30 on the leaderboard with a ranking score of 0.683.</p>
      <p>6. Conclusion</p>
      <p>In this paper, we have introduced ALMs for AIAV, a generative AI verification method that utilizes
fine-tuned authorial language models and a support vector machine classifier to predict which text is
written by a human in a pair of one human-written and one machine-written text. We found that our
method achieves a score of 0.979 in ROC-AUC, Brier, C@1, F1, and F0.5, which is better than all baseline
methods. We attribute the excellent performance of ALMs for AIAV to the ALMs paradigm, which uses
many fine-tuned authorial language models, providing greater flexibility and resilience than is possible
if only one LLM is used, as has often been the case in previous methods for authorship identification.</p>
      <p>Future research may focus on improving the authorship prediction method, for example by using a
regressor instead of the classifier proposed in this paper. In addition, it is also worthwhile to experiment
with in-context learning (ICL) of LLMs with billions of parameters to see whether ICL could be an effective
replacement for the fine-tuning approach we have taken, since ICL would potentially enable a few-shot
implementation of our method.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank the PAN2024 Organizers for their efforts, and Maik Fröbe for the information
provided during the submission and evaluation of our software. We would also like to thank Akira
Murakami for his support in the development of ALMs.</p>
      <p>This research is supported in part by the Office of the Director of National Intelligence (ODNI),
Intelligence Advanced Research Projects Activity (IARPA), via the HIATUS Program contract
#202222072200006. The views and conclusions contained herein are those of the authors and should not
be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI,
IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints
for governmental purposes notwithstanding any copyright annotation therein.</p>
      <p>[9] Y. Tian, H. Chen, X. Wang, Z. Bai, Q. Zhang, R. Li, C. Xu, Y. Wang, Multiscale Positive-Unlabeled
Detection of AI-Generated Texts, 2023. URL: http://arxiv.org/abs/2305.18149, arXiv:2305.18149 [cs].</p>
      <p>[10] K. Wu, L. Pang, H. Shen, X. Cheng, T.-S. Chua, LLMDet: A Large Language Models Detection Tool,
2023. URL: http://arxiv.org/abs/2305.15004, arXiv:2305.15004 [cs].</p>
      <p>[11] J. Bevendorf, M. Wiegmann, J. Karlgren, L. Dürlich, E. Gogoulou, A. Talman, E. Stamatatos,
M. Potthast, B. Stein, Overview of the “Voight-Kampf” Generative AI Authorship Verification
Task at PAN and ELOQUENT 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera
(Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR
Workshop Proceedings, CEUR-WS.org, 2024.</p>
      <p>[12] W. Huang, A. Murakami, J. Grieve, ALMs: Authorial Language Models for Authorship Attribution,
2024. arXiv:2401.12005.</p>
      <p>[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</p>
      <p>[14] S. Chakraborty, A. S. Bedi, S. Zhu, B. An, D. Manocha, F. Huang, On the Possibilities of AI-Generated
Text Detection, 2023. URL: http://arxiv.org/abs/2304.04736, arXiv:2304.04736 [cs].</p>
      <p>[15] A. Hans, A. Schwarzschild, V. Cherepanova, H. Kazemi, A. Saha, M. Goldblum, J. Geiping,
T. Goldstein, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, 2024.
URL: http://arxiv.org/abs/2401.12070, arXiv:2401.12070 [cs].</p>
      <p>[16] J. Tyo, B. Dhingra, Z. C. Lipton, On the State of the Art in Authorship Attribution and Authorship
Verification, 2022. URL: http://arxiv.org/abs/2209.06869, arXiv:2209.06869 [cs].</p>
      <p>[17] G. Barlas, E. Stamatatos, Cross-Domain Authorship Attribution Using Pre-trained Language
Models, volume 583 of IFIP Advances in Information and Communication Technology, Springer
International Publishing, Cham, 2020, pp. 255-266. URL: http://link.springer.com/10.1007/978-3-030-49161-1_22.
doi:10.1007/978-3-030-49161-1_22.</p>
      <p>[18] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, et al., Llama 2: Open Foundation
and Fine-Tuned Chat Models, 2023. URL: https://arxiv.org/abs/2307.09288v2.</p>
      <p>[19] OpenAI, GPT-4 Technical Report, 2023. URL: http://arxiv.org/abs/2303.08774, arXiv:2303.08774 [cs].</p>
      <p>[20] J. Bevendorf, M. Wiegmann, M. Potthast, B. Stein, E. Stamatatos, PAN24 Voight-Kampf Generative
AI Authorship Verification, 2024. URL: https://doi.org/10.5281/zenodo.10718757. doi:10.5281/zenodo.10718757.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Koppel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Argamon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Winter</surname>
          </string-name>
          , The “Fundamental Problem” of Authorship Attribution,
          <source>English Studies</source>
          <volume>93</volume>
          (
          <year>2012</year>
          )
          <fpage>284</fpage>
          -
          <lpage>291</lpage>
          . URL: http://www.tandfonline.com/doi/abs/10.1080/0013838X.2012.668794. doi:10.1080/0013838X.2012.668794.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Heini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kredens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          , R. Ortega-Bueno, P. Pęzik,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolska</surname>
          </string-name>
          , E. Zangerle, Overview of pan 2022:
          <article-title>Authorship verification, profiling irony and stereotype spreaders, and style change detection</article-title>
          , in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>382</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Heini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Körner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kredens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pęzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolska</surname>
          </string-name>
          , E. Zangerle, Overview of pan 2023:
          <article-title>Authorship verification, multi-author writing style analysis, profiling cryptocurrency influencers, and trigger detection: Extended abstract</article-title>
          ,
          <source>in: Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2-6, 2023, Proceedings, Part III</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2023</year>
          , p.
          <fpage>518</fpage>
          -
          <lpage>526</lpage>
          . URL: https://doi.org/10.1007/978-3-031-28241-6_60. doi:10.1007/978-3-031-28241-6_60.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>A. A.</given-names> <surname>Ayele</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Babakov</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Bevendorff</surname></string-name>,
          <string-name><given-names>X. B.</given-names> <surname>Casals</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chulvi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Dementieva</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Elnagar</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Freitag</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Fröbe</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Korenčić</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Mayerl</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Moskovskiy</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mukherjee</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Panchenko</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Potthast</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Rangel</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Rizwan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rosso</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Schneider</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Smirnova</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Stamatatos</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Stakovskii</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Stein</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Taulé</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Ustalov</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Wiegmann</surname></string-name>,
          <string-name><given-names>S. M.</given-names> <surname>Yimam</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Zangerle</surname></string-name>,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          , in:
          <string-name><given-names>L.</given-names> <surname>Goeuriot</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Mulhem</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Quénot</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schwab</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Soulier</surname></string-name>,
          <string-name><given-names>G. M. D.</given-names> <surname>Nunzio</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Galuščáková</surname></string-name>,
          <string-name><given-names>A. G. S.</given-names> <surname>de Herrera</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Faggioli</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ferro</surname></string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Jones</surname></string-name>,
          <string-name><given-names>A. N.</given-names> <surname>Gomez</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Kaiser</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Polosukhin</surname></string-name>,
          <article-title>Attention Is All You Need</article-title>
          ,
          <year>2017</year>
          . URL: http://arxiv.org/abs/1706.03762, arXiv:1706.03762 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Luan</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          <article-title>Language Models are Unsupervised Multitask Learners</article-title>
          (<year>2019</year>)
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>R.</given-names> <surname>Bommasani</surname></string-name>,
          <string-name><given-names>D. A.</given-names> <surname>Hudson</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Adeli</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Altman</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Arora</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>von Arx</surname></string-name>
          , et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          (<year>2022</year>). URL: http://arxiv.org/abs/2108.07258, arXiv:2108.07258 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>S.</given-names> <surname>Gehrmann</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Strobelt</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Rush</surname></string-name>,
          <article-title>GLTR: Statistical detection and visualization of generated text</article-title>
          , in:
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          , Association for Computational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>116</lpage>
          . URL: https://www.aclweb.org/anthology/P19-3019. doi: 10.18653/v1/P19-3019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>