<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Nexus Interrogators at PAN: Voight-Kampff Generative AI Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samiya Ali Zaidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huzaifah Tariq Ahmed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarrah Ali Akbar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ziaullah Shakeel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faisal Alvi</string-name>
          <email>faisal.alvi@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdul Samad</string-name>
          <email>abdul.samad@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dhanani School of Science and Engineering, Habib University</institution>
          ,
          <addr-line>Karachi</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The Voight-Kampff task at PAN CLEF 2025 challenges participants to detect and categorize AI-generated text in an era of increasingly human-like language models. In this work, we develop a two-stage system leveraging fine-tuned transformer architectures to tackle both binary and multi-class authorship verification. For Subtask 1, we fine-tune a bert-base-uncased model to distinguish human-written from machine-generated text, achieving near-perfect performance across genres with minimal false positives. For Subtask 2, we address severe class imbalance in multi-class collaborative authorship detection by augmenting underrepresented categories using backtranslation, synonym/antonym replacement, and random deletion. Fine-tuning a roberta-large model on this enriched dataset yields significant gains, particularly in minority classes. Our results underscore the effectiveness of combining targeted data augmentation with robust transformer-based models to capture subtle distinctions in authorship, offering a scalable foundation for detecting generative AI involvement in real-world texts.</p>
      </abstract>
      <kwd-group>
        <kwd>Voight-Kampff</kwd>
        <kwd>AI-Generated Text Detection</kwd>
        <kwd>Authorship Verification</kwd>
        <kwd>Transformer Models</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Fine-tuning</kwd>
        <kwd>PAN Lab</kwd>
        <kwd>CLEF 2025</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The increasing use of large language models (LLMs) in content creation has introduced new challenges
in distinguishing between human- and AI-generated text. Generative AI has shown remarkable
capabilities in mimicking human writing, which raises concerns related to academic integrity,
misinformation, and authorship transparency. As AI-assisted writing becomes more sophisticated, robust detection
systems are needed to identify the degree of machine involvement in written texts.</p>
      <p>
        The Voight-Kampff Generative AI Detection 2025 task [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], part of the PAN shared task series
with the ELOQUENT Lab [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], addresses this problem by evaluating detection systems across two
key subtasks. Subtask 1 focuses on binary classification of texts as either entirely human-written or
machine-generated, even in cases where the AI attempts to imitate a specific human writing style
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This tests the sensitivity and robustness of detection methods against adversarial obfuscation and
unseen model outputs.
      </p>
      <p>
        Subtask 2 extends the challenge by introducing multi-class classification of collaborative human-AI
texts, requiring systems to detect nuanced degrees of machine involvement. This includes identifying
when humans post-edit AI-generated drafts, co-write with AI models, or minimally edit
machine-generated outputs. The goal is not only to improve detection accuracy but also to understand the
spectrum of human-AI collaboration [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>To tackle these challenges, this paper explores a range of techniques, including data augmentation
strategies, fine-tuning, ensemble methods, and neural classifiers. Our system builds upon prior research
in authorship verification and leverages recent advances in supervised learning, fine-tuning, and hybrid
modeling. We focus on robustness across genres and model types, addressing both fully and partially
machine-generated content.</p>
      <p>The rest of this paper is structured as follows: Section 2 presents a review of related work,
focusing on the approaches commonly used for authorship detection. Section 3 describes our approach
to solving both subtasks. Section 4 presents our validation results. Lastly, Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Authorship verification has evolved from stylistic analysis of human writing to the detection of
AI-generated content. Recent work has leveraged both traditional machine learning and deep learning
models for this task. Fine-tuned transformer architectures such as DeBERTa [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and RoBERTa have
achieved high performance in binary classification of human vs. AI text [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], while hybrid models that
combine BERT with CNNs enhance local and contextual feature extraction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Some systems introduce data augmentation and R-Drop regularization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to improve robustness,
employing loss functions that combine cross-entropy and KL divergence. Ensemble learning approaches
using multiple transformer models (e.g., BERT, RoBERTa, DeBERTa) have shown further improvements
in ROC-AUC scores [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Meanwhile, instructional prompting with T5 has been explored to reframe
authorship detection as a sequence-to-sequence task [<xref ref-type="bibr" rid="ref8">8</xref>].
      </p>
      <p>Beyond transformers, research has explored lightweight classifiers with embeddings like LUAR for
low-resource scenarios [<xref ref-type="bibr" rid="ref9">9</xref>], and stylometric analysis using Graph Neural Networks (GNNs) alongside
pre-trained models [<xref ref-type="bibr" rid="ref10">10</xref>]. Approaches such as Tri-Sentence Analysis [<xref ref-type="bibr" rid="ref11">11</xref>] and hybrid models like BertT
[<xref ref-type="bibr" rid="ref12">12</xref>] demonstrate effectiveness in handling short texts and improving generalization.</p>
      <p>Despite promising results, many systems struggle with generalization to novel AI models or obfuscated
styles, highlighting the importance of continual adaptation and diverse training data in generative AI
authorship verification.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In this section, we provide details about the datasets for each task, followed by our methodology for
both subtasks individually.</p>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>The datasets for this task are provided as newline-delimited JSON files. In subtask 1’s dataset, each
entry includes an identifier, the text content, the originating model (human or specific AI model), a
label (0 for human, 1 for AI), and a genre indicator (e.g., essays, news, fiction).</p>
        <p>The dataset for subtask 2, on the other hand, comprises multi-domain documents drawn from
academic sources, journalism, and social media. The data includes a mixture of human-written and
machine-generated samples (produced by models such as GPT-4, Claude, and PaLM) and is annotated
to indicate the type of human-AI collaboration. The dataset spans multiple languages and provides
detailed labels for each collaboration category.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Subtask 1: Voight-Kampff AI Detection Sensitivity</title>
        <p>In this task, our primary objective was to accurately distinguish between human-written and
AI-generated text. This binary classification problem required a robust modeling pipeline that could
leverage the nuanced differences between the two categories. The distribution of the dataset used for
this task is illustrated in Figure 1, providing insight into the balance of the data across both classes.</p>
        <p>The first phase of our workflow involved data preprocessing. The original dataset was provided in a
.jsonl format, which is commonly used for storing structured data in a line-delimited manner. To
facilitate data handling and analysis, we first converted this .jsonl file into a Pandas DataFrame. From
this structure, we extracted only the essential fields required for our task: ‘id’, ‘text’, ‘label’.
These fields represent, respectively, the unique identifier of each sample, the content of the text, and
its associated label indicating whether the text was AI-generated or written by a human (0 means
human-written, and 1 means AI-generated).</p>
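        <p>A minimal sketch of this preprocessing step is shown below. The file name used here is illustrative rather than taken from the task materials; the field names match those described above.</p>
        <preformat>
import pandas as pd

# read the newline-delimited JSON file directly into a DataFrame
df = pd.read_json("subtask1_train.jsonl", lines=True)  # illustrative file name

# keep only the fields needed for binary classification:
# 0 = human-written, 1 = AI-generated
df = df[["id", "text", "label"]]
        </preformat>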
        <p>After isolating the relevant information, we transformed the dataset into the Hugging Face Dataset
format. This conversion optimized the data pipeline for fine-tuning pre-trained models. The Hugging
Face Dataset object also provides efficient shuffling, batching, and tokenization utilities, which are
particularly useful for handling text data at scale.</p>
        <p>With the dataset prepared, we proceeded to the model fine-tuning phase. We leveraged the Hugging
Face transformers library due to its modularity, ease of use, and strong support for state-of-the-art
pre-trained language models. We used the AutoModelForSequenceClassification interface to
load the bert-base-uncased model with two output labels (human and AI), and the AutoTokenizer
for consistent input preprocessing. We selected this variant of BERT for its proven effectiveness in
various natural language understanding tasks, particularly in text classification. The fine-tuning process
involved training the model on the labeled dataset to adapt BERT’s pretrained representations to our
specific task of authorship classification.</p>
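        <p>The following sketch illustrates the conversion to a Hugging Face Dataset together with tokenizer and model loading, assuming the DataFrame from the previous step; the maximum sequence length is an assumption rather than a value reported in this paper.</p>
        <preformat>
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# pandas DataFrame -> Hugging Face Dataset
dataset = Dataset.from_pandas(df)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # human (0) vs. AI (1)
)

def tokenize(batch):
    # truncate/pad to a fixed length (512 is an assumed value)
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)
        </preformat>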
        <p>Training was carried out using the Trainer API, which provided integrated training and evaluation
loops, model checkpointing, and metric logging. All hyperparameters used during training, including
learning rate, batch size, and number of epochs, are detailed in Table 1. These parameters were chosen
based on standard practices for fine-tuning transformer models and adjusted to fit the computational
constraints and performance needs of our project.</p>
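        <p>A sketch of the training setup with the Trainer API follows; the numeric hyperparameters below are placeholders, since the values actually used are those listed in Table 1, and the held-out split is shown only to make the example self-contained.</p>
        <preformat>
from transformers import Trainer, TrainingArguments

splits = dataset.train_test_split(test_size=0.1, seed=42)  # illustrative split

args = TrainingArguments(
    output_dir="bert-voight-kampff",      # checkpoint directory (illustrative)
    learning_rate=2e-5,                   # placeholder; see Table 1
    per_device_train_batch_size=16,       # placeholder; see Table 1
    num_train_epochs=3,                   # placeholder; see Table 1
    eval_strategy="epoch",                # evaluate after each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,          # retain the best-performing checkpoint
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    compute_metrics=compute_metrics,      # micro-F1, sketched in the next listing
)
trainer.train()
        </preformat>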
        <p>We monitored performance after each epoch and retained the best-performing model. For evaluation,
we used the evaluate library to compute micro-averaged F1 scores, ensuring that performance was
balanced across both classes. At inference time, predictions were generated using the Trainer API and
analyzed via a detailed classification report, giving us insights into precision, recall, and F1 score for
both human and AI text classes. This setup ensured a reliable, reproducible training pipeline aligned
with modern standards for fine-tuning transformer-based classifiers.</p>
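        <p>The evaluation logic can be sketched as follows: a micro-averaged F1 computed with the evaluate library during training, and a per-class report at inference time (the scikit-learn classification report is an assumed implementation detail).</p>
        <preformat>
import numpy as np
import evaluate
from sklearn.metrics import classification_report

f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return f1_metric.compute(predictions=preds, references=labels, average="micro")

# after training: per-class precision, recall, and F1
pred_output = trainer.predict(splits["test"])
preds = np.argmax(pred_output.predictions, axis=-1)
print(classification_report(pred_output.label_ids, preds, target_names=["human", "AI"]))
        </preformat>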
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Subtask 2: Human-AI Collaborative Text Classification</title>
        <p>For this sub-task, our objective was to determine the extent of AI involvement in the generation of a
given piece of text. Unlike the binary classification task described earlier, this problem was framed as a
multi-class classification challenge, where each sample was categorized into one of six distinct labels
based on the degree and type of human-machine collaboration. The classification labels are as follows:
• 0: fully human-written
• 1: human-written, then machine-polished
• 2: machine-written, then machine-humanized
• 3: human-initiated, then machine-continued
• 4: deeply-mixed text, where some parts are written by a human and some are generated by a
machine
• 5: machine-written, then human-edited</p>
        <p>The distribution of samples across these six categories is visualized in Figure 2, which highlights a
substantial class imbalance in the dataset. This imbalance posed a significant challenge, particularly for
training a model capable of accurately distinguishing underrepresented categories.</p>
        <p>As with the earlier task, the dataset was initially provided in .jsonl format. To facilitate
preprocessing and further transformations, we first converted the data into a Pandas DataFrame. From the available
fields, only the text and label columns were retained, as these were essential for the classification
task.</p>
        <p>Given the imbalance in class distribution, we implemented several data augmentation techniques
targeting the three least represented classes – 3, 4, and 5. These augmentation strategies were designed
to increase the diversity and volume of examples in the minority classes, thereby helping to mitigate
the effects of class imbalance during training. The augmentation methods used include (a brief sketch of two of them follows this list):
• Backtranslation
• Synonym Replacement
• Antonym Replacement
• Random Deletion</p>
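        <p>A minimal sketch of random deletion and synonym replacement is given below; WordNet via nltk is an assumed implementation choice, and the deletion probability and replacement count are illustrative values rather than the ones used in our experiments.</p>
        <preformat>
import random
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def random_deletion(text, p=0.1):
    # drop each token with probability p (illustrative value)
    words = text.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else text

def synonym_replacement(text, n=2):
    # replace up to n words that have a WordNet synset with a synonym
    words = text.split()
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        lemmas = wordnet.synsets(words[i])[0].lemma_names()
        if lemmas:
            words[i] = lemmas[0].replace("_", " ")
    return " ".join(words)
        </preformat>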
        <p>Each of these strategies was applied separately to the minority classes, after which the augmented
datasets were merged to form an enriched and more balanced training set, as depicted in Table 2.</p>
        <p>This enhanced dataset was utilized to fine-tune the state-of-the-art RoBERTa-Large model. The
large variant was chosen to effectively capture the nuances and nonlinearities present in such a complex
dataset. By training on both the original and augmented data, the model became better equipped to
generalize across all six categories of AI-human text interaction. The hyperparameters used are detailed
in Table 1, and the entire workflow is visually represented in Figure 3.</p>
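        <p>The multi-class setup differs from Subtask 1 mainly in the number of output labels; a brief sketch, with the label mapping written out for readability, is shown below. The Trainer configuration otherwise mirrors the one described for Subtask 1.</p>
        <preformat>
from transformers import AutoModelForSequenceClassification, AutoTokenizer

id2label = {
    0: "fully human-written",
    1: "human-written, then machine-polished",
    2: "machine-written, then machine-humanized",
    3: "human-initiated, then machine-continued",
    4: "deeply-mixed text",
    5: "machine-written, then human-edited",
}

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large",
    num_labels=6,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)
        </preformat>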
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Subtask 1: Voight-Kampff AI Detection Sensitivity</title>
        <p>The exceptionally high recall for the AI class (0.9983) suggests that the detector rarely misses
machine-generated instances, even when those instances employ novel obfuscation methods. Conversely, the
slightly lower recall (0.9687) for the human-authored class points to a small proportion of false
positives, i.e., human-written texts misclassified as AI-generated, which could stem from human passages whose style resembles typical machine output.
Overall, the model's balanced precision and recall showcase its robustness and sensitivity in the face of
adversarial style-mimicking.</p>
        <p>The ROC curves and confusion matrix visualized in Figure 4 further reinforce the model’s high
discriminative ability. The curves show excellent separation between the classes, and the confusion
matrix reveals very few misclassifications, aligning with the reported metrics.</p>
        <p>The scores obtained after running the model on TIRA [<xref ref-type="bibr" rid="ref13">13</xref>] are presented in Table 4. It showcases our
model's flawless performance across all genres in the validation phase, achieving a perfect ROC-AUC
of 1.0 and consistently high scores across the C@1, F1, F0.5u, and Brier metrics, underscoring both its
discriminative power and calibration quality.</p>
        <p>Furthermore, Figure 5 provides further insight through confusion matrices for each genre on the test set. The model
demonstrates perfect recall in Essays and News (no false negatives), with only 13 and 16 false positives,
respectively, highlighting its accurate labeling of AI-generated text. In Fiction,
although a small number of misclassifications occur (28 false positives, 4 false negatives), the model still
exhibits strong performance, effectively handling the complexity of creative writing.</p>
        <p>Finally, Table 5 benchmarks our model against leading baselines on the test dataset, where it
outperforms across all major metrics—achieving the highest ROC-AUC (0.865), F1 (0.860), and mean score
(0.879), while maintaining the lowest False Positive Rate (0.131). These results confirm that the model
generalizes well and remains reliable across genres, balancing precision and recall better than all
competing approaches.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Subtask 2: Human–AI Collaborative Text Classification</title>
        <p>Subtask 2 involves a multi-class classification challenge with six distinct levels of collaboration. To
tackle the significant class imbalance, especially for Classes 3, 4, and 5, we implemented targeted data
augmentation techniques. These techniques included back-translation, antonym/synonym substitution,
and random deletion, all aimed at enhancing the representation of the underrepresented categories.</p>
        <p>We fine-tuned a RoBERTa-Large model on the augmented dataset and observed decent scores,
especially for the minority classes. Table 6 summarizes the per-class precision, recall,
F1-score, and overall performance metrics.</p>
        <p>The macro-averaged F1-score of 0.632 shows balanced performance across classes, highlighting the
success of our augmentation strategy in addressing bias toward majority classes. Classes 4 and 5, once
underrepresented, now also perform well. Class 3 has high precision (0.899) but low recall
(0.336), indicating conservative predictions potentially due to overlap with other classes. Class 1, on the
other hand, has high recall (0.935) but low precision (0.403), suggesting overprediction.</p>
        <p>To isolate the impact of each augmentation method, we additionally fine-tuned separate models
using one technique at a time. Table 7 displays the class-wise precision, recall, and F1-scores. Antonym
replacement and random deletion enhanced macro-level performance, with random deletion achieving
the highest macro F1-score of 0.590.</p>
        <p>
          To contextualize our results, we compare our model’s performance with the official PAN shared task
baseline on both the test and validation splits [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. As shown in Table 8, while our test-time performance
lags behind the baseline, our validation scores significantly exceed it, particularly in terms of macro
F1-score and recall. This suggests that our model is capable of learning from the augmented data, but
may suffer from domain shift or limited generalizability on the blind test set.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Summary of Findings</title>
        <p>Our experiments confirm that Subtask 1 can be effectively solved with standard fine-tuning of a
transformer-based model, achieving near-ceiling performance even under adversarial-style obfuscation.
In contrast, Subtask 2’s multi-way classification remains challenging due to severe class imbalance
and nuanced distinctions between collaboration levels. Data augmentation proves a viable strategy for
boosting performance on underrepresented classes, but future work should explore complementary
approaches—such as ensembling, stylometric feature fusion, or few-shot prompting with large language
models—to further enhance robustness and fine-grained discrimination.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this study, we addressed both the binary and multi-class AI authorship detection tasks of the
Voight-Kampff challenge at the PAN Lab at CLEF 2025. For Subtask 1, our fine-tuned bert-base-uncased
model achieved an impressive accuracy of 98.77%, with robust F1 scores for
both the human and AI classes, which illustrates the model's effectiveness in binary classification.</p>
      <p>Subtask 2 posed a considerable challenge due to severe class imbalance. By applying targeted data
augmentation—specifically focused on underrepresented classes—and fine-tuning a RoBERTa-Large
model, we were able to significantly improve macro F1-score across the board. The largest gains were
observed in minority classes, particularly Class 4 and Class 5, demonstrating that balancing strategies
can effectively improve performance on rare collaboration levels without sacrificing overall accuracy.</p>
      <p>Moreover, performance on high-support classes such as Class 0 (fully human-written) and Class 2
(machine-written, then machine-humanized) remained robust, indicating that augmentation did not negatively impact the
model’s understanding of dominant patterns. However, despite these gains, Class 3 continues to show
low recall, suggesting persistent confusion in capturing intermediate collaboration levels. Future work
could explore the use of contrastive learning, ensemble techniques, or stylometric features to help better
disentangle nuanced authorial blends, especially with more powerful foundation models.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors would like to acknowledge the support provided by the Office of Research (OoR) at Habib
University, Karachi, Pakistan, for funding this project through the internal research grant IRG-2235.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors utilized GPT-4 and Grammarly for grammar and
spelling checks. After employing these tools, the authors independently reviewed and edited the content
as necessary, taking full responsibility for the final publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , W. Chen, Deberta:
          <article-title>Decoding-enhanced bert with disentangled attention</article-title>
          , arXiv preprint arXiv:2006.03654 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yadagiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parween</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maurya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pakray</surname>
          </string-name>
          ,
          <article-title>Detecting ai-generated text with pre-trained models using linguistic features</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Natural Language Processing (ICON)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          , L. Ma,
          <article-title>Bcav: a generative ai author verification model based on the integration of bert and cnn</article-title>
          , Working Notes of CLEF (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , T.-Y. Liu, et al., R-drop:
          <article-title>Regularized dropout for neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>10890</fpage>
          -
          <lpage>10905</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>A verifying generative text authorship model with regularized dropout</article-title>
          , Working Notes of CLEF (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Z. Lin, Y. Li, J. Huang, Voight-Kampff generative AI authorship verification based on T5, Working Notes of CLEF (2024).</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Richburg, C. Bao, M. Carpuat, Automatic authorship analysis in human-AI collaborative writing, in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 1845–1855.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. Valdez-Valenzuela, H. Gómez-Adorno, Team iimasnlp at PAN: leveraging graph neural networks and large language models for generative AI authorship verification, Working Notes of CLEF (2024).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Huang, Y. Chen, M. Luo, Y. Li, Generative AI authorship verification of tri-sentence analysis based on the BERT model, Working Notes of CLEF (2024).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Z. Wu, W. Yang, L. Ma, Z. Zhao, BertT: a hybrid neural network model for generative AI authorship verification, Working Notes of CLEF (2024).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>