<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bert_T for Human-AI Collaborative Text Classification: Notebook for PAN at CLEF 2025</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weidong Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenyin Yang</string-name>
          <email>cswyyang@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhen Shen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meifang Xie</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiliang Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miaoji Zheng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tufeng Xian</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qiyuan Sun</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cargosmart (Zhuhai) Co., LTD</institution>
          ,
          <addr-line>Zhuhai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As generative language models become increasingly integrated into writing processes, distinguishing between human-written, AI-generated, and collaboratively authored texts has emerged as a critical challenge. This paper explores neural approaches to text classification in Human-AI co-authorship contexts, aiming to identify the varying degrees and patterns of collaboration within texts. We propose a hybrid neural architecture that combines the contextual strength of BERT with the sequence modeling capabilities of Transformer layers, tailored to capture subtle signals of authorship. The model is evaluated on a specially constructed dataset reflecting diverse collaborative scenarios, including pure human writing, fully AI-generated content, and human-AI co-authored texts. Experimental results demonstrate that this approach consistently outperforms standard baselines in both accuracy and robustness, offering a promising direction for authorship analysis in the era of generative AI.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>Human-AI Collaborative Text Classification</kwd>
        <kwd>Transformer</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Text classification remains a cornerstone of Natural Language Processing (NLP), and within this
domain, the task of authorship verification has gained renewed importance due to the proliferation
of large-scale generative models. Authorship verification supports applications such as authenticity
validation, plagiarism detection, and source attribution, playing a crucial role in maintaining the
integrity of digital content.</p>
      <p>The Generative AI Authorship Verification Task at PAN@CLEF 2025 continues this line of
research by focusing on the challenge of distinguishing between texts written by humans and those
generated by large language models (LLMs). As models such as GPT increasingly produce
human-like text, the boundary between human and machine authorship becomes harder to discern,
amplifying the relevance of this task. In the 2025 benchmark setting, baseline performance on this task
yielded a Recall (Macro) of 48.32%, F1 (Macro) of 47.82%, and Accuracy of 57.09%. Building on previous
studies and neural methods for text classification, we introduced a hybrid neural model designed to
more effectively capture subtle stylistic and semantic differences indicative of authorship. Our
improved system achieved Recall (Macro) of 54.09%, F1 (Macro) of 53.57%, and Accuracy of 63.01%,
representing gains of 5.7 to 5.9 percentage points on each of the three metrics. This performance improvement
demonstrates the effectiveness of leveraging neural architectures tailored for human-AI
co-authorship detection. Notably, our approach combines pretrained contextual representations with
additional sequence modeling to enhance discriminative capabilities, especially in scenarios where
differences in authorship style are subtle and context-dependent.</p>
      <p>Our system was evaluated through the TIRA.io platform, which ensures a reproducible and fair
comparison under shared task conditions. The results reinforce the critical role of neural models in
advancing authorship verification tasks and illustrate the feasibility of scalable, accurate solutions in
the face of increasingly human-like AI-generated text.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>The dataset for the Human-AI Collaborative Text Classification Task at PAN@CLEF 2025 plays
a central role in training and evaluating models aimed at discerning authorship dynamics within
mixed-authored content. This year's dataset features a diverse collection of texts that reflect a
spectrum of authorship scenarios, including purely human-written, fully machine-generated, and
co-authored content. These texts are drawn from multiple genres, including news articles, Wikipedia
introductions, and fanfiction, ensuring a broad stylistic and structural diversity.</p>
      <p>Participants are also provided with a bootstrap dataset containing annotated samples of real and
machine-generated news articles centered on prominent 2021 U.S. events. This component is
designed to simulate real-world collaborative or comparative writing scenarios, in which
AI-generated content often mirrors the topical and rhetorical choices of human authors. The data,
curated in collaboration with contributors such as ELOQUENT Labs, is carefully balanced to
represent different authorship types. Articles are generated either by one or more human authors or
by advanced LLMs, particularly Google’s Gemini Pro. The dataset is structured around pairs of texts
on the same topic, authored separately by a human and a machine, to highlight subtle differences in
writing style and semantic composition.</p>
      <p>Each text is stored in a newline-delimited JSON (.jsonl) format. A typical entry in the
development set appears as:</p>
      <p>{"text":"Have you... of lost.", "language":"English", "label":4, "source_dataset":"TriBERT",
"model":"chatgpt", "label_text":"deeply-mixed text; where some parts are written by a human and
some are generated by a machine"}</p>
      <p>{"text":"But now... really mattered.","language":"English", "label":3, "source_dataset":
"RoFT_chatgpt", "model":"llm1-llm2", "label_text":"human-initiated, then machine-continued"}</p>
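      <p>Records in this format can be read with a few lines of standard-library Python. The following is a minimal sketch, not the loader used in our system: the field names follow the two sample records above, while the function names and any file path passed to load_jsonl are hypothetical.</p>
      <preformat>
```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON lines into a list of dicts, skipping blanks."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a newline-delimited JSON file from disk (path is a placeholder)."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


# One shortened record modeled on the sample above, plus a blank line.
sample = [
    '{"text": "But now... really mattered.", "language": "English", "label": 3, '
    '"label_text": "human-initiated, then machine-continued"}',
    "",
]
records = parse_jsonl(sample)
texts = [r["text"] for r in records]
labels = [r["label"] for r in records]
```
      </preformat>
      <p>Keeping the parsing step separate from file access makes it possible to check the loader on in-memory strings before pointing it at the full dataset.</p>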
      <p>In the Human-AI Collaborative Text Classification subtask of PAN@CLEF 2025, each document
in the dataset is assigned to one of six categories, reflecting the nuanced interactions between
human authors and large language models (LLMs). These categories capture various forms of
collaboration and transformation that occur in co-authored texts. Some documents are written
entirely by humans without any AI involvement, while others begin with a human-authored draft
that is subsequently continued or polished by an AI model. Conversely, certain texts originate from
an AI system and are later modified by a human editor, either for stylistic refinement or to obscure
the machine origin of the content. In more complex scenarios, human and AI contributions are
deeply intertwined throughout the document, lacking a clear division between authorship segments.</p>
      <p>This classification task requires models to discern subtle stylistic, structural, and semantic cues
indicative of each collaboration pattern, going beyond surface-level detection of synthetic language.
By embracing this more detailed taxonomy of co-authorship, the subtask enables a richer
understanding of the ways in which human and machine writing processes intersect. To facilitate
model training and evaluation, the dataset is provided in newline-delimited JSON (JSONL) format,
with each entry comprising a unique identifier, the full text, and a corresponding class label. During
testing, labels are omitted, and models must predict the appropriate category for each instance.
Evaluation metrics such as macro-averaged recall, F1-score, and accuracy are used to ensure
balanced assessment across all six classes, reflecting the importance of generalization in this
multi-class setting.</p>
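      <p>The macro-averaged recall, macro-averaged F1, and accuracy mentioned above can be computed as in the following sketch. It assumes integer class labels numbered from 0 to 5 (the numbering is an assumption; only labels 3 and 4 appear in the samples above) and gives every class equal weight, counting a class with no examples as zero. The function name is illustrative, and this is not the official evaluator.</p>
      <preformat>
```python
def macro_scores(y_true, y_pred, num_classes=6):
    """Return (macro recall, macro F1, accuracy) for integer class labels."""
    recalls, f1s = [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return sum(recalls) / num_classes, sum(f1s) / num_classes, accuracy
```
      </preformat>
      <p>Because each class contributes equally to the macro averages, a model that ignores rare collaboration types is penalized even if its overall accuracy is high.</p>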
      <p>Ultimately, this task offers a framework for systematically analyzing the emerging landscape of
collaborative authorship, where distinguishing between different forms of human-AI interaction is
critical for maintaining transparency, trust, and accountability in content creation.</p>
      <p>Participants are required to classify each individual document into one of six categories that
represent different patterns of human-AI collaboration in text creation. This task challenges models
to capture subtle linguistic, stylistic, and semantic cues that differentiate various forms of
co-authorship, rather than simply distinguishing between human- and machine-generated texts. Access
to the dataset is carefully controlled through Zenodo, where participants must register and request
access using their TIRA-registered email. This process ensures that the dataset is used exclusively for
research purposes and prohibits any unauthorized redistribution. Such controlled access maintains
compliance with copyright regulations and preserves the dataset’s integrity for academic and
developmental use.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Preprocessing</title>
        <p>Effective data preprocessing plays a vital role in enhancing the robustness and accuracy of
machine learning models, especially for complex tasks like Human-AI Collaborative Text
Classification. For this subtask at PAN@CLEF 2025, our preprocessing pipeline was carefully
designed to prepare diverse and nuanced texts reflecting different collaboration patterns between
humans and AI.</p>
        <p>The preprocessing began with text normalization, which involved converting all text to lowercase
and removing punctuation, non-alphabetic characters, and numerals. This step aimed to reduce
irrelevant variability and focus the model on meaningful linguistic content. Subsequently, common
stopwords were filtered out to minimize noise and emphasize distinctive textual features indicative
of different co-authorship styles. Following normalization, the texts were tokenized into discrete
units suitable for model input. We utilized a pre-trained BERT tokenizer to vectorize the token
sequences, ensuring consistency by applying padding and truncation to standardize input lengths,
thereby optimizing training efficiency. Given the inherent complexity and limited size of labeled
data in this multi-class classification task, data augmentation techniques were employed. These
techniques generated additional training samples by introducing subtle modifications to existing
texts, preserving semantic and stylistic integrity critical for distinguishing collaborative writing
patterns.</p>
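        <p>The normalization and length-standardization steps described above can be sketched in plain Python. This is an illustration under stated assumptions rather than our exact pipeline: the stopword set is a small hypothetical sample, pad_or_truncate is a stand-in for the padding and truncation that the pre-trained BERT tokenizer performs, and the padding id of 0 is assumed.</p>
        <preformat>
```python
import re

# Illustrative stopword list only; the actual list used is not specified here.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def normalize(text):
    """Lowercase, keep only letters and whitespace, then drop stopwords."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    tokens = [tok for tok in letters_only.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Force a list of token ids to exactly max_len entries."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


cleaned = normalize("The model, in 2025, is GREAT!")
```
        </preformat>
        <p>Removing punctuation and digits in this way discards some stylistic signal, so the cleaning rules are worth revisiting whenever such cues matter for a given collaboration category.</p>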
        <p>Throughout the preprocessing workflow, special attention was paid to maintaining the delicate
balance between cleaning the data and preserving the linguistic cues essential for accurate
classification of the six collaboration categories. This comprehensive preprocessing framework laid
a solid foundation for training models capable of capturing the nuanced human-AI interplay within
the texts, thereby improving classification performance on this challenging task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Network Architecture</title>
        <p>In this study, we propose a neural network architecture designed to address the complexities of
Human-AI Collaborative Text Classification in the PAN@CLEF 2025 challenge. The model builds
upon the robust contextual representation capabilities of BERT-base, combined with a Transformer
encoder to effectively model semantic dependencies and stylistic variations across the input texts.
Specifically, we utilize the pre-trained bert-base-uncased model from Hugging Face as the
foundational encoder. The [CLS] token embedding is extracted to represent the overall semantic
composition of each document and is subsequently passed through a Dropout layer to mitigate
overfitting and improve generalization. To capture the nuanced distinctions among the six
collaboration types, ranging from fully human-written texts to deeply interwoven human-AI
compositions, a Transformer encoder layer with multi-head attention is introduced. This allows the
model to attend to different parts of the input text and infer patterns that indicate varying degrees
of human and AI involvement. Unlike binary classification tasks, our model is trained in a
multi-class setting using a categorical cross-entropy loss function, enabling it to predict one of six
predefined collaboration categories for each input text. During inference, each document is
processed individually, and the model outputs a probability distribution over the six classes, from
which the most probable class is selected.</p>
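        <p>The training objective and the inference rule described above can be illustrated without a deep learning library. The following is a minimal sketch, assuming the classification head emits six raw scores (logits) per document; the example logits are hypothetical and the function names are illustrative, not taken from the submitted system.</p>
        <preformat>
```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Categorical cross-entropy for one example with an integer target."""
    return -math.log(softmax(logits)[target])


def predict(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


logits = [0.1, 0.3, 2.0, 0.2, -1.0, 0.0]  # hypothetical scores for six classes
probs = softmax(logits)
predicted = predict(logits)
loss = cross_entropy(logits, target=2)
```
        </preformat>
        <p>The loss shrinks as the probability assigned to the correct class grows, which is what drives the model toward confident, correct predictions during training.</p>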
        <p>We tuned hyperparameters such as learning rate, batch size, and dropout rate to maximize
classification accuracy and macro-averaged F1 score, metrics particularly relevant in the context
of imbalanced multi-class tasks. This model configuration adapts well to the
demands of Human-AI co-authored text classification and identifies
subtle textual signals that correspond to distinct collaboration modes, as visualized in Figure 1
(Model Architecture for Human-AI Text Classification).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setting</title>
        <p>For Subtask 2 of PAN@CLEF 2025, we trained a classification model to distinguish six types of
Human-AI collaborative writing. The dataset was split into training and testing sets at a 7:3 ratio.
Our model integrates a pretrained BERT base encoder with a Transformer layer and a linear
classification head to predict one of six categories reflecting different human-machine authorship
dynamics. The model uses 768 hidden units, four attention heads, and is optimized using AdamW
with a learning rate of 1e-6 and batch size of 8. Training was conducted over 300 epochs on
CUDA-enabled GPUs. This setup enables effective learning of stylistic and structural patterns unique to each
collaboration type.</p>
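        <p>The 7:3 split and the reported hyperparameters can be written down compactly. The sketch below assumes a plain random shuffle with a fixed seed; the seed value, the function name, and the dictionary keys are illustrative choices, and whether the original split was stratified by class is not specified here.</p>
        <preformat>
```python
import random


def split_dataset(records, train_ratio=0.7, seed=42):
    """Shuffle a copy of records and split it into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hyperparameters as reported in this section, gathered in one place.
CONFIG = {
    "hidden_size": 768,
    "attention_heads": 4,
    "optimizer": "AdamW",
    "learning_rate": 1e-6,
    "batch_size": 8,
    "epochs": 300,
    "num_classes": 6,
}

train, test = split_dataset(range(10))
```
        </preformat>
        <p>Fixing the seed keeps the partition identical across runs, so that differences in scores reflect changes to the model rather than to the data split.</p>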
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Metrics</title>
        <p>Our evaluation framework was designed to assess the performance of the
Bert_T model across several metrics that reflect its effectiveness in classifying texts based on
different modes of Human-AI collaboration. The model was evaluated using a standard set of
metrics commonly employed in multi-class text classification tasks, including ROC-AUC, Brier
score, C@1, F1, and F0.5u, along with the arithmetic mean of these metrics to provide a
comprehensive overview of performance.</p>
        <p>Performance Metrics:</p>
        <p>ROC-AUC measures the area under the receiver operating characteristic curve, providing
insight into the model's ability to discriminate between classes across all thresholds [8]. The ROC
curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold
settings. The formula is given by:</p>
        <p>
          <disp-formula id="eq1"><label>(1)</label><tex-math>\text{ROC-AUC} = \int_{0}^{1} \mathrm{TPR}(t)\, d\big(\mathrm{FPR}(t)\big)</tex-math></disp-formula>
        </p>
        <p>Brier Score evaluates the mean squared error of the predicted class probabilities in the context
of multi-class Human-AI collaborative writing classification [9]. It reflects how well the model's
probabilistic outputs align with actual class labels. A lower Brier score indicates more accurate and
better-calibrated predictions. It is calculated as:</p>
        <p>
          <disp-formula id="eq2"><label>(2)</label><tex-math>\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} \left(\text{predicted probability}_i - \text{actual outcome}_i\right)^2</tex-math></disp-formula>
        </p>
        <p>C@1 is a modified accuracy that credits non-answers (predictions with a confidence score
of exactly 0.5) with the average accuracy of the remaining cases [10]. This metric
is particularly useful in situations where making no prediction is preferable to making an incorrect
prediction. The formula is:</p>
        <p>
          <disp-formula id="eq3"><label>(3)</label><tex-math>\mathrm{C@1} = \frac{1}{n} \left(n_c + \frac{n_c}{n}\, n_u\right)</tex-math></disp-formula>
        </p>
        <p>where n is the total number of cases, n_c is the number of correct answers, and n_u is the
number of non-answers.</p>
        <p>F1 Score is the harmonic mean of precision and recall, balancing the two [11]. It is
particularly useful in situations where precision and recall matter equally. The formula is:</p>
        <p>
          <disp-formula id="eq4"><label>(4)</label><tex-math>F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}</tex-math></disp-formula>
        </p>
        <p>where Precision = TP / (TP + FP) and Recall = TP / (TP + FN), with TP, FP, and FN denoting
true positives, false positives, and false negatives.</p>
        <p>F0.5u is a variant of the F-measure that weights precision more than recall, suitable for scenarios
where false positives are more costly than false negatives [12]. It is calculated using the formula:</p>
        <p>
          <disp-formula id="eq5"><label>(5)</label><tex-math>F_{0.5u} = \left(1 + 0.5^2\right) \cdot \frac{\text{Precision} \times \text{Recall}}{0.5^2 \cdot \text{Precision} + \text{Recall}}</tex-math></disp-formula>
        </p>
        <p>These metrics collectively provided a robust framework for evaluating our model, enabling us to
effectively measure its ability to perform authorship verification across different dimensions of
accuracy and reliability.</p>
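        <p>Several of the metrics defined in this section are short enough to express directly in code. The sketch below is illustrative rather than the official evaluator: it assumes binary 0/1 outcomes with scores in [0, 1], treats a score of exactly 0.5 as a non-answer, uses the standard c@1 definition of Peñas and Rodrigo, and implements F0.5u through the general F-measure with beta set to 0.5. All function names are hypothetical.</p>
        <preformat>
```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def c_at_1(probs, outcomes):
    """c@1 with a score of exactly 0.5 treated as a non-answer."""
    n = len(probs)
    n_unanswered = sum(1 for p in probs if p == 0.5)
    n_correct = sum(1 for p, o in zip(probs, outcomes)
                    if p != 0.5 and (p > 0.5) == bool(o))
    return (n_correct + n_unanswered * n_correct / n) / n


def f_beta(precision, recall, beta=1.0):
    """General F-measure; beta=1 gives F1, beta=0.5 favors precision."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


scores = [0.9, 0.5, 0.2, 0.6]   # hypothetical model outputs
truth = [1, 1, 0, 0]            # hypothetical gold labels
```
        </preformat>
        <p>With these inputs, two of the three answered cases are correct and one case is left unanswered, which makes the effect of abstention on c@1 easy to trace by hand.</p>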
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>Our proposed Bert_T model exhibited strong performance in the PAN@CLEF 2025 Subtask 2:
Human-AI Collaborative Text Classification, demonstrating notable effectiveness across core classification
metrics. As shown in Table 1, Bert_T achieved a macro-averaged recall of 0.541, macro F1 score of
0.535, and an overall accuracy of 0.630, outperforming the official baseline system, which yielded a
recall of 0.483, F1 of 0.478, and accuracy of 0.570. These results indicate that Bert_T is capable of
effectively distinguishing among the six nuanced categories of human-AI collaborative writing—
ranging from fully human-written texts to deeply interwoven co-authored documents. The
improvement in macro recall and F1 highlights the model’s balanced ability to detect minority classes
as well as dominant ones, which is critical in a multi-class setting with imbalanced category
distributions. The model’s consistent performance across these metrics underscores its robustness in
handling the complex stylistic variations and subtle linguistic cues that characterize collaborative
human-LLM texts. Unlike binary authorship verification, this task demands a more granular
understanding of co-authorship dynamics, and the Bert_T model's architecture proves well-suited to
these challenges.</p>
        <p>Future enhancements will focus on refining category-specific sensitivity and improving
class-wise calibration, especially for closely related subtypes like "human-initiated, machine-continued"
versus "machine-written, human-edited." Expanding annotated training data and applying
contrastive learning are also being considered to further boost model generalization.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents the design and evaluation of the Bert_T model, our proposed solution for
Subtask 2: Human-AI Collaborative Text Classification at PAN@CLEF 2025. By integrating
BERT-based contextual feature extraction with a Transformer encoder for attention modeling, Bert_T is
tailored to capture the nuanced patterns of collaboration between human authors and large
language models. It effectively classifies co-authored texts into six distinct categories, such as fully
human-written, human-written then machine-polished, and deeply-mixed compositions.</p>
      <p>In experimental evaluations, Bert_T achieved a macro-averaged recall of 0.541, F1 score of 0.535,
and an accuracy of 0.630, outperforming the baseline system (0.483 recall, 0.478 F1, 0.570 accuracy).
These results confirm the model’s reliable performance in handling the complex and subtle nature
of human-AI collaborative writing. Its effectiveness demonstrates strong generalization across
different forms of human-machine co-authorship and the ability to detect varying degrees of AI
involvement in text generation. Looking ahead, we aim to further improve Bert_T through
parameter tuning, enhanced feature engineering strategies, and by diversifying the training data to
better represent various human-AI interaction styles. These enhancements are expected to boost the
model’s precision and adaptability, extending its utility beyond this task to broader challenges in
collaborative text understanding and authorship analysis.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>This work was supported by grants from the Guangdong-Foshan Joint Fund Project (No.
2022A1515140096) and Open Fund for Key Laboratory of Food Intelligent Manufacturing in
Guangdong Province (No. GPKLIFM-KF-202305).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the authors made limited use of GPT-based tools solely
for grammar and spelling checking. All content generated by these tools was carefully reviewed and
revised by the authors. The authors take full responsibility for the final content of the publication.</p>
      <p>[8] D. Yuan, W. Y. Yang, L. Ma, et al., Analysis of Irony and Stereotype Spreaders Based on
Convolutional Neural Networks, in: G. Faggioli, N. Ferro, A. Hanbury, M. Potthast (Eds.),
CLEF 2022 Labs and Workshops, Notebook Papers, CEUR-WS.org, September 2022.</p>
      <p>[9] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M.
Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L.
Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.),
Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023),
Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.
URL: https://link.springer.com/chapter/10.1007/978-3-031-28241-6_20.
doi:10.1007/978-3-031-28241-6_20.</p>
      <p>[10] A. M. Carrington, D. G. Manuel, P. W. Fieguth, et al., Deep ROC analysis and AUC as balanced
average accuracy, for improved classifier selection, audit and explanation, IEEE Transactions
on Pattern Analysis and Machine Intelligence 45 (1) (2022) 329–341.</p>
      <p>[11] W. Yang, J. Jiang, E. M. Schnellinger, et al., Modified Brier score for evaluating prediction
accuracy for binary outcomes, Statistical Methods in Medical Research 31 (12) (2022) 2287–2296.</p>
      <p>[12] A. Peñas, A. Rodrigo, A simple measure to assess non-response (2011).</p>
      <p>[13] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., Scikit-learn: Machine learning in Python, Journal
of Machine Learning Research 12 (2011) 2825–2830.</p>
      <p>[14] J. Bevendorff, B. Stein, M. Hagen, M. Potthast, Generalizing unmasking for short texts, in:
Proceedings of the 2019 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),
2019, pp. 654–659.</p>
      <p>[15] M. Abassy, K. Elozeiri, A. Aziz, M. N. Ta, R. V. Tomar, B. Adhikari, S. E. D. Ahmed, Y. Wang, O.
Mohammed Afzal, Z. Xie, J. Mansurov, E. Artemova, V. Mikhailov, R. Xing, J. Geng, H. Iqbal, Z.
M. Mujahid, T. Mahmoud, A. Tsvigun, A. F. Aji, A. Shelmanov, N. Habash, I. Gurevych, P.
Nakov, LLM-DetectAIve: a tool for fine-grained machine-generated text detection, in: D. I.
Hernandez Farias, T. Hope, M. Li (Eds.), Proceedings of the 2024 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, Association for
Computational Linguistics, Miami, Florida, USA, 2024, pp. 336–343. URL:
https://aclanthology.org/2024.emnlp-demo.35/. doi:10.18653/v1/2024.emnlp-demo.35.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Janek</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          , Daryna Dementieva, Maik Fröbe, Bela Gipp, André Greiner-Petter, Jussi Karlgren, Maximilian Mayerl, Preslav Nakov, Alexander Panchenko, Martin Potthast, Artem Shelmanov, Efstathios Stamatatos, Benno Stein, Yuxia Wang,
          <string-name>
            <surname>Matti Wiegmann</surname>
            , and
            <given-names>Eva</given-names>
          </string-name>
          <string-name>
            <surname>Zangerle</surname>
          </string-name>
          .
          <source>Overview of PAN</source>
          <year>2025</year>
          :
          <article-title>Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Springer, Madrid, Spain,
          <year>September 2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fröbe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsivgun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abassy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansurov</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ta</surname>
            ,
            <given-names>M. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elozeiri</surname>
            ,
            <given-names>K. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomar</surname>
            ,
            <given-names>R. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artemova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelmanov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Habash</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Overview of the "Voight-Kampff" Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2025</article-title>
          . In G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , &amp; D. Spina (Eds.),
          <source>Working Notes of CLEF</source>
          <year>2025</year>
          -
          <article-title>- Conference and Labs of the Evaluation Forum</article-title>
          ,
          <source>CEUR Workshop Proceedings. CEUR-WS.org</source>
          , Madrid, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <article-title>Overview of PAN 2025: Generative AI Authorship Verification, Multi-Author Writing Style Analysis, Multilingual Text Detoxification, and Generative Plagiarism Detection</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2025)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Halvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Graner</surname>
          </string-name>
          .
          <article-title>On the usefulness of compression models for authorship verification</article-title>
          .
          <source>Proceedings of the 12th International Conference on Availability, Reliability and Security</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          , et al.
          <article-title>Generalizing unmasking for short texts</article-title>
          .
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers).
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. Y.</given-names>
            <surname>Teng</surname>
          </string-name>
          , et al.
          <article-title>Fast-DetectGPT: Efficient zero-shot detection of machine-generated text via conditional probability curvature</article-title>
          .
          <source>arXiv preprint arXiv:2310.05130</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z. X.</given-names>
            <surname>Yang</surname>
          </string-name>
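          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Yang</surname>
          </string-name>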
          , et al.
          <article-title>An Intelligent Detection Method for Irony and Stereotype Based on Hybrid Neural Networks</article-title>
          . In Guglielmo Faggioli, Nicola Ferro, Allan Hanbury, and Martin Potthast, editors,
          <source>CLEF 2022 Labs and Workshops</source>
          , Notebook Papers,
          September
          <year>2022</year>
          .
          CEUR-WS.org
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>