<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generative AI Authorship Verification Based on Contrastive-Enhanced Dual-Model Decision System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Junlang Liu</string-name>
          <email>liujunlang2015@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <email>kongleilei@fosu.edu.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhenyu Peng</string-name>
          <email>pengzhenyu1411@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Feifan Chen</string-name>
          <email>chenfeifan0203@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Distinguishing human-written text from content produced by large language models (LLMs) remains a moving target, especially when detectors face unseen generators. We formalize the CLEF PAN 2025 Generative-AI Authorship Verification task as a text classification problem, employing a contrastive-enhanced ModernBERT-large approach, a Qwen3-based approach, and a fusion method combining both approaches. Specifically, to implement contrastive learning in the contrastive-enhanced method, we applied the large language model ChatGPT-4.1 for data augmentation, rewriting 1,000 human-written sentences. On the official validation set, our contrastive-enhanced method achieves a 0.997 mean score, with all five PAN metrics above 0.99. On the hidden test set, our submitted single-model ModernBERT-large (CE + SCL) achieves a 0.871 mean score (ROC-AUC = 0.822, F1 = 0.855), ranking 3rd out of 24 teams. These results suggest that the contrastive-enhanced method is competitive even without relying on large-scale ensemble systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative AI Detection</kwd>
        <kwd>Pre-trained Model</kwd>
        <kwd>Contrastive-Enhanced</kwd>
        <kwd>Text Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) have drastically lowered the cost of generating fluent text, but this
progress intensifies the need to verify whether content is authored by humans or machines[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
Experience from the PAN 2024 lab shows that detectors fine-tuned on one generator family often
underperform when faced with unseen models or domains[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Current detection methods face two core challenges:
• Models trained solely with cross-entropy loss tend to focus on surface-level features and struggle
to capture subtle semantic differences between human-written and AI-generated texts.
• Previous ensemble methods, while improving accuracy, depend on multiple large models, resulting
in low inference efficiency and significant deployment barriers due to hardware constraints.
In addition, inspired by the success of noise-based perturbation strategies in computer vision tasks—where
slight input transformations help models generalize better—we explore analogous perturbation
strategies for textual data to improve robustness and semantic representation learning[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>To address these issues, we propose a lightweight ensemble composed of a bidirectional encoder
model (ModernBERT-large) and an autoregressive decoder model (Qwen3-4B). The encoder branch is
fine-tuned using a joint cross-entropy and supervised contrastive loss to improve discrimination in
borderline cases, while the decoder branch is trained with standard cross-entropy. During inference, the
outputs are combined via mean-probability soft voting, incurring only minimal additional computational
overhead.</p>
      <p>Our experiments show that the proposed method achieves strong robustness on the PAN25
validation set, with all five official metrics—ROC-AUC, Brier, C@1, F1, and F0.5—exceeding 0.99.</p>
      <p>The remainder of this paper is organised as follows: Section 2 reviews related work; Section 3 details
our model design and training; Section 4 presents results and discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The fast rise of LLMs has made reliably telling human- from machine-authored text a pressing NLP
problem. Earlier efforts fall into (i) supervised classification, (ii) zero-shot detection, and (iii) multi-model
decision aggregation. Classical lexical-feature classifiers can still rank highly—e.g., a plain SVM built on
TF-IDF matched or beat neural baselines—yet their robustness drops once generators evolve. Conversely,
zero-shot signals such as cross-perplexity generalise well but lag in absolute accuracy.</p>
      <sec id="sec-2-1">
        <title>2.1. Supervised Classification Models</title>
        <p>
          Traditional machine-learning methods remain remarkably competitive. Lorenz et al. employ a linear
SVM trained on TF-IDF features and achieve performance close to the top[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Meanwhile, several teams
fine-tune Transformer-based classifiers. Cao et al. enhance their model by augmenting the training
set with additional human-written samples[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The Tri-Sentence Analysis method splits each long
document into three shorter segments and averages their individual predictions to stabilise the final
decision[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Lin et al. incorporate R-Drop regularisation to reduce the variance caused by dropout
during inference[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Overall, supervised models achieved some of the highest mean scores in the PAN-24
competition. However, despite strong results on validation sets during training, these models often
show reduced robustness when applied to out-of-domain test data, leading to noticeable performance
drops in generalisation scenarios.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Zero-Shot Detection Models</title>
        <p>
          Unsupervised techniques avoid costly annotation by exploiting statistical irregularities in machine text.
Compression-based detectors such as PPMd-CDM treat lower entropy as an AI signature and require
only a generic compressor[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The Binoculars framework measures the ratio between an observer
model’s perplexity and that of a performer model to expose hidden over-repetition in generated text[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
However, their average performance in the PAN-24 competition was notably lower than that of supervised
systems, underscoring an inherent trade-off between broad generality and fine-grained accuracy.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Multi-model Decision Aggregation Models</title>
        <p>
          To enhance robustness, some teams opted to combine multiple detection strategies. BinocularsLLM
integrates two QLoRA-fine-tuned language models with Binoculars-style perplexity scoring, applying
soft voting across all components to reach a final decision[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This ensemble achieved the top rank in
the competition. LAVA takes a different approach by training separate adapters for different families of
generative models and employs a conservative “unanimous agreement” rule—only predicting human
authorship when all modules concur—effectively reducing false positives[13]. These ensemble-based
systems demonstrated high mean scores in the evaluation, but their improved performance comes at
the cost of increased inference time and memory usage, highlighting the trade-off between speed and
accuracy.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <p>To build a robust generative-AI authorship verifier, three strategies are developed:
1. ModernBERT-large is fully fine-tuned as a classifier using both cross-entropy and supervised
contrastive loss.
2. Qwen3-4B is fully fine-tuned with cross-entropy loss.</p>
      <p>3. ModernBERT-large and Qwen3-4B are fused via weighted soft voting.</p>
      <p>Our design aims to achieve the following main goals:
• Compare the performance of two different model architectures on the generative AI detection
task after supervised fine-tuning.
• Enhance the overall robustness and generalization of the system by incorporating two
structurally different models.</p>
      <p>Let ℋ = {h_i} be the set of human-written texts (h_i ∈ Σ*) and 𝒜 = {a_j} the set of
AI-generated texts. For 1000 texts h ∈ ℋ we obtain an augmented paraphrase p using the GPT-4.1 model;
the set of all augmented texts is 𝒫 = {p_k}. We assign the paraphrase set 𝒫 the machine-generated
label (1), while the corresponding original texts in ℋ retain the human-written label (0). Unless stated
otherwise we denote the complete corpus by 𝒟 = ℋ ∪ 𝒜 ∪ 𝒫 and a generic sample by x ∈ 𝒟.</p>
      <sec id="sec-3-1">
        <title>3.1. Contrastive-Enhanced ModernBERT-large</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Data Augmentation</title>
          <p>To expose the detector to challenging near-human counterfeits, we first sampled 1000 sentences from
the human class and then asked ChatGPT-4.1 (04-01-2025) to rewrite each sentence in its own words
while preserving the original meaning. We call these rewrites paraphrases and assign them the label 1
(machine-generated); their source sentences retain label 0 (human). Because the two versions of every
sentence convey the same idea yet belong to opposite classes, they form hard positive-negative pairs
that sharpen the contrastive objective.</p>
          <p>Balanced mini-batches. Purely shuffling the data can yield mini-batches containing only positives
or only negatives, which dilutes the contrastive signal. Therefore, we deterministically interleave
samples in the order human → paraphrase → human → machine, aiming to keep the class ratio
within each batch as close to 1:1 as possible.</p>
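          <p>A minimal sketch of this deterministic interleaving, assuming plain Python lists of texts (the helper name is ours, not taken from the released code):</p>

```python
from itertools import chain

def interleave(humans, machines):
    # Alternate human and machine samples so every mini-batch sliced in
    # order keeps the class ratio close to 1:1 (hypothetical helper).
    paired = list(chain.from_iterable(zip(humans, machines)))
    # Append any leftover samples from the longer class at the end.
    leftover = humans if len(humans) > len(machines) else machines
    paired.extend(leftover[min(len(humans), len(machines)):])
    return paired
```

          <p>With a batch size of 32, each batch drawn from the interleaved region of this list contains 16 samples of each class.</p>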
          <p>Prompt
System Prompt:
This is a piece of text generated by a human. I want to express the same meaning as this sentence,
but without changing its writing style. Please help me rephrase it. Just output the rephrased
sentence directly.</p>
          <p>User Prompt:
I approach a corner in the hallway as the door to a classroom in front of me opens and a girl steps
out. She is wearing a form fitting black shirt with ...</p>
          <p>Answer:
I round a corner in the hallway just as the door of a classroom ahead swings open and a girl steps
out. She’s dressed in a fitted black shirt, snug yet ...</p>
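          <p>For illustration, the augmentation request above can be assembled in the standard chat-message format; the request schema and the model identifier string are assumptions here, since the paper specifies only the prompt text and the ChatGPT-4.1 model family:</p>

```python
SYSTEM_PROMPT = (
    "This is a piece of text generated by a human. I want to express the same "
    "meaning as this sentence, but without changing its writing style. Please "
    "help me rephrase it. Just output the rephrased sentence directly."
)

def build_paraphrase_request(text, model="gpt-4.1"):
    # Assemble the request payload in the common chat-message schema;
    # the model identifier is an assumption.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    }
```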
          <p>For the augmented dataset, we first separated human-written texts and AI-generated texts and then
inserted them alternately, one by one, into the training dataset. The remaining AI-generated
texts were inserted at random positions, and the 1000 augmented samples were guaranteed to appear in the
same training batches as their original human-written counterparts. The final statistics of the training data are
presented in Table 1.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Supervised Fine-Tuning with Joint Loss</title>
          <p>To better capture nuanced semantic differences, we adopt a supervised fine-tuning strategy combined
with contrastive learning, training the LLM directly on labeled data. Specifically, we attach a fully
connected classification head to the hidden representation of the [CLS] token, allowing the model
to output a binary label given an input text—where 0 denotes a human-written text and 1 denotes a
machine-generated one.</p>
          <p>To enhance the model’s ability to discriminate between subtle semantic patterns, we incorporate
supervised contrastive learning following the formulation proposed by Gunel et al. [14]. The overall
training objective is a weighted combination of cross-entropy loss and supervised contrastive loss. The
final loss function is defined as follows:</p>
          <p>ℒ = (1 − λ) · ℒ_CE + λ · ℒ_SCL (1)</p>
          <p>ℒ_CE = −(1/N) Σ_{i=1}^{N} log p(y_i | x_i) (2)</p>
          <p>ℒ_SCL = −(1/N) Σ_{i=1}^{N} (1/|P(i)|) Σ_{p∈P(i)} log [ exp(z_i⊤z_p / τ) / Σ_{j=1, j≠i}^{N} exp(z_i⊤z_j / τ) ] (3)</p>
          <p>Specifically, P(i) denotes the set of positive samples that share the same class label as the anchor
sample i, z_i represents the hidden representation (feature vector) extracted by the model, and τ ∈ ℝ⁺ is
a temperature hyperparameter that controls the concentration level of the similarity distribution. This
formulation encourages the model to bring semantically similar samples closer in the representation
space while pushing apart dissimilar ones, thereby improving class-level discrimination.</p>
          <p>Equation (1) represents the overall loss, Equation (2) corresponds to the standard cross-entropy loss,
and Equation (3) denotes the contrastive learning loss.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Supervised Fine-Tuning with LLMs</title>
        <p>For the decoder-based model, we adopt Qwen3-4B as our backbone. The fine-tuning strategy is similar
to that used in the encoder-based model. Specifically, we add a fully connected classification head to the
output vector of the last token after decoding, and perform binary classification—predicting whether a
given input text is human-written or machine-generated.</p>
        <p>Unlike the encoder-based model, this decoder-only model is trained using only the standard
cross-entropy loss, as limited GPU memory prevented us from incorporating the contrastive-learning
loss.</p>
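        <p>The last-token pooling head described above can be sketched as follows; the hidden size and module layout are placeholders, and the Qwen3 backbone itself is omitted:</p>

```python
import torch
import torch.nn as nn

class LastTokenClassifier(nn.Module):
    # Binary head over the final non-padding token's hidden state
    # (backbone and dimensions are stand-ins, not the authors' code).
    def __init__(self, hidden_size=2560, num_labels=2):
        super().__init__()
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq, hidden); pick each sequence's last real token.
        last_idx = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(hidden_states.size(0))
        pooled = hidden_states[batch_idx, last_idx]
        return self.head(pooled)
```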
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Contrastive-enhanced Dual-Model Decision (CeDD)</title>
        <p>To combine the strengths of both the encoder-based and decoder-based models, we aggregate their
prediction outputs using a soft voting strategy. Specifically, the final prediction probability is computed
as the mean of the individual classification probabilities, p_final = (p_enc + p_dec) / 2, and the label is
assigned as ŷ = 1 (machine-generated) if p_final ≥ 0.5, and ŷ = 0 (human-written) if p_final &lt; 0.5.</p>
        <p>This simple yet effective fusion mechanism leverages the complementary inductive biases of the two
model architectures. It improves prediction robustness without introducing significant computational
overhead and helps mitigate model-specific errors on borderline or ambiguous samples.</p>
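        <p>The fusion rule amounts to a few lines; the function below is an illustrative sketch with equal weights, matching the mean-probability description above:</p>

```python
def soft_vote(p_encoder, p_decoder, threshold=0.5):
    # Mean-probability soft voting over the two branches.
    p_final = (p_encoder + p_decoder) / 2.0
    label = 1 if p_final >= threshold else 0  # 1 = machine, 0 = human
    return p_final, label
```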
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>In this section, we present the implementation details, evaluation metrics, and provide a comprehensive
analysis of the results. We evaluate our three methods on the test datasets using the TIRA platform [15].</p>
      <sec id="sec-4-1">
        <title>4.1. Implementation Details</title>
        <p>The training of CeDD was implemented in PyTorch and executed on a single Nvidia
A800 GPU. The model was trained in bf16 precision to balance numerical stability and training
efficiency. The fine-tuning process lasted 3 epochs, using the AdamW optimizer with a learning rate
of 2e-5. The batch size was set to 32 without gradient accumulation. For the
ModernBERT-large model, the training objective combined standard cross-entropy loss with supervised contrastive
loss, with a lambda weight of 0.9 and a temperature of 0.3. The warm-up ratio was set to 0.1, and
training logs were recorded every 50 steps. An independent validation set was used during training for
evaluation. All experiments were conducted under a fixed random seed and employed cosine learning
rate scheduling to ensure reproducibility.</p>
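        <p>The reported optimizer setup (AdamW, learning rate 2e-5, warm-up ratio 0.1, cosine schedule) can be sketched as follows; the exact warm-up-then-cosine shape is our assumption:</p>

```python
import math
import torch

def make_optimizer_and_schedule(model, total_steps, lr=2e-5, warmup_ratio=0.1):
    # AdamW with linear warm-up followed by cosine decay to zero;
    # the precise schedule implementation is an assumption.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    warmup = int(total_steps * warmup_ratio)

    def lr_lambda(step):
        if step < warmup:
            return step / max(1, warmup)
        progress = (step - warmup) / max(1, total_steps - warmup)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched
```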
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metrics</title>
        <p>To evaluate the performance of our proposed model, we used the evaluation metrics provided by PAN25,
which include the following metrics:
• ROC–AUC: The area under the ROC (Receiver Operating Characteristic) curve.
• Brier: The complement of the Brier score (mean squared loss).
• C@1: A modified accuracy score that assigns non-answers (score = 0.5) the average accuracy of
the remaining cases.
• F1: The harmonic mean of precision and recall.
• F0.5: A modified F0.5 measure (precision-weighted F measure) that treats non-answers (score =
0.5) as false negatives.</p>
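        <p>For runs with no non-answers (no scores of exactly 0.5), these metrics can be approximated with scikit-learn as below; the official PAN evaluator additionally handles non-answers, which this sketch omits, and C@1 then reduces to plain accuracy:</p>

```python
import numpy as np
from sklearn.metrics import brier_score_loss, f1_score, fbeta_score, roc_auc_score

def pan_metrics(y_true, y_score):
    # Approximate PAN25 metrics assuming every case is answered.
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= 0.5).astype(int)
    return {
        "roc-auc": roc_auc_score(y_true, y_score),
        "brier": 1.0 - brier_score_loss(y_true, y_score),  # complement, as in PAN
        "c@1": float((y_pred == y_true).mean()),           # accuracy when all answered
        "f1": f1_score(y_true, y_pred),
        "f05": fbeta_score(y_true, y_pred, beta=0.5),
    }
```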
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Validation-set Results</title>
        <p>As mentioned earlier, we compare three approaches for detecting AI-generated text: classification using
an encoder-based model, classification using a decoder-based model, and a contrastive-enhanced
dual-model decision strategy that combines both. The performance of these models under the three approaches
is summarized in Table 2, based on evaluations on the validation dataset.</p>
        <p>Upon analyzing the results shown in Table 2, it is evident that ModernBERT-large delivers the most
stable and consistent performance across all evaluation metrics. Notably, it achieves an F1 score of 0.998
and an F0.5 score of 0.999, highlighting its effectiveness and accuracy in text classification tasks.</p>
        <p>Qwen3-4B also performs competitively, especially in the Brier and mean scores, reflecting its strength
in handling order-sensitive or generative-context inputs. This supports the effectiveness of the
decoder-only architecture.</p>
        <p>Our final system CeDD integrates both models and demonstrates near-optimal results across all
six metrics. This confirms the effectiveness of our CeDD in enhancing the robustness, stability, and
accuracy of generative authorship verification.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Test-set Results</title>
        <p>On the hidden PAN 2025 test set, evaluated via the TIRA platform, our submitted single-model
ModernBERT-large (CE + SCL) achieved a mean score of 0.871, with ROC-AUC = 0.822 and F1 = 0.855,
ranking 3rd out of 24 teams.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This work presents a supervised contrastive learning approach built upon the ModernBERT-large model
for the CLEF PAN 2025 Generative-AI Authorship Verification task. By jointly optimizing cross-entropy
loss and supervised contrastive loss, our method improves the model’s ability to distinguish between
human-written and AI-generated texts.</p>
      <p>• On the official validation set, ModernBERT-large (CE+SCL) achieved a near-perfect mean score
of 0.998 across all PAN metrics.
• On the hidden test set, this single-model approach obtained a mean score of 0.871, ranking 3rd
out of 24 teams, confirming the effectiveness of our design.</p>
      <p>In addition to the above results, we summarize the following key insights: (i) Supervised contrastive
learning substantially enhances class separability and semantic discrimination; (ii) A single
well-regularized encoder model can outperform complex ensembles while remaining efficient and scalable;
(iii) Paraphrased data generated by GPT-4.1 serves as highly effective contrastive pairs during training,
especially in narrowing the gap between human-like machine outputs and real human writing.</p>
      <p>Overall, our findings show that a contrastively fine-tuned ModernBERT encoder can achieve strong
performance on generative authorship verification, even without relying on large-scale ensemble
systems or decoder-based large language models.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Social Science Foundation of China (No. 22BTQ101).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-o3 for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
      <p>P. Galuščáková, A. G. S. Herrera (Eds.), Working Notes Papers of the CLEF 2024 Evaluation Labs,
CEUR-WS.org, 2024, pp. 2901–2912. URL: http://ceur-ws.org/Vol-3740/paper-281.pdf.
[13] Z. Chen, Y. Han, Y. Yi, Team chen at PAN: Integrating R-Drop and Pre-trained Language Model
for Multi-author Writing Style Analysis, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. Herrera
(Eds.), Working Notes Papers of the CLEF 2024 Evaluation Labs, CEUR-WS.org, 2024, pp. 2547–2553.</p>
      <p>URL: http://ceur-ws.org/Vol-3740/paper-232.pdf.
[14] B. Gunel, J. Du, A. Conneau, V. Stoyanov, Supervised contrastive learning for pre-trained language
model fine-tuning, 2021. URL: https://arxiv.org/abs/2011.01403. arXiv:2011.01403.
[15] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: Advances in Information
Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer
Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampf Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “VoightKampf” Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Norouzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>A simple framework for contrastive learning of visual representations</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Daumé III</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 37th International Conference on Machine Learning</source>
          , volume
          <volume>119</volume>
          of
          <source>Proceedings of Machine Learning Research</source>
          , PMLR,
          <year>2020</year>
          , pp.
          <fpage>1597</fpage>
          -
          <lpage>1607</lpage>
          . URL: https://proceedings.mlr.press/v119/chen20j.html.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Recht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Understanding deep learning (still) requires rethinking generalization</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>107</fpage>
          -
          <lpage>115</lpage>
          . URL: https://doi.org/10.1145/3446776. doi:10.1145/3446776.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. Z.</given-names>
            <surname>Aygüler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schlatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mirzakhmedova</surname>
          </string-name>
          ,
          <article-title>Baseline Avengers at PAN 2024: Often-Forgotten Baselines for LLM-Generated Text Detection</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes Papers of the CLEF 2024 Evaluation Labs</source>
          , CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>2761</fpage>
          -
          <lpage>2768</lpage>
          . URL: http://ceur-ws.org/Vol-3740/paper-262.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Enhancing Human-Machine Authorship Discrimination in Generative AI Verification Task with BERT and Augmented Data</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes Papers of the CLEF 2024 Evaluation Labs</source>
          , CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>2536</fpage>
          -
          <lpage>2540</lpage>
          . URL: http://ceur-ws.org/Vol-3740/paper-230.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Generative AI Authorship Verification Of Tri-Sentence Analysis Base On The Bert Model</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes Papers of the CLEF 2024 Evaluation Labs</source>
          , CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>2632</fpage>
          -
          <lpage>2637</lpage>
          . URL: http://ceur-ws.org/Vol-3740/paper-243.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . Kong,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>A Verifying Generative Text Authorship Model With Regularized Dropout</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes Papers of the CLEF 2024 Evaluation Labs</source>
          , CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>2728</fpage>
          -
          <lpage>2734</lpage>
          . URL: http://ceur-ws.org/Vol-3740/paper-257.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Halvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Graner</surname>
          </string-name>
          ,
          <article-title>On the usefulness of compression models for authorship verification</article-title>
          , in:
          <source>Proceedings of the 12th International Conference on Availability, Reliability and Security</source>
          , ARES '17,
          Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          . URL: https://doi.org/10.1145/3098954.3104050. doi:10.1145/3098954.3104050.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwarzschild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cherepanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldblum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geiping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <article-title>Spotting llms with binoculars: Zero-shot detection of machine-generated text</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.12070. arXiv:2401.12070.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Najafi</surname>
          </string-name>
          ,
          <article-title>MarSan at PAN: BinocularsLLM, fusing Binoculars' Insight with the Proficiency of Large Language Models for Machine-Generated Text Detection</article-title>
          , in: G. Faggioli, N. Ferro,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>