<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Bohan Li at PAN: DeBERTa-v3 with R-Drop Regularization for Human-AI Collaborative Text Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bohan Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
          <email>haoliang.qi@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Yan</string-name>
          <email>yankai@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>No. 33 Guangyun Road, Shishan, Nanhai District, Foshan 528225, Guangdong</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents our approach to Subtask 2: Human-AI Collaborative Text Classification in the PAN 2025 Voight-Kampf Generative AI Detection challenge. The task focuses on determining the extent to which a text co-authored by humans and artificial intelligence reflects human or machine authorship. The objective is to classify the degree of AI assistance in a given document. In this study, we propose a detection framework that integrates R-Drop regularization with the DeBERTa-v3-base pre-trained language model. The task involves assigning each document to one of six levels of human-AI collaboration, ranging from fully human-written to deeply mixed authorship. To address the challenges of class imbalance and limited training data, we apply random undersampling to high-frequency categories and adopt data augmentation strategies, such as synonym substitution and back-translation, for underrepresented classes. Additionally, R-Drop regularization is introduced during the fine-tuning stage to reduce overfitting and enhance the model's generalization ability on unseen texts. Experimental results show that our proposed model significantly outperforms baseline systems lacking R-Drop and data balancing strategies. On the official test set, our system achieved a macro-level recall of 61.72% and ranked second overall, confirming the effectiveness of the resampling and regularization techniques.</p>
      </abstract>
      <kwd-group>
<kwd>PAN 2025</kwd>
        <kwd>Human-AI Collaborative Text Classification</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>R-Drop</kwd>
        <kwd>DeBERTa-v3</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>In recent years, large language models (LLMs) such as Claude, GPT, and LLaMA
have undergone rapid iterations. The resulting advancement in AI-generated content (AIGC) has
brought machine-generated texts to a level of fluency and semantic coherence that rivals, and in many
cases is indistinguishable from, human-written texts. While these developments have revolutionized
applications in dialogue systems, machine translation, and content generation, they have simultaneously
posed unprecedented challenges to authorship attribution and content authenticity verification.</p>
      <p>
        To foster progress in this domain, the PAN 2025 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] shared task on Voight-Kampf Generative AI
Detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] introduces a more fine-grained subtask: categorizing documents co-authored by humans
and AI into six levels of collaboration, ranging from fully human-written to deeply mixed authorship.
Existing approaches to AIGC detection generally fall into five categories: watermarking-based tracing,
zero-shot perplexity-based detection, fine-tuned language models, adversarial training, and large
language models employed as detectors [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Each of these strategies emphasizes different aspects such
as traceability, unsupervised discrimination, or robustness enhancement. However, under the complex
and nuanced six-class setting, particularly when minor human edits are involved, many single-strategy
models suffer from limited generalization and vulnerability to adversarial examples.
      </p>
      <p>
        To address these challenges, we propose a classification framework that combines the
DeBERTav3-base [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] pre-trained model with R-Drop regularization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. DeBERTa-v3 enhances long-range
dependency modeling through disentangled attention and improved decoder masking, enabling more
accurate representation of subtle stylistic variations [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. R-Drop introduces dual forward passes during
training and minimizes the Kullback–Leibler divergence between outputs, thereby reducing overfitting
and encouraging decision boundary smoothness [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In addition, we apply a training data resampling
strategy that combines undersampling of majority classes with multi-strategy data augmentation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
for minority classes, including synonym substitution and back-translation, to improve representation
diversity.
      </p>
      <p>By integrating the DeBERTa-v3 pre-trained model, R-Drop regularization, and data balancing and
augmentation techniques, we construct a six-way classifier to assess the degree of human-AI collaboration
in text. Experimental results on the official development set demonstrate that our model significantly
outperforms baseline models without R-Drop or data balancing, confirming the effectiveness of the
proposed approach.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        With the rapid development and widespread deployment of large language models (LLMs) such as
Claude, GPT, and LLaMA, AI-generated content (AIGC) detection has emerged as a critical research
direction for ensuring content credibility and copyright compliance. This task is typically formulated as
a text classification problem. In this section, we provide a structured overview of current AIGC detection
strategies, which can be broadly categorized into watermarking techniques, zero-shot detection models,
supervised learning-based detectors, adversarial and robustness-oriented methods, and LLM-as-detector
paradigms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Watermarking Techniques: Watermarking techniques embed verifiable statistical fingerprints
into generated texts during inference, such as controlled token distributions or congruence-based
constraints, enabling downstream statistical tests to verify content provenance efficiently [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These
methods offer fast inference and low false-positive rates, and, if enforced at the generation source, can
provide near-deterministic traceability. However, watermarks are often vulnerable to dilution through
post-processing steps such as clipping, translation, or rewriting, and are ineffective against unauthorized
APIs or unknown-source texts.
      </p>
      <p>
        Zero-Shot Detection Models: Zero-shot approaches do not rely on labeled training data; instead,
they distinguish human- and machine-authored texts using statistical signals such as perplexity,
entropy, or n-gram rarity. Tools like GLTR [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and DetectGPT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], for example, examine anomalies in
token confidence distributions to detect machine authorship. These models are naturally domain- and
language-agnostic, but their effectiveness diminishes on high-quality or human-refined texts. Moreover,
some variants incur substantial computational costs due to repeated perturbations or multiple forward
passes.
      </p>
      <p>
        Supervised Learning Approaches: Supervised methods leverage pre-trained language models such
as BERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and DeBERTa [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], which are fine-tuned on annotated human-AI hybrid
corpora to capture deep semantic and syntactic distinctions. These models generally perform well in
single-domain, large-scale, and balanced datasets, and can be extended to multi-level classification tasks.
However, their generalization is often limited by the training distribution, leading to overfitting on
out-of-domain inputs or evasive edits, and their performance is highly sensitive to underrepresented
classes.
      </p>
      <p>
        Adversarial Learning Methods: This line of work enhances model robustness by generating
adversarial examples, incorporating consistency regularization (e.g., R-Drop), or employing contrastive
loss functions. Representative examples include RADAR [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and OUTFOX [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Empirical studies show that adversarial
training can substantially reduce the success rate of paraphrasing or syntactic evasion attacks, while
also enabling models to estimate confidence or uncertainty in predictions. Nevertheless, these methods
typically require carefully designed adversarial strategies, incur high training costs, and their gains
may be limited when synthetic adversarial samples diverge significantly from real-world attacks.
      </p>
      <p>
        LLMs as Detectors: Large language models (LLMs) can assess authorship by utilizing prompts that
frame the detection task. Early results were erratic and highly prompt-sensitive. However, in-context
learning (ICL) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] has improved stability by embedding a few curated input–label examples within
the prompt. Experimental findings demonstrate that the ICL strategy outperforms both traditional
zero-shot methods and RoBERTa-based detectors.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <p>In this section, we present the experimental model and methodology. Our approach is built upon the
DeBERTa-v3-base pre-trained language model, enhanced by the incorporation of R-Drop regularization.
In addition, we apply data balancing and augmentation strategies to the training set, which comprises
288,918 samples. These methods are designed to improve the model’s generalization ability and enhance
its stability during inference.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Balancing and Augmentation</title>
        <p>Large-scale imbalanced corpora often lead classifiers to overfit to majority classes, resulting in poor
performance on underrepresented categories—particularly classes 3, 4, and 5 in the six-way classification
task. To ensure the model captures fine-grained patterns of human-AI collaboration, we construct a
two-stage preprocessing pipeline consisting of undersampling for majority classes and multi-strategy
augmentation for minority classes.</p>
        <p>The three most frequent classes (0–2) together account for 90.7% of the total training data. Direct
training on such skewed distributions would severely bias the decision boundaries. Based on
preliminary assessments of class difficulty and model capacity, we define a target class distribution of
40k:40k:40k:20k:20k:5k, which significantly increases the weight of rare categories while avoiding
excessive pruning of majority-class instances, as shown in Table 1.</p>
        <p>For majority classes (0, 1, and 2), we apply undersampling by fixing the random seed
random.seed(42) and randomly sampling 40,000 representative and diverse instances from each
class.</p>
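The undersampling step amounts to a seeded per-class subsample. The following is a minimal sketch, assuming the corpus is grouped into per-class record lists; the function name and data layout are illustrative, not from the official pipeline:

```python
import random

def undersample(records_by_class, targets, seed=42):
    """Subsample each over-represented class down to its target size.

    records_by_class: dict mapping class label -> list of documents.
    targets: dict mapping class label -> desired sample count.
    """
    rng = random.Random(seed)  # fixed seed, mirroring random.seed(42) above
    balanced = {}
    for label, records in records_by_class.items():
        target = targets.get(label, len(records))
        if len(records) > target:
            balanced[label] = rng.sample(records, target)  # sample without replacement
        else:
            balanced[label] = list(records)  # minority classes pass through untouched
    return balanced
```

Fixing the seed makes the subsample reproducible across runs, which matters when comparing ablations against the same balanced training set.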
        <p>For minority classes (3, 4, and 5), we apply multi-strategy augmentation. Specifically, classes 3, 4, and
5 are expanded using random oversampling combined with six data augmentation strategies:
• Random Swap: Randomly swaps 15% of word positions to increase syntactic diversity.
• Random Deletion: Deletes words with a probability of 0.1 to simulate abbreviation and
compression.
• Synonym Replacement: Replaces selected non-stopwords with their synonyms (retrieved via
embedding or lexical databases) to preserve semantics while diversifying expression.
• Back-Translation: Introduces structural variation through machine translation and
reconstruction.
• Sentence Shuffle: Retains the first and last sentences while shuffling intermediate ones to
simulate paragraph-level rewriting.
• EDA Combination: Applies multiple Easy Data Augmentation (EDA) operations (e.g., swap,
synonym replacement) sequentially to generate highly heterogeneous variants.</p>
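Three of the strategies above can be sketched in a few lines. This is a minimal illustration assuming whitespace tokenization and pre-split sentences; back-translation and embedding-based synonym lookup need external resources and are omitted:

```python
import random

def random_swap(words, frac=0.15, seed=None):
    """Swap a fraction of word positions to vary syntax."""
    rng = random.Random(seed)
    words = list(words)
    n_swaps = max(1, int(len(words) * frac))
    for _ in range(n_swaps):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1, seed=None):
    """Delete each word with probability p; keep at least one word."""
    rng = random.Random(seed)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(list(words))]

def sentence_shuffle(sentences, seed=None):
    """Keep the first and last sentences fixed; shuffle the middle ones."""
    if len(sentences) <= 3:
        return list(sentences)
    rng = random.Random(seed)
    middle = sentences[1:-1]
    rng.shuffle(middle)
    return [sentences[0]] + middle + [sentences[-1]]
```

Each operation preserves the label while perturbing surface form, which is what lets the oversampled minority classes gain diversity rather than exact duplicates.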
        <p>These methods are applied with uniform random selection. As a result, classes 3 and 4 are each
augmented to 20,000 instances, while the most underrepresented class 5 is expanded from 1,368 to 5,000
instances.</p>
        <p>This “undersampling + multi-strategy augmentation” pipeline effectively mitigates distributional
bias and provides a balanced and diverse input space for subsequent fine-tuning with the DeBERTa
model and R-Drop regularization.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. R-Drop Regularization</title>
        <p>During fine-tuning of the DeBERTa-v3-base model, we adopt R-Drop (Regularized Dropout) to impose
a consistency constraint on the conventional dropout mechanism, aiming to mitigate overfitting and
enhance the model’s robustness in distinguishing fine-grained labels. Unlike traditional dropout, which
performs a single forward pass with stochastic masking, R-Drop conducts two independent forward
passes with different dropout masks on the same input batch and minimizes the Kullback–Leibler
(KL) divergence between their output distributions. This encourages consistency between the two
predictions and explicitly reduces discrepancies among sub-network outputs, significantly lowering the
model’s reliance on specific neuron co-activations and improving generalization.</p>
        <p>To formalize the R-Drop loss, let the input sample be denoted as x with its ground-truth label y.
Under two independent dropout masks, the model produces predictive distributions P₁(y | x) and
P₂(y | x). The R-Drop loss combines dual cross-entropy with a symmetric Kullback–Leibler (KL)
divergence term:</p>
        <p>ℒ_R-Drop = ½ [CE(y, P₁) + CE(y, P₂)] + α · ½ [KL(P₁ ‖ P₂) + KL(P₂ ‖ P₁)] (1)</p>
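The loss in Equation (1) can be checked numerically with a small pure-Python sketch; the logits and class label below are illustrative:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the gold label."""
    return -math.log(softmax(logits)[label])

def kl_div(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_loss(logits1, logits2, label, alpha=1.0):
    """Averaged cross-entropy of the two dropout passes plus a
    symmetric KL term between their predictive distributions."""
    p1, p2 = softmax(logits1), softmax(logits2)
    ce = 0.5 * (cross_entropy(logits1, label) + cross_entropy(logits2, label))
    sym_kl = 0.5 * (kl_div(p1, p2) + kl_div(p2, p1))
    return ce + alpha * sym_kl
```

When the two passes agree exactly the KL term vanishes and the loss reduces to plain cross-entropy; any disagreement between the dropout sub-networks is penalized in proportion to α.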
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Supervised Fine-Tuning</title>
        <p>We adopt DeBERTa-v3-base as the backbone model. Owing to its disentangled attention mechanism
and enhanced masked decoder, DeBERTa-v3-base demonstrates strong capabilities in modeling complex
semantic dependencies and capturing positional relationships, significantly improving contextual
understanding and structural representation.</p>
        <p>To construct a balanced training dataset, we perform undersampling on majority classes (labels
0, 1, and 2) and apply data augmentation to minority classes (labels 3, 4, and 5), resulting in a
well-proportioned dataset of approximately 165,000 instances.</p>
        <p>Subsequently, the model is fine-tuned with the R-Drop regularization technique. For each training
batch, two independent forward and backward passes are performed under different dropout masks,
and the Kullback–Leibler (KL) divergence between the two output distributions is calculated as a
regularization term. This encourages predictive consistency and helps reduce uncertainty, thus enhancing
model robustness.</p>
        <p>To prevent overfitting, we apply an early stopping strategy: training terminates once the validation
loss ceases to decrease. The model is evaluated using the official metrics (recall and F1 score), and the
best-performing checkpoint across all training epochs is retained. Final performance is assessed on the
held-out test set.</p>
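The early-stopping rule can be expressed as a small helper. The patience value below is an assumption for illustration, since the text does not state one:

```python
def should_stop(val_losses, patience=3):
    """Return True once validation loss has failed to improve on the
    best value seen before the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])
```

Checking against the best loss seen so far (rather than the previous epoch only) avoids stopping on a single noisy epoch.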
        <p>The complete procedure for fine-tuning DeBERTa-v3-base with R-Drop regularization is detailed
in Algorithm 1.</p>
        <p>Algorithm 1 Fine-Tuning DeBERTa-v3 with R-Drop Regularization
Require: Raw training set D_train, raw development set D_dev
Require: Pre-trained model DeBERTa-v3-base with parameters θ
Require: Undersample size N = 40k for classes 0–2, augmentation targets N₃ = N₄ = 20k, N₅ = 5k
Require: Hyper-parameters: learning rate η, batch size B, epochs E, R-Drop weight α = 1.0, random seed s
Ensure: Fine-tuned model θ* with best validation performance
1: Set random seed s; initialize tokenizer and optimizer with η ◁ Stage 1: Data Balancing
2: Split D_train by label ℓ ∈ {0, …, 5} into subsets {D_ℓ}
3: Undersample majority classes: D_ℓ ← Sample(D_ℓ, N) for ℓ ∈ {0, 1, 2}
4: Augment minority classes (ℓ ∈ {3, 4, 5}) with strategies random_swap, random_deletion,
synonym_replacement, back_translation, sentence_shuffle, EDA_combination until |D₃| = |D₄| = N₃ and |D₅| = N₅
5: D_bal ← ⋃ℓ∈{0,…,5} D_ℓ; shuffle with seed s ◁ Stage 2: Tokenization
6: Tokenize D_bal and D_dev using max length 512 ◁ Stage 3: R-Drop Fine-Tuning
7: for epoch e = 1 to E do
8:   for each mini-batch (x, y) ⊂ D_bal do
9:     Forward pass #1: (z₁, CE₁) ← f_θ(x, y)
10:    Forward pass #2: (z₂, CE₂) ← f_θ(x, y) ◁ independent dropout masks
11:    Compute symmetric KL loss: L_KL = ½ [KL(z₁ ‖ z₂) + KL(z₂ ‖ z₁)]
12:    Total loss: ℒ = ½ (CE₁ + CE₂) + α L_KL
13:    Back-propagate ∇_θ ℒ; update θ with AdamW
14:   end for
15:   Evaluate on dev subset; save θ if F1_macro improves
16:   if validation loss has not decreased for k consecutive epochs then
17:     break ◁ early stopping
18:   end if
19: end for ◁ Stage 4: Final Evaluation
20: Load best checkpoint θ*; evaluate on full dev/test set and report accuracy, recall, F1_macro, F1_micro
21: return Fine-tuned model parameters θ*</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental settings</title>
        <p>Model We adopt DeBERTa-v3-base¹ as the encoder, given its disentangled self-attention and
enhanced mask decoder, which have demonstrated strong performance on long-sequence classification
tasks.</p>
        <p>Input preprocessing All documents are tokenized with the original DeBERTa WordPiece tokenizer.
Sentences exceeding 512 tokens are truncated, while shorter ones are padded on-the-fly with the special
&lt;pad&gt; token.</p>
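The truncation-and-padding step amounts to the following simplified sketch; a real tokenizer also emits attention masks and special tokens, which are omitted here:

```python
def pad_or_truncate(token_ids, max_len=512, pad_id=0):
    """Truncate sequences longer than max_len; right-pad shorter ones."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len]
    return token_ids + [pad_id] * (max_len - len(token_ids))
```

Padding on the fly (per batch) rather than pre-padding the whole corpus keeps stored sequences short and lets the collator produce fixed-width tensors only where needed.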
        <p>Hyper-parameters Table 2 lists the full configuration used in every run. The R-Drop weight α is set
to 1.0 after a coarse logarithmic search over {0.05, 0.2, 1, 5, 10}.</p>
        <p>Evaluation protocol After each epoch, we save a checkpoint and evaluate the model on the
development set to obtain timely feedback. During this evaluation, we compute metrics including
macro-F1, micro-F1, accuracy, and macro-recall, which facilitate the selection and preservation of the
best-performing model.</p>
        <p>¹12 transformer layers, hidden size 768, 12 attention heads.</p>
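Macro-recall, the headline ranking metric, averages per-class recall so that rare collaboration levels count as much as frequent ones; a minimal sketch:

```python
def macro_recall(y_true, y_pred, n_classes=6):
    """Mean of per-class recall; classes absent from y_true are skipped."""
    per_class = []
    for c in range(n_classes):
        support = sum(1 for t in y_true if t == c)
        if support == 0:
            continue
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        per_class.append(tp / support)
    return sum(per_class) / len(per_class)
```

Because every class contributes equally to the mean, a model that ignores the small classes 3–5 is penalized heavily, which is exactly why the balancing pipeline of Section 3.1 matters for this metric.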
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>As shown in Table 3, our model clearly outperforms the baseline, achieving higher scores on
Recall (Macro), F1 (Macro), and Accuracy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents our work on Subtask 2: Human-AI Collaborative Text Classification of the PAN 2025
Voight-Kampf Generative AI Detection challenge. We built our system based on the DeBERTa-v3-base
model, enhanced with R-Drop regularization and data balancing and augmentation techniques, in order
to improve the model’s generalization and robustness. Our approach ultimately achieved a strong
second-place result in the task. Comparative experiments demonstrated that incorporating R-Drop
during training positively contributed to the overall performance of the model. However, due to the
lack of more refined data processing and our still-limited understanding of the model architecture, the
system did not reach its full potential, leaving room for further improvement. In future work, we plan
to focus on deeper architectural enhancements to further improve performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 62276064).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampf Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “VoightKampf” Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <article-title>A survey on LLM-generated text detection: necessity, methods, and future directions</article-title>
          , 2024. URL: https://arxiv.org/abs/2310.14724. arXiv:2310.14724.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTa: decoding-enhanced BERT with disentangled attention</article-title>
          , CoRR abs/2006.03654 (2020). URL: https://arxiv.org/abs/2006.03654. arXiv:2006.03654.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTaV3: improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          , CoRR abs/2111.09543 (2021). URL: https://arxiv.org/abs/2111.09543. arXiv:2111.09543.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>R-Drop: Regularized dropout for neural networks</article-title>
          ,
          <source>CoRR abs/2106.14448</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2106.14448. arXiv:2106.14448.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>EDA: easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          ,
          <source>CoRR abs/1901.11196</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1901.11196. arXiv:1901.11196.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <article-title>Watermarking pre-trained language models with backdooring</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2210.07543. arXiv:2210.07543.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Strobelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>GLTR: statistical detection and visualization of generated text</article-title>
          ,
          <source>CoRR abs/1906.04043</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1906.04043. arXiv:1906.04043.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khazatsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <article-title>DetectGPT: Zero-shot machine-generated text detection using probability curvature</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2301.11305. arXiv:2301.11305.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>CoRR abs/1810.04805</source>
          (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <source>CoRR abs/1907.11692</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>RADAR: Robust AI-text detection via adversarial learning</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2307.03838. arXiv:2307.03838.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Koike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaneko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          ,
          <article-title>OUTFOX: LLM-generated essay detection through in-context learning with adversarially generated examples</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2307.11729. arXiv:2307.11729.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sui</surname>
          </string-name>
          ,
          <article-title>A survey on in-context learning</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2301.00234. arXiv:2301.00234.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>