<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Interactive Multimedia and Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1186/s40561-023-00237-x</article-id>
      <title-group>
        <article-title>DBG: Human-AI Collaborative Text Classification with DeBERTa-enhanced Contextual and Geometric Attention</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tufeng Xian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Zhong</string-name>
          <email>zhongyong@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fen Liu</string-name>
          <email>liufen@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meifang Xie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qiyuan Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miaoji Zheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weidong Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiliang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>36</volume>
      <fpage>5</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>With the rapid development of generative large language models (LLMs), hybrid texts produced through human-AI collaboration are becoming increasingly common in real-world scenarios. Detecting and classifying them has become a key challenge for ensuring information authenticity and academic integrity. This paper proposes DBG (DeBERTa-BiLSTM-Geometric Attention), a three-level feature fusion model for the human-AI collaborative text classification subtask of PAN-CLEF 2025. The model performs fine-grained classification in three stages. First, the disentangled attention of the pre-trained DeBERTa-v3-large model captures global semantic features. Second, a bidirectional LSTM models bidirectional language dependencies in the text sequence. Finally, a geometric attention module dynamically enhances local discriminative features; it combines one-dimensional convolution with a learnable position enhancement factor. Experiments show that DBG outperforms the baseline model on the six-class human-AI collaborative text classification task. It reaches a Macro Recall of 56.87% (8.55 percentage points above RoBERTa-base), an F1 Macro Score of 56.45%, and an Accuracy of 66.81%. This study provides technical support for managing the risks of human-machine collaborative content. It also offers new ideas for text provenance tracing in the ethical governance of generative AI.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>Human-AI Collaborative Text Classification</kwd>
        <kwd>DeBERTa</kwd>
        <kwd>Geometric Attention</kwd>
        <kwd>BiLSTM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the rapid development of GPU hardware, available computing power continues to grow, driving rapid progress in artificial intelligence. In the era of big data, the sheer volume of available text makes it possible to build very large datasets and to train language models at scale, which has fundamentally changed natural language processing. Large Language Models (LLMs) have displaced many approaches built on traditional statistical machine learning, and they show strong capabilities across diverse Natural Language Processing (NLP) tasks. Beyond earlier tasks such as spam detection, sentiment analysis, automatic response, and academic text summarization, generative LLMs trained on vast amounts of text can now produce fluent and coherent text [1, 2, 3, 4, 5]. In recent years, LLMs such as GPT-3/4, PaLM, and Claude have progressed rapidly. The quality of their generated text approaches, and in some cases surpasses, that of human writing. OpenAI released GPT-3 in 2020; with 175 billion parameters, it can generate logically coherent long articles, poems, code, and other content from simple instructions. GPT-4, launched in 2023, made breakthroughs in multimodal understanding and complex reasoning, further blurring the boundary between human- and machine-generated content. According to Stanford University's "2023 Artificial Intelligence Index Report", GPT-4 outperformed 90% of human candidates on professional exams such as the bar exam. The semantic plausibility and stylistic diversity of its output often make it hard to distinguish from human writing. These advances have greatly changed the way people generate and interact with machine text, since LLMs can now produce text that closely matches human writing.</p>
      <p>
        Subtask 2, human-machine collaborative text classification, has six distinct categories. Improving generative AI detection and classification accuracy on a task this complex calls for a more robust model architecture. We therefore propose DBG, a novel multi-class text classification model that combines advanced pre-training, sequence modeling, and a geometric attention mechanism to address key challenges in language analysis. DBG is built on DeBERTa-v3-large and uses its disentangled attention to capture subtle contextual relationships in text sequences. To strengthen sequential pattern learning, we add a bidirectional LSTM layer that processes the representations encoded by DeBERTa [6], modeling both forward and backward language dependencies. The architecture also integrates a geometric attention module with a learned position enhancement factor [7]. This module uses one-dimensional convolution to dynamically highlight discriminative text features while preserving structural relationships within the sequence. The model combines three complementary mechanisms:
(1) DeBERTa's contextual embeddings provide the semantic foundation.
(2) BiLSTM layers model hierarchical language patterns.
(3) The geometric attention mechanism refines feature extraction through learnable spatial reasoning.
      </p>
      <p>Experiments on the benchmark dataset show that our model achieves clear performance gains, especially in Macro Recall (56.87% versus 48.32% for the baseline). Its F1 Macro Score and Accuracy are also higher than those of the baseline model. This indicates strong human-machine collaborative classification capability and suggests that DBG can help address the complex problems of generative AI authorship verification.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Despite their transformative potential, advances in Large Language Models (LLMs) bring significant societal risks. Misuse of machine-generated text is growing rapidly, in the form of opinion manipulation, disinformation, and systematic plagiarism. This underscores the need to establish ethical constraints and risk-mitigation frameworks alongside deployment. Against this backdrop, Generative AI Authorship Verification has become a key research focus. The human-AI collaborative text classification subtask of the Voight-Kampff task at PAN@CLEF 2025 is particularly forward-looking. By defining six categories of human-AI collaboration, it moves beyond conventional binary classification and directly addresses detection blind spots for hybrid text. It also supports both ethical governance and technical optimization [5]. Current mainstream techniques fall into three primary approaches, represented by the following work. Jakesch et al. [8] showed through cognitive experiments that humans rely on surface-level features (e.g., first-person pronouns, domestic topics) when attributing text. This reliance is a fundamental flaw, because such heuristics can be exploited by "hyper-humanized" generated content. Chakraborty [9] leveraged BERT's contextual embeddings to capture implicit patterns in generated text. Guo et al. [10] combined Transformer encoders with multi-scale textual features derived from spaCy and used a BiLSTM to analyze semantic coherence in long texts.</p>
      <p>Nevertheless, current research has notable limitations. First, mainstream methods achieve high accuracy (above 89%) on purely machine-generated text [11], but their performance drops by 37-52% on human-AI hybrid documents [5]. F1 scores are particularly low (only 0.63) for fine-grained collaborative categories. Second, supervised approaches generally suffer from model dependency, losing up to 28 percentage points of accuracy when detecting output from closed-source models (e.g., GPT-4, Claude 3). Third, existing systems struggle to keep pace with the rapid evolution of LLMs. Newer RLHF alignment techniques have increased the density of human-like features in generated text by 42%, and multimodal content lies outside the scope of current detection frameworks. Fourth, strategies based on surface features produce false positive rates as high as 65% in specialized domains [9], exposing a lack of deep semantic analysis. These bottlenecks show the need for verification frameworks that are designed for the characteristics of hybrid text and can adapt to ongoing technological change.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This paper proposes the DBG model (DeBERTa-BiLSTM-Geometric Attention) for multidimensional feature modeling in generated-text classification. The architecture builds on the pre-trained DeBERTa-v3-large language model, which produces deep contextual representations through its disentangled attention mechanism. A bidirectional LSTM follows the encoder to capture bidirectional language dependencies in the sequence. A geometric attention module then uses one-dimensional convolution kernels to learn spatial enhancement factors for text features, and learnable local position weights sharpen the extraction of discriminative features. The three stages work together: DeBERTa provides global semantic representations, the BiLSTM models how sequence patterns evolve, and geometric attention strengthens the discriminative power of local features. Optimizing the three stages jointly improves fine-grained classification of generated text.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset Preprocessing</title>
        <p>This experiment uses the dataset of the PAN 2025 human-machine collaborative text classification task. The dataset is publicly available on the Zenodo platform and covers texts from multiple domains, including academic papers, news reports, and social media. The annotation distinguishes six types of human-machine collaboration (see Table 1 for details). These cover complex scenarios such as human-led writing, post-processing of machine-generated text, and deeply mixed text. The original data are stored in JSONL format. Each record contains the text content (the "text" field) and the category label (the "label" field), and the data support multilingual analysis in English, Spanish, and German. The label is an integer from 0 to 5, with the following descriptions:
id2label = {
0: "fully human-written",
1: "human-written, then machine-polished",
2: "machine-written, then machine-humanized",
3: "human-initiated, then machine-continued",
4: "deeply-mixed text; where some parts are written by a human and some are generated by a machine",
5: "machine-written, then human-edited"
}</p>
        <p>The training and development sets contain 288,918 and 72,661 samples, respectively, and the category distribution is highly imbalanced. For example, the "Human-initiated, then machine-continued" category accounts for 51.2% of the development set, while "Deeply-mixed text" accounts for only 0.3% (see Table 1 for details). This distribution reflects the diversity of human-machine collaboration modes in real scenarios, but it makes the task harder for large models. With so few examples of the minority classes, a classifier can reach high overall accuracy while largely ignoring them. Good classification performance therefore requires attention to this imbalance.</p>
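        <p>The class shares quoted above can be recomputed directly from the JSONL files. The following Python sketch assumes, as described in Section 3.1, that every line is a JSON object whose "label" field holds an integer from 0 to 5. The four toy records are placeholders, not samples from the PAN data, and the helper name class_distribution is our own.</p>
```python
import json
from collections import Counter

# Label descriptions as listed in Section 3.1.
ID2LABEL = {
    0: "fully human-written",
    1: "human-written, then machine-polished",
    2: "machine-written, then machine-humanized",
    3: "human-initiated, then machine-continued",
    4: "deeply-mixed text; where some parts are written by a human "
       "and some are generated by a machine",
    5: "machine-written, then human-edited",
}


def class_distribution(jsonl_lines):
    """Return {label_id: percentage of records} for an iterable of JSONL lines."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["label"]] += 1
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in sorted(counts.items())}


# Toy records standing in for the lines of dev.jsonl (not real data).
toy_lines = [
    '{"text": "...", "label": 3}',
    '{"text": "...", "label": 3}',
    '{"text": "...", "label": 0}',
    '{"text": "...", "label": 4}',
]
for label, share in class_distribution(toy_lines).items():
    print(f"{label} ({ID2LABEL[label]}): {share:.1f}%")
```
        <p>Because a file object iterates over its lines, the same function can be applied to an open dev.jsonl handle to reproduce the percentages reported for the development set.</p>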
        <p>To prepare the data as model input, we preprocess it as follows:
• Label normalization: category labels are mapped to integers 0-5, and the id2label dictionary keeps them semantically interpretable. Missing or invalid labels are set to -1, and these abnormal samples are filtered out.
• Text segmentation: the DebertaV2Tokenizer for DeBERTa-v3-large splits the text into subwords. The maximum sequence length is 512; overlong text is truncated at the tail, and shorter text is padded with [PAD] tokens.
• Batch encoding: the truncation = True and padding = "max_length" parameters produce fixed-length tensors, and input_ids and attention_mask are constructed as model input.</p>
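        <p>The label normalization and fixed-length encoding steps can be sketched without the tokenizer itself. This is a simplified illustration under stated assumptions. The helper names are hypothetical, and the pad id 0 is a placeholder for the tokenizer's pad_token_id. The input is assumed to be token ids that already include the special tokens. With the Hugging Face tokenizer, the equivalent call would be tokenizer(texts, truncation=True, padding="max_length", max_length=512), which handles special tokens itself.</p>
```python
VALID_LABELS = set(range(6))
MAX_LEN = 512
PAD_ID = 0  # hypothetical placeholder; the real value is the tokenizer's pad_token_id


def normalize_label(raw):
    """Map a raw label to an int in 0-5, or -1 if it is missing or invalid."""
    try:
        label = int(raw)
    except (TypeError, ValueError):
        return -1
    return label if label in VALID_LABELS else -1


def filter_records(records):
    """Keep only samples whose normalized label is not -1."""
    kept = []
    for rec in records:
        label = normalize_label(rec.get("label"))
        if label != -1:
            kept.append({"text": rec["text"], "label": label})
    return kept


def to_fixed_length(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate the tail and right-pad; return (input_ids, attention_mask)."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)           # 1 marks real tokens
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


ids, mask = to_fixed_length([1, 523, 874, 2], max_len=8)
print(ids)   # [1, 523, 874, 2, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```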
        <p>This preprocessing retains the semantic and structural features of the text while adapting it to the input specification of the pre-trained model, providing structured data for subsequent model training.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Network Architecture</title>
        <p>Traditional human-machine collaborative text classification methods have three main shortcomings. First, reliance on manual feature engineering and single-modal encoding fails to capture the complex patterns of generated text. Second, unidirectional sequence modeling makes it difficult to trace bidirectional collaboration. Third, static attention mechanisms cannot dynamically enhance key local features, which limits fine-grained classification performance. The DBG model proposed in this paper adopts a three-level feature enhancement architecture, as shown in Figure 1. It extracts global semantic features with the DeBERTa pre-trained encoder, captures sequential dependencies with a bidirectional LSTM, and strengthens local discriminative features with geometric attention. Together, the three stages improve fine-grained classification of human-machine collaborative text.</p>
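        <p>The first two trainable stages after the encoder can be traced shape by shape in a short numpy sketch: the 1024-to-256 projection and the bidirectional LSTM. It is not the authors' implementation. The DeBERTa output is replaced by random numbers, the weights are random, and the sequence length is 8 instead of 512. The LSTM uses the standard input/forget/cell/output gate formulation with 256 hidden units per direction. That unit count is our assumption, based on the 512-dimensional BiLSTM output described in Section 3.2 and the projection size of 256 reported in Section 4.1.</p>
```python
import numpy as np

rng = np.random.default_rng(42)
L, D_ENC, D_PROJ, D_HID = 8, 1024, 256, 256  # toy length; other sizes as in the paper


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(d_in, d_hid):
    """Random (input, recurrent, bias) weights for one LSTM direction."""
    scale = 1.0 / np.sqrt(d_hid)
    return (rng.normal(0, scale, (4 * d_hid, d_in)),
            rng.normal(0, scale, (4 * d_hid, d_hid)),
            np.zeros(4 * d_hid))


def lstm_pass(x, params):
    """Run one LSTM direction over x of shape (L, d_in); return (L, d_hid)."""
    W, U, b = params
    d_hid = U.shape[1]
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    outputs = []
    for x_t in x:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)  # input, forget, cell, output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(x, fwd_params, bwd_params):
    """Concatenate forward states with re-aligned backward states: (L, 2 * d_hid)."""
    h_fwd = lstm_pass(x, fwd_params)
    h_bwd = lstm_pass(x[::-1], bwd_params)[::-1]  # run right-to-left, then restore order
    return np.concatenate([h_fwd, h_bwd], axis=-1)


# Stand-in for the DeBERTa-v3-large output H (L x 1024); random, not a real encoder.
H = rng.normal(size=(L, D_ENC))

# Linear projection H_p = H W_p + b_p with W_p of shape (1024, 256).
W_p = rng.normal(0, 1.0 / np.sqrt(D_ENC), (D_ENC, D_PROJ))
b_p = np.zeros(D_PROJ)
H_p = H @ W_p + b_p

H_bi = bilstm(H_p, init_lstm(D_PROJ, D_HID), init_lstm(D_PROJ, D_HID))
print(H.shape, H_p.shape, H_bi.shape)  # (8, 1024) (8, 256) (8, 512)
```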
        <sec id="sec-3-2-1">
          <title>3.2.1. DeBERTa Disentangled Attention Encoder</title>
          <p>As the base of the model, this encoder extracts deep contextual semantic representations from the input text and helps resolve semantic ambiguity in generated text. We use the pre-trained DeBERTa-v3-large model. Its core component is the disentangled attention mechanism, which computes content and position interactions separately:</p>
          <p>
            <disp-formula>
              <label>(1)</label>
              <tex-math>\mathrm{Attn}(Q_c, K_c, Q_r, P) = \mathrm{Softmax}\left(\frac{Q_c K_c^{\top} + Q_r P^{\top}}{\sqrt{d}}\right)</tex-math>
            </disp-formula>
            where $Q_c$ and $K_c$ are content vectors, $Q_r$ is the relative position vector, $P$ is a trainable position embedding matrix, and $d$ is the attention dimension. This design lets the model represent the relative positions of tokens independently of their content, which strengthens its ability to capture long-range dependencies in generated text. The encoder outputs hidden states $H \in \mathbb{R}^{L \times 1024}$, where $L$ is the sequence length and 1024 is the DeBERTa hidden dimension.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Projection Dimensionality Reduction and Bidirectional LSTM</title>
          <p>A linear projection reduces the feature dimension to lower the computational cost. Bidirectional sequence modeling then captures local language patterns, targeting differences in sequence coherence in generated text.</p>
          <p>(1) Projection layer: the DeBERTa output is mapped to a lower-dimensional space:
            <disp-formula>
              <label>(2)</label>
              <tex-math>H_p = H W_p + b_p</tex-math>
            </disp-formula>
            where $W_p \in \mathbb{R}^{1024 \times 256}$ and $b_p$ are trainable parameters; the core semantic information is retained after dimensionality reduction.</p>
          <p>(2) BiLSTM: concatenating the forward hidden states $\overrightarrow{H}$ and the backward hidden states $\overleftarrow{H}$ yields the bidirectional temporal feature
            <disp-formula>
              <label>(3)</label>
              <tex-math>H_{bi} = \left[\overrightarrow{H}; \overleftarrow{H}\right] \in \mathbb{R}^{L \times 512}</tex-math>
            </disp-formula>
            which models the contextual dependencies in the generated text in both directions.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Geometric Attention Enhancement Module</title>
          <p>This module uses learnable spatial weights to focus on key local features. This targets a specific difficulty: the features that mark machine-generated fragments within generated text are hidden and hard to extract.</p>
          <p>(1) One-dimensional convolution transformation of the BiLSTM output:
            <disp-formula>
              <label>(4)</label>
              <tex-math>Q = \mathrm{Conv1D}(H_{bi}, W_Q), \quad K = \mathrm{Conv1D}(H_{bi}, W_K)</tex-math>
            </disp-formula>
          </p>
          <p>(2) Position-enhanced attention:
            <disp-formula>
              <label>(5)</label>
              <tex-math>A = \mathrm{Softmax}\left(\sigma\left(\frac{Q K^{\top}}{\sqrt{d}}\right) \odot \gamma\right)</tex-math>
            </disp-formula>
            where $d$ is the dimension of $Q$ and $K$, and $\sigma$ is the SiLU activation function, which dynamically adjusts the attention distribution. $\gamma$ is a learnable position enhancement factor that amplifies the weight contribution of important positions.</p>
          <p>(3) Residual connection:
            <disp-formula>
              <label>(6)</label>
              <tex-math>H_{geo} = \mathrm{Conv1D}(A H_{bi}) + H_{bi}</tex-math>
            </disp-formula>
            The residual path retains the original sequence structure and avoids feature shift caused by the attention mechanism.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Classification Decision Layer</title>
          <p>The final classification uses the aggregated features at the [CLS] token, whose global representation distinguishes complex collaboration patterns. The first position vector of the geometric attention output (corresponding to [CLS]) is taken as the global feature and mapped to the six category scores:
            <disp-formula>
              <label>(7)</label>
              <tex-math>h_{cls} = H_{geo}[0]</tex-math>
            </disp-formula>
            <disp-formula>
              <label>(8)</label>
              <tex-math>z = h_{cls} W_c + b_c \in \mathbb{R}^{6}</tex-math>
            </disp-formula>
            where $W_c \in \mathbb{R}^{512 \times 6}$ maps the features to the six category spaces. In our analysis, after multiple layers of attention the [CLS] token effectively aggregates the collaboration-mode features of the full text.</p>
          <p>This hierarchical feature refinement lets the architecture model traces of human-machine collaboration in generated text at multiple granularities. As reported in Section 4.3, its classification performance across collaboration modes exceeds that of the baseline model.</p>
        </sec>
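        <p>The geometric attention module and the [CLS] classification head can be illustrated with a numpy sketch of the forward pass described in Sections 3.2.3 and 3.2.4. Several details are assumptions rather than facts from this paper:</p>
        <p>• The convolution kernel size is 3, with zero padding that keeps the sequence length.
• A separate output convolution weight w_o is used.
• The enhancement factor gamma is a per-position vector.
• Attention is applied to the BiLSTM output itself.
• A single convolutional branch replaces the 8 parallel layers reported in Section 4.1.</p>
```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_same(x, w):
    """1-D convolution along the sequence axis with zero padding.
    x: (L, c_in); w: (k, c_in, c_out) with odd k; returns (L, c_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])])


def silu(x):
    return x / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)


def geometric_attention(h_bi, w_q, w_k, w_o, gamma):
    """Position-enhanced attention with a residual connection.
    gamma: assumed per-position enhancement factors of shape (L,)."""
    q = conv1d_same(h_bi, w_q)
    k = conv1d_same(h_bi, w_k)
    scores = silu(q @ k.T / np.sqrt(q.shape[-1]))      # (L, L)
    attn = softmax(scores * gamma[None, :], axis=-1)   # scale each key position, normalize rows
    return conv1d_same(attn @ h_bi, w_o) + h_bi        # residual keeps the BiLSTM structure


L, D, N_CLASSES, KERNEL = 8, 512, 6, 3
h_bi = rng.normal(size=(L, D))  # stand-in for the BiLSTM output
w_q, w_k, w_o = (rng.normal(0, 0.02, (KERNEL, D, D)) for _ in range(3))
gamma = np.ones(L)  # learnable during training; initialized to 1 here

h_geo = geometric_attention(h_bi, w_q, w_k, w_o, gamma)
h_cls = h_geo[0]  # first position, i.e. the [CLS] slot
W_c, b_c = rng.normal(0, 0.02, (D, N_CLASSES)), np.zeros(N_CLASSES)
logits = h_cls @ W_c + b_c
print(h_geo.shape, logits.shape)  # (8, 512) (6,)
```
        <p>Initializing gamma to ones leaves the SiLU scores unchanged at the start of training. Scaling a key position by a larger gamma raises its softmax weight whenever its scores are positive, which matches the role this paper assigns to the position enhancement factor.</p>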
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment Settings</title>
        <p>To train DBG, we used the official training split (train.jsonl) for training and the development split (dev.jsonl) for validation; the two are in a ratio of about 8:2. The classification framework is built on the pre-trained DeBERTa-v3-large model, with a projection layer dimension of 256 and a single-layer bidirectional LSTM. Layer normalization is applied to the LSTM layer, and the dropout rate on its recurrent connections is set to 0.3. The geometric attention module contains 8 parallel convolutional layers. Training uses a two-tier learning rate: 1e-5 for the pre-trained DeBERTa encoder and 1e-4 for the newly added modules. The model is trained for 5 epochs with the AdamW optimizer and mixed precision. The batch size is 8 with gradient accumulation over 4 steps, an effective batch size of 32. Global gradient clipping (threshold 1.0) and early stopping are applied; training stops when the development set loss has not improved for 10 evaluation rounds. All texts are truncated or padded to 512 tokens and segmented with DebertaV2Tokenizer. Experiments ran on a single RTX 3090 GPU with a fixed random seed of 42 to control parameter initialization and data shuffling. A full training run took about 20 hours.</p>
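        <p>The optimizer configuration described above can be sketched in plain Python:</p>
        <p>• Parameters are split into two AdamW parameter groups with learning rates 1e-5 and 1e-4.
• Gradient accumulation over 4 batches of 8 gives 32 samples per optimizer step.
• Early stopping ends training after 10 evaluations without an improvement in development loss.</p>
        <p>The name prefix "deberta." used to identify encoder parameters and the EarlyStopping helper are hypothetical; actual module names depend on how the model class is written.</p>
```python
BASE_LR, HEAD_LR = 1e-5, 1e-4
PER_DEVICE_BATCH, ACCUM_STEPS = 8, 4


def build_param_groups(named_parameters, encoder_prefix="deberta."):
    """Split (name, param) pairs into two AdamW parameter groups:
    pre-trained encoder weights at BASE_LR, newly added modules at HEAD_LR."""
    encoder, head = [], []
    for name, param in named_parameters:
        (encoder if name.startswith(encoder_prefix) else head).append(param)
    return [
        {"params": encoder, "lr": BASE_LR},
        {"params": head, "lr": HEAD_LR},
    ]


# 8 samples per forward pass x 4 accumulated steps = 32 samples per optimizer update.
effective_batch = PER_DEVICE_BATCH * ACCUM_STEPS


class EarlyStopping:
    """Signal a stop once dev loss has not improved for `patience` evaluations."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_rounds = 0

    def step(self, dev_loss):
        """Record one evaluation; return True when training should stop."""
        if self.best > dev_loss:
            self.best = dev_loss
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


# In PyTorch the groups would be passed as
#   torch.optim.AdamW(build_param_groups(model.named_parameters()), lr=HEAD_LR)
# where "model" and its "deberta." prefix are hypothetical.
```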
        <p>In the test phase, we used the model saved during training to run inference on the unlabeled test set (subtask2_test.jsonl) provided on the CodaLab platform. Each text was first converted into a 512-token input sequence with DebertaV2Tokenizer. After loading the best checkpoint, we ran batched inference on the RTX 3090 GPU with a batch size of 16, enabling mixed precision and torch.no_grad mode to speed up prediction. For each sample, the argmax of the predicted class probability distribution gives an integer label (0-5). Each label was paired with its original test sample ID as an "id": "label" key-value record and written line by line to a new JSONL result file (e.g., predictions.jsonl). This process took about 1 hour. Finally, the official format_checker.py script was used to verify the file format, producing a standardized submission file that meets the requirements of the CodaLab evaluation platform.</p>
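        <p>The post-processing step that turns class scores into the submission file can be sketched as follows. The sketch assumes each output line is a JSON object with "id" and "label" keys, as described above. The helper names and the sample id "doc-1" are hypothetical, and the exact format expected by format_checker.py should be confirmed against the official task description.</p>
```python
import json
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_predictions(ids, logits_per_sample):
    """Return one JSON line {"id": ..., "label": ...} per sample, label = argmax class."""
    if len(ids) != len(logits_per_sample):
        raise ValueError("ids and predictions must have the same length")
    lines = []
    for sample_id, logits in zip(ids, logits_per_sample):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)  # argmax over the classes
        lines.append(json.dumps({"id": sample_id, "label": label}))
    return lines


def write_predictions(path, ids, logits_per_sample):
    """Write the formatted lines to a JSONL file such as predictions.jsonl."""
    with open(path, "w", encoding="utf-8") as f:
        for line in format_predictions(ids, logits_per_sample):
            f.write(line + "\n")


print(format_predictions(["doc-1"], [[0.1, 2.0, -1.0, 0.0, 0.3, 0.2]]))
# ['{"id": "doc-1", "label": 1}']
```
        <p>Since softmax is monotonic, taking the argmax of the raw logits gives the same labels; the probabilities are computed here only to mirror the description above.</p>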
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Metrics</title>
        <p>Given the nature of this human-machine collaborative classification task, the organizers define four metrics: Macro Recall, F1 Macro Score, F1 Micro Score, and Accuracy. Macro Recall is the primary metric and determines the leaderboard ranking. Together, these metrics evaluate the model's performance on the task; each is described below.</p>
        <p>Macro Recall: the arithmetic mean of the per-class recall values, which weights the recognition ability for every category equally [12]. It is defined as follows:</p>
        <p>
          <disp-formula>
            <label>(9)</label>
            <tex-math>\text{Macro Recall} = \frac{1}{N}\sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}, \quad N = 6</tex-math>
          </disp-formula>
          where $TP_i$ is the number of true positive (correctly predicted) samples of the $i$-th category and $FN_i$ is the number of false negative (missed) samples of the $i$-th category.</p>
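        <p>F1 Macro Score: the arithmetic mean of the per-class F1 scores, which balances precision and recall [13]. It is defined as follows:
          <disp-formula>
            <label>(10)</label>
            <tex-math>\text{F1 Macro} = \frac{1}{N}\sum_{i=1}^{N} \frac{2 \cdot \text{Precision}_i \cdot \text{Recall}_i}{\text{Precision}_i + \text{Recall}_i}</tex-math>
          </disp-formula>
        </p>
        <p>where $\text{Precision}_i = \frac{TP_i}{TP_i + FP_i}$, $\text{Recall}_i = \frac{TP_i}{TP_i + FN_i}$, and $FP_i$ is the number of false positive samples of the $i$-th category. Because this is a macro average, the F1 value of each category is computed independently and then averaged, which prevents large categories from dominating the result.</p>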
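        <p>The Macro Recall and F1 Macro definitions above can be computed directly from per-class counts. In this standard-library sketch, a class with no true or predicted samples contributes 0 to the average. This zero-division convention is our assumption, and the official scorer may treat such classes differently.</p>
```python
def per_class_counts(y_true, y_pred, n_classes=6):
    """Per-class true positive, false positive and false negative counts."""
    tp, fp, fn = [0] * n_classes, [0] * n_classes, [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly assigned to class p
            fn[t] += 1  # missed sample of class t
    return tp, fp, fn


def safe_div(num, den):
    """Division that returns 0.0 for an empty denominator (assumed convention)."""
    return num / den if den else 0.0


def macro_recall(y_true, y_pred, n_classes=6):
    tp, _, fn = per_class_counts(y_true, y_pred, n_classes)
    return sum(safe_div(tp[i], tp[i] + fn[i]) for i in range(n_classes)) / n_classes


def macro_f1(y_true, y_pred, n_classes=6):
    tp, fp, fn = per_class_counts(y_true, y_pred, n_classes)
    total = 0.0
    for i in range(n_classes):
        precision = safe_div(tp[i], tp[i] + fp[i])
        recall = safe_div(tp[i], tp[i] + fn[i])
        total += safe_div(2 * precision * recall, precision + recall)
    return total / n_classes


# Toy example with three classes (the task itself uses n_classes=6).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_recall(y_true, y_pred, 3), 4), round(macro_f1(y_true, y_pred, 3), 4))  # 0.6667 0.6556
```
        <p>scikit-learn's recall_score and f1_score with average="macro" compute the same quantities over the labels present in the data, with their own zero_division handling.</p>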
        <p>F1 Micro Score: the F1 value computed from global counts (the sums of TP, FP, and FN over all categories). It is dominated by performance on high-frequency categories [14]. It is defined as follows:</p>
        <p>
          <disp-formula>
            <label>(11)</label>
            <tex-math>\text{F1 Micro} = \frac{2 \sum_{i=1}^{N} TP_i}{2 \sum_{i=1}^{N} TP_i + \sum_{i=1}^{N} (FP_i + FN_i)}</tex-math>
          </disp-formula>
        </p>
        <p>Accuracy: the ratio of correctly predicted samples to the total number of samples, which measures overall classification performance [15]. It is defined as follows:</p>
        <p>
          <disp-formula>
            <label>(12)</label>
            <tex-math>\text{Accuracy} = \frac{\sum_{i=1}^{N} TP_i}{\text{total}}</tex-math>
          </disp-formula>
          where total is the total number of samples.</p>
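        <p>In a single-label task such as this one, F1 Micro and Accuracy coincide. Every wrong prediction is one false positive (for the predicted class) and one false negative (for the true class). Hence ΣFP = ΣFN = total − ΣTP, and F1 Micro = 2ΣTP / (2ΣTP + 2(total − ΣTP)) = ΣTP / total = Accuracy. The short sketch below checks this on toy labels; the helper names are our own.</p>
```python
from collections import Counter


def pooled_counts(y_true, y_pred):
    """Sum TP, FP and FN over all classes for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # counted against the predicted class
            fn[t] += 1  # and against the true class
    return sum(tp.values()), sum(fp.values()), sum(fn.values())


def micro_f1(y_true, y_pred):
    tp, fp, fn = pooled_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(pooled_counts(y_true, y_pred))  # (4, 2, 2)
print(round(micro_f1(y_true, y_pred), 4), round(accuracy(y_true, y_pred), 4))  # 0.6667 0.6667
```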
        <p>Together, these four metrics provide a comprehensive evaluation of DBG on the human-machine collaborative text classification task. Accuracy measures overall correctness, and Macro Recall measures coverage across all categories. F1 Macro Score measures the stability of fine-grained classification, and F1 Micro Score reflects performance on high-frequency categories. The result is a multi-dimensional evaluation suited to imbalanced data distributions.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>Our model performed well in PAN-CLEF 2025 Subtask 2: Human-AI Collaborative Text Classification, outperforming the baseline model on all four official metrics. As shown in Table 2, it reached 56.87% on the official ranking metric, Macro Recall. This is 8.55 percentage points higher than the 48.32% of the RoBERTa-base baseline, indicating stronger discrimination across the collaboration categories. Because Macro Recall weights every class equally, this gain also suggests better handling of the minority categories. The F1 Macro Score of 56.45% also clearly exceeds the baseline. Accuracy reached 66.81%, 9.72 percentage points higher than the baseline's 57.09%. However, in our tests the average single-sample inference latency of DBG for a 512-token sequence on an RTX 3090 GPU was 1200 ms (standard deviation 85 ms), with a peak memory usage of 10 GB. In future work, we plan to use knowledge distillation to improve inference speed.</p>
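        <p>Latency figures such as the mean and standard deviation above are usually obtained by timing repeated single-sample forward passes after a few warm-up runs. A minimal standard-library harness is sketched below. The warm-up and run counts are arbitrary choices, the stand-in workload is not the DBG model, and this is not necessarily how the reported figures were measured.</p>
```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=3, runs=20):
    """Return (mean, standard deviation) of single-sample latency in milliseconds.

    Warm-up calls are excluded from the statistics. For a CUDA model, predict()
    should call torch.cuda.synchronize() before returning, because GPU kernels
    run asynchronously and the timer would otherwise stop too early.
    """
    if 2 > runs:
        raise ValueError("at least two timed runs are needed for a standard deviation")
    for _ in range(warmup):
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.stdev(timings)


# Stand-in workload; in practice predict would wrap the model's forward pass.
mean_ms, std_ms = measure_latency_ms(sorted, list(range(100_000, 0, -1)))
print(f"{mean_ms:.2f} ms (std {std_ms:.2f} ms)")
```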
        <p>In the competition, our entry ranked 5th out of 22 participants on the official leaderboard of PAN-CLEF 2025 Subtask 2. This ranking shows that DBG is competitive with a wide range of participating models. It handles complex settings well, such as multi-class classification of single texts whose content mixes human and machine contributions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper proposed DBG, a multi-level classification framework for text produced by collaboration between generative AI and humans. It combines a pre-trained language model, sequence modeling, and dynamic attention enhancement. The model combines deep semantic encoding from DeBERTa, bidirectional sequence dependency modeling from the BiLSTM, and local feature extraction from geometric attention. Together these address the problem of extracting hidden traces of human-machine collaboration in mixed text. Experiments show that DBG performs well under a highly imbalanced data distribution. It achieves a Macro Recall of 56.87%, an F1 Macro Score of 56.45%, and an Accuracy of 66.81%, all of which exceed the baseline. This work offers a technical solution for practical needs such as verifying the authenticity of AI-generated content and detecting academic misconduct. It also lays a foundation for multimodal human-machine collaborative analysis. In future work, we will further improve DBG by optimizing its hyperparameters, enhancing data preprocessing, and strengthening deep feature fusion. We will also explore combinations with other methods to improve system performance and detection accuracy. In addition, we plan to extend its cross-language transfer capabilities and improve the explainability of its attention mechanism to meet more complex generative AI governance challenges.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepSeek-R1 for grammar and spelling checks. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgements</title>
      <p>This work was supported by grants from the Guangdong-Foshan Joint Fund Project (No.
2022A1515140096) and Open Fund for Key Laboratory of Food Intelligent Manufacturing in Guangdong
Province (No. GPKLIFM-KF-202305).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          , Overview of PAN 2025:
          <article-title>Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>C. de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>G. S. de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>