<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Author Classification: A ModernBERT Approach with Gradient Loss Function</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhankeng Liang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kaiyin Sun</string-name>
          <email>sunkaiyin123@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haojie Cao</string-name>
          <email>caohaojie0322@163.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jieren Luo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <email>hanzhongyuan@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan No.3 Middle School</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper focuses on the text author (human or AI) classification problem, aiming to enhance the model's ability to detect AI-generated texts. We propose an innovative method based on ModernBERT, which introduces a custom gradient loss function and optimizes model training by combining sample weighting strategies, effectively enhancing the model's classification capability for texts of varying difficulties. In implementation, we fine-tuned the ModernBERT model, optimized the training process using the custom gradient loss function, and constructed an efficient classification system through meticulous data preprocessing and rigorous testing evaluation. Experimental results show that this method achieved good performance metrics on the training set, but there was some overfitting on the test set, with performance metrics declining. Future work will be dedicated to further optimizing the model's generalization ability to improve its potential for application in multi-domain text classification.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>Text Author Classification</kwd>
        <kwd>ModernBERT</kwd>
        <kwd>Gradient Loss Function</kwd>
        <kwd>AI Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the tremendous success of large language models in recent years and the rapid development
of AI content generation technology, text author classification (human or AI) has gradually become
an important research area. In recent years, the gap between AI-generated texts and human texts in
terms of language fluency and logical coherence has gradually narrowed, and in some scenarios, it is
even difficult to distinguish between them. This makes text author classification crucial for content
authenticity verification, copyright protection, and information security defense. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] The research
work in this paper is conducted for the Voight-Kampff Generative AI Authorship Verification task at
the PAN 2025 workshop. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
      </p>
      <p>This study aims to develop a classification model that can effectively distinguish whether the text
author is human or AI. Based on the pre-trained ModernBERT model, we adopted a training method
that combines a gradient loss with sample weighting. During training, we not only focused on classification
accuracy but also controlled the model’s learning intensity for samples of different difficulties through
the gradient loss function to prevent overfitting to difficult samples. We also assigned weights to the
samples in the dataset, with higher weights for easy samples and lower weights for difficult samples, to
adjust the loss calculation. Experiments show that this method can improve the model’s accuracy
on the validation set, verifying its effectiveness in the text author classification task.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Currently, research in this field mainly focuses on the application of text feature analysis and deep
learning models. Traditional methods extract text features such as vocabulary, grammar, and syntax,
and use machine learning algorithms such as Support Vector Machine (SVM), Naive Bayes, Logistic
Regression, and Decision Tree for classification. Although these methods can distinguish between
human and AI-generated texts to some extent, feature engineering is complex and relies on human
experience, making it difficult to effectively extract and utilize complex text features. Moreover, with
the development of deep learning technology, models based on Recurrent Neural Networks (RNN) and
its variants (such as LSTM, GRU) have been applied to text author classification. These models can
automatically learn the sequential features and semantic information of texts, resulting in improved
classification performance compared to traditional methods. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] In addition, Convolutional Neural
Networks (CNN) have also been applied in this field, mainly for extracting local features and structural
information of texts. However, these methods have limited ability to handle long texts and complex
semantic relationships and are easily affected by noisy data.
      </p>
      <p>
        In recent years, pre-trained language models based on the Transformer architecture (such as BERT
and its variants, GPT series, etc.) have gradually become the mainstream method for text author
classification. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] These methods first use unsupervised pre-training on large-scale unannotated text
data to enable the model to learn rich text features and semantic knowledge. Then, through fine-tuning
strategies, they can effectively capture the differences between human and AI-generated texts, achieving
higher accuracy in classification tasks. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method Introduction</title>
      <p>
        This experiment is based on the ModernBERT model and employs an efficient text author classification
method, aimed at overcoming the limitations of traditional models such as the original BERT in handling
long texts, computational efficiency, and timeliness of data. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] The ModernBERT model, introduced
by Warner et al. (2024), provides a robust foundation for text classification tasks with its capability to
process sequences up to 8192 tokens, optimized architectural design, and adaptability to new data. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
      </p>
      <p>In the experiment, we first preprocessed the training and validation data, including tokenization,
truncation or padding to a fixed length, and assigning weights based on sample difficulty. Then, we
fine-tuned the ModernBERT model on this data, adding a classification layer that uses the CLS token
as input to capture the global representation of the text for the binary classification task, and optimizing
model parameters to adapt to the specific text classification task. Additionally, a custom gradient loss
function was introduced, which effectively controlled the learning intensity of the model for samples
of varying difficulties by adjusting the gradient of the loss function for model outputs, preventing
overfitting. Ultimately, the model performed well on the training set, verifying its effectiveness and
superiority in text author classification tasks.</p>
      <p>Figure 1 below shows the basic workflow of the ModernBERT model in the experiments of this paper.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <sec id="sec-4-1">
        <title>4.1. Overview of Experimental Requirements</title>
        <p>
          This experiment aims to construct a binary classification model and program that can accurately
distinguish whether the text author is human or AI. The experimental dataset is provided by PAN lab
2025 and contains text samples generated by humans or machines in JSONL format. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] Each sample
includes the text content, a unique identifier, the author type (human or a specific AI model), a label
(0 for human, 1 for AI), and the text genre (such as prose, news, or fiction). The test set only includes
the text and identifier, used to evaluate the model’s generalization ability. Evaluation metrics include
ROC-AUC, Brier score, C@1, F1 score, F0.5u, the arithmetic mean of these metrics, and the confusion
matrix used to calculate true/false positives and true/false negatives.
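<p>Of the metrics above, ROC-AUC, the (plain) Brier score, and F1 can be computed from scratch as below; C@1 and F0.5u additionally account for non-answers under PAN's definitions and are omitted here. Note that the plain Brier score is lower-is-better, so platforms that average it with the other metrics typically report a complemented, higher-is-better form.</p>

```python
import numpy as np

def roc_auc(y_true, scores):
    """Mann-Whitney form of ROC-AUC: the probability that a random
    positive is scored above a random negative (ties count half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

def brier(y_true, probs):
    """Mean squared error between predicted probability and the 0/1 label."""
    return float(np.mean((np.asarray(probs) - np.asarray(y_true)) ** 2))

def f1(y_true, probs, threshold=0.5):
    """F1 after thresholding the probabilities into hard 0/1 predictions."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(probs) >= threshold).astype(int)
    tp = int(((pred == 1) & (y_true == 1)).sum())
    fp = int(((pred == 1) & (y_true == 0)).sum())
    fn = int(((pred == 0) & (y_true == 1)).sum())
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```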
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Source</title>
        <p>
          The fine-tuning training data for this model comes from the Zenodo platform, uploaded by PAN lab 2025.
The dataset includes training and validation datasets, containing text samples generated by humans
and multiple AI models to achieve training and prediction purposes. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] Additionally, some of the test
datasets are not publicly available due to competition reasons and are only allowed for online testing
by uploading the program.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Program Design</title>
        <sec id="sec-4-3-1">
          <title>4.3.1. Model Selection and Initialization</title>
          <p>This program uses the pre-trained ModernBERT-base model released by answerdotai as the base
model and fine-tunes it on the training dataset to adapt it to the text author classification
task. The ModernBERT model, based on the Transformer architecture, uses a self-attention mechanism
to capture long-range dependencies in the text. The output layer of the model is adjusted for binary
classification tasks, with an output category number of 2 (human or AI).</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Data Preprocessing</title>
          <p>The training and validation datasets are provided in JSONL format. The data preprocessing steps are as
follows:
1. Tokenize the text using the ModernBERT tokenizer.
2. Truncate or pad the tokenized sequence to a fixed length (512 tokens). Although ModernBERT
can handle sequences of up to 8192 tokens, we truncate or pad the text to 512 tokens here to align
with the input requirements of the dataset and to ensure uniformity of model input, which also
improves computational efficiency during training.
3. Assign weights to each sample based on the difficulty label. The difficulty of a sample is determined
by its distinguishability from human-written text. Samples that are more easily distinguishable
(i.e., those with more obvious AI characteristics) are labeled as easy and assigned higher weights,
while samples that are harder to distinguish (i.e., those closely resembling human writing style)
are labeled as difficult and assigned lower weights. This weighting reduces the influence of difficult
samples on the loss, preventing the model from overfitting to them and thereby improving its
generalization ability.</p>
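<p>The three steps above can be sketched as follows. The tokenizer is passed in as a callable, and the field name "difficulty" and the concrete weight values are illustrative assumptions rather than the dataset's actual schema.</p>

```python
import json

MAX_LEN = 512   # fixed input length used in step 2
PAD_ID = 0      # padding token id (assumption; depends on the tokenizer)

# Hypothetical weight map for step 3: easy samples get higher weight,
# difficult samples lower weight. The exact values are assumptions.
WEIGHTS = {"easy": 1.0, "difficult": 0.5}

def load_jsonl(path):
    """Read one JSON object per line, as in the PAN dataset format."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Step 2: cut to max_len or right-pad with pad_id; also build the
    attention mask (1 for real tokens, 0 for padding)."""
    ids = list(token_ids)[:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, mask

def preprocess(record, tokenize, max_len=MAX_LEN):
    """Steps 1-3 for a single record: tokenize, fix the length,
    and attach a difficulty-based sample weight."""
    ids, mask = pad_or_truncate(tokenize(record["text"]), max_len)
    return {
        "input_ids": ids,
        "attention_mask": mask,
        "label": record.get("label"),
        "weight": WEIGHTS.get(record.get("difficulty"), 1.0),
    }
```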
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Model Training</title>
          <p>The model training process includes several key steps:
1. Define a custom dataset class: Create a custom dataset class MyDataset to load and preprocess
training and validation data. This class is also responsible for reading JSONL files, tokenizing,
truncating, and padding the text, and assigning weights to each sample based on the difficulty
label.
2. Custom gradient loss function: A custom gradient loss function is defined in the experimental
program, combining the standard cross-entropy loss with the gradient of that loss with respect to
the model outputs, scaled by the sample weights. The formula for the custom gradient loss
function is as follows:</p>
          <p>
ℒ_grad = (1/N) ∑_{i=1}^{N} ‖∇_logits CE(logits_i, labels_i) · weight_i‖₂
where CE is the cross-entropy loss function, ∇_logits denotes the gradient with respect to the model’s
output logits, weight_i is the weight of sample i, and N is the batch size. This formula calculates the
gradient of the cross-entropy loss for the model’s output, adjusts the gradient size based on sample weights,
and takes the resulting norm as the gradient loss, optimizing model training and preventing overfitting.
3. Training process: During the training phase, the preprocessed data is loaded onto the GPU, and
the AdamW optimizer is used to zero the model’s gradients. In training, the model performs
forward propagation to obtain output results, calculates the loss using the custom gradient loss
function, and then updates the model parameters through backpropagation and the optimizer.
The entire training process lasts for three epochs, and the best model parameters are saved by
monitoring the validation set accuracy.</p>
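<p>The shape of one such optimization step can be shown on a toy linear classifier with plain SGD standing in for ModernBERT with AdamW: forward pass, sample-weighted cross-entropy, backpropagation, parameter update. This is a sketch of the loop's structure, not the actual training code.</p>

```python
import numpy as np

def train_step(W, X, y, weights, lr=0.1):
    """One step on a linear 2-class classifier: forward pass, weighted
    cross-entropy loss, analytic gradient, in-place SGD update.
    (A stand-in for the ModernBERT fine-tuning loop, which uses AdamW.)"""
    n = len(y)
    logits = X @ W                                   # forward pass, (N, 2)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(n), y] + 1e-12)     # per-sample cross-entropy
    loss = float((weights * ce).mean())              # sample-weighted loss
    grad_logits = probs.copy()
    grad_logits[np.arange(n), y] -= 1.0              # dCE/dlogits
    grad_logits *= (weights / n)[:, None]            # apply weights, average
    W -= lr * (X.T @ grad_logits)                    # "backprop" + update
    return loss
```

<p>On linearly separable data the loss decreases steadily over repeated calls; in the actual program this role is played by AdamW updating ModernBERT's parameters over three epochs.</p>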
        </sec>
        <sec id="sec-4-3-4">
          <title>4.3.4. Prediction Data</title>
          <p>The prediction process includes the following steps:
1. Define the test dataset class: Load the trained model parameters and define the test dataset class
TestDataset to load and preprocess the test data, similar to the custom dataset class MyDataset
used in the training model above.
2. Prediction function: The prediction function loads the test data, performs inference through the
model, converts the predicted confidence scores into probabilities through the sigmoid function,
and then converts them into binary labels based on a threshold of 0.5. Finally, the prediction
results are saved in JSONL format.</p>
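<p>A minimal sketch of the two prediction steps, with model inference abstracted as a score_fn callable returning a raw score for the AI class (the names are illustrative, not the program's actual identifiers):</p>

```python
import json
import math

def sigmoid(x):
    """Map a raw confidence score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(records, score_fn, threshold=0.5):
    """Step 2: score each text, convert to a probability with the
    sigmoid, and binarize at the threshold (1 = AI, 0 = human)."""
    return [
        {"id": r["id"], "label": int(sigmoid(score_fn(r["text"])) >= threshold)}
        for r in records
    ]

def save_jsonl(rows, path):
    """Write predictions one JSON object per line, as required."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```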
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Experimental Results and Analysis</title>
        <sec id="sec-4-4-1">
          <title>4.4.1. Training Results</title>
          <p>
            This experimental program has been uploaded to the official testing platform TIRA of PAN lab 2025 for
official testing [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. The program has undergone three online tests using both the training and test
datasets. In the tests on the training dataset, the average ROC-AUC metric value was 0.874, the average
Brier metric value was 0.878, the average C@1 metric value was 0.878, the average F1 metric value was
0.904, the average F0.5u value was 0.913, the average Mean value was 0.889. In the tests on the test
dataset, the average ROC-AUC metric value was 0.844, the average Brier metric value was 0.853, the
average C@1 metric value was 0.853, the average F1 metric value was 0.815, the average F0.5u value
was 0.917, the average Mean value was 0.856.
          </p>
          <p>Tables 4-1 and 4-2 below show the evaluation results of the test and training datasets on the TIRA
online platform, respectively.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>4.4.2. Results Analysis</title>
          <p>Based on multiple tests of the test and training datasets, overall, the program using the pre-trained
ModernBERT model and fine-tuning it to adapt to the text author classification task has achieved good
results. However, there is still room for improvement in the model’s generalization ability. The high
metric values on the training set indicate that the model fits the training data well, but the performance
drop on the test set suggests insufficient adaptability to new data. In the future, methods such as data
augmentation, regularization, or adjusting the model architecture could be considered to enhance the
model’s generalization ability to cope with the diversity of test data.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper proposes a text author classification method based on ModernBERT, conducting
in-depth research on the detection of human and AI texts. For the first time, a gradient loss function
is applied to the fine-tuning process of ModernBERT, combined with a sample weighting strategy to
optimize model training, effectively improving the classification performance for texts of different
difficulties. In implementation, through meticulous data preprocessing, custom gradient loss calculation,
model fine-tuning, and rigorous testing evaluation, an efficient classification system is constructed.
Experiments have verified the advantages of this method over traditional models, providing new ideas
for the identification of AI-generated content and potentially expanding its application in multi-domain
text classification in the future.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the Social Science Foundation of Guangdong Province, China
(No. GD24CZY02).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author utilized Kimi K2 and DeepSeek-V3 to accomplish
drafting content, text translation, and content enhancement tasks. After employing these tools, the
author reviewed and edited the content as necessary and took appropriate measures to assume full
responsibility for the content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Quach</surname>
          </string-name>
          ,
          <article-title>Advancements in natural language processing: Exploring transformer-based architectures for text understanding</article-title>
          ,
          <source>arXiv preprint arXiv:2503.20227</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Vardhan</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” generative AI authorship verification task at PAN and ELOQUENT 2025</article-title>
          , in:
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          , CEUR-WS.org, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mahesh</surname>
          </string-name>
          ,
          <article-title>Machine learning algorithms-a review</article-title>
          ,
          <source>International Journal of Science and Research (IJSR) 9</source>
          (
          <year>2020</year>
          )
          <fpage>381</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Sadasivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feizi</surname>
          </string-name>
          ,
          <article-title>Can AI-generated text be reliably detected?</article-title>
          ,
          <source>arXiv preprint arXiv:2303.11156</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chafin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <article-title>Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference</article-title>
          ,
          <source>arXiv preprint arXiv:2412.13663</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous integration for reproducible shared tasks with TIRA.io</article-title>
          ,
          in:
          <source>Advances in Information Retrieval: 45th European Conference on IR Research (ECIR 2023)</source>
          , Lecture Notes in Computer Science, Springer, Berlin, Heidelberg,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-031-28241-6_20. doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>