<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ReText.Ai Team at PAN 2025: Applying Multiple Classification Heads to a Transformer Model for Human-AI Collaborative Text Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daria Ignatenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Zaitsev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Shkriaba</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HSE University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ReText.Ai Team</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents the ReText.Ai team's solution to the Human-AI Collaborative Text Classification subtask of the PAN-2025 Generative AI Authorship Verification Task. Our approach involves fine-tuning transformer models, such as RoBERTa-base and Gemma-2 2B, with a custom multi-head classifier that combines a main multiclass head with auxiliary binary heads to better distinguish closely related labels. By augmenting a transformer-based model with multiple classification heads and a confidence-based override mechanism, our method outperforms the baseline, achieving macro Recall scores of 80.36% and 83.00% for RoBERTa-base and Gemma-2 2B, respectively, compared to 68.67% and 75.70% for the baseline models. In the competition, our team's fine-tuned Gemma-2 2B model achieved seventh place in the automated evaluation on the test set with a score of 56.11%.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>Voight-Kampff Generative AI Detection 2025</kwd>
        <kwd>Human-AI Collaborative Text Classification</kwd>
        <kwd>AI-generated text detection</kwd>
        <kwd>Multi-head classifier</kwd>
        <kwd>RoBERTa</kwd>
        <kwd>Gemma-2</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We present the results obtained and compare them with those of the other participants in the shared
task. We demonstrate how our approach improves upon the baseline.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <p>The dataset from the shared task contains samples with the following labels:
• Fully human-written: The document is entirely authored by a human without any AI assistance.
• Human-initiated, then machine-continued: A human starts writing, and an AI model completes the text.
• Human-written, then machine-polished: The text is initially written by a human but later refined or edited by an AI model.
• Machine-written, then humanized: An AI generates the text, which is later modified to obscure its machine origin.
• Machine-written, then human-edited: The content is generated by an AI but subsequently edited or refined by a human.
• Deeply-mixed text: The document contains interwoven sections written by both humans and AI, without a clear separation.</p>
      <p>The dataset was derived from various sources. It also contains additional information, such as the
model that produced the text and the language used (English, Spanish, or German). The authors provide
three subsets of the dataset: training, development, and testing. Labels are known for the training and
development sets, but not for the test set. Table 1 presents the statistics for each subset.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>In this section, we describe our approach to developing a custom classification model for the Human-AI
Collaborative Text Classification task. Our methodology leverages text preprocessing and fine-tuning a
transformer-based architecture with a custom multi-head classifier.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Preprocessing</title>
        <p>Firstly, we preprocess the dataset. Although modern neural network models do not require text
preprocessing [6], we found that the texts in the dataset varied in formatting, which could lead to
overfitting on some dataset sources. To prevent this and create more consistent samples, we implemented a
preprocessing pipeline and applied it to each text sample. It consists of the following steps:
1. Newline Removal: All newline characters in the text are replaced with spaces to create a continuous string. This step prevents the model from interpreting newlines as token boundaries, which could disrupt the contextual understanding of sentences spanning multiple lines.
2. Whitespace Normalization: Multiple consecutive whitespace characters (e.g., spaces, tabs) are replaced with a single space.
3. Text Stripping: Leading and trailing whitespace is removed.</p>
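<p>The three steps above can be sketched in Python as follows (a minimal illustration of the pipeline, not the team's exact implementation):</p>

```python
import re

def preprocess(text: str) -> str:
    # 1. Newline Removal: replace newline characters with spaces
    #    so the text becomes one continuous string.
    text = text.replace("\n", " ")
    # 2. Whitespace Normalization: collapse runs of whitespace
    #    (spaces, tabs) into a single space.
    text = re.sub(r"\s+", " ", text)
    # 3. Text Stripping: drop leading and trailing whitespace.
    return text.strip()
```

<p>For example, preprocess("A line.\nAnother\t line. ") returns "A line. Another line.".</p>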
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Fine-Tuning Multi-Head Classification Model</title>
        <p>The next step in our approach involves fine-tuning a classifier. We conducted a series of experiments
and found that models struggle to distinguish between certain classes. The confusion matrix in Figure 2
for the RoBERTa baseline shows that texts with the true label Machine-written, then humanized are often
predicted as Fully human-written, Human-initiated, then machine-continued, or Human-written,
then machine-polished. This suggests that it is difficult for the classifier to distinguish between these
classes.</p>
        <p>To tackle this issue, we propose that in addition to training the classifier on the task of predicting
the main classes, we train the classifier to distinguish similar classes using additional heads that solve
binary classification tasks. The essence of the approach is to predict, for similar classes, whether the
text belongs to this class, or whether it belongs to any other class. The intuition of this approach is
that the signals obtained from the binary classification heads will allow better delineation of examples
with similar classes and, as a consequence, this may lead to an improvement in the final quality of the
classifier.</p>
        <p>As shown in Figure 1, the classifier is designed to predict multiple related labels using several heads
that are trained in parallel:
• Main head: A multiclass classification head predicting one of the six categories: fully human-written; human-initiated, then machine-continued; human-written, then machine-polished; machine-written, then humanized; machine-written, then human-edited; and deeply mixed.
• Auxiliary binary heads: Five binary classification heads to detect specific subcategories (human-written, mixed, polished, continued, and humanized text), enhancing the model's ability to capture nuanced patterns. The introduction of binary heads helped decompose the complex task of distinguishing subtle patterns in the data into a series of simpler ones.</p>
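<p>The head layout described above can be sketched as follows (a simplified NumPy illustration; the hidden size and weight initialization are assumptions of the sketch, and the dropout layer is omitted for brevity):</p>

```python
import numpy as np

class MultiHeadClassifier:
    """Sketch: a 6-way main head plus five binary auxiliary heads
    (human-written, mixed, polished, continued, humanized), each a single
    linear layer over the transformer's pooled output. Hidden size and
    initialization are illustrative assumptions; dropout is omitted."""

    def __init__(self, hidden_size=768, num_classes=6, num_aux=5, seed=0):
        rng = np.random.default_rng(seed)
        self.main_W = rng.normal(0.0, 0.02, (hidden_size, num_classes))
        self.main_b = np.zeros(num_classes)
        self.aux = [(rng.normal(0.0, 0.02, (hidden_size, 2)), np.zeros(2))
                    for _ in range(num_aux)]

    def forward(self, pooled):
        # pooled: (batch, hidden_size) pooled transformer output
        main_logits = pooled @ self.main_W + self.main_b
        aux_logits = [pooled @ W + b for W, b in self.aux]
        return main_logits, aux_logits
```

<p>Keeping each head a single linear layer, as described next, keeps the added parameter count and training time small.</p>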
        <p>Each classification head comprises a linear layer applied to the transformer’s pooled output. Using
single linear layers keeps the model’s complexity in check, maintaining similar training times as without
extra classification heads. A dropout rate of 0.1 is applied in classification heads to mitigate overfitting.
The main head’s loss is computed using weighted cross-entropy to address class imbalance, defined as:
Loss_main = − Σ_{i=1}^{N} Σ_{c=1}^{C} w_c · y_{i,c} · log(ŷ_{i,c}),
where N is the number of samples, C = 6 is the number of classes, w_c is the weight for class c
(inversely proportional to class frequency), y_{i,c} is the true label indicator, and ŷ_{i,c} is the
predicted probability for class c.</p>
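<p>The weighted cross-entropy above can be computed as follows (a minimal NumPy illustration with toy inputs):</p>

```python
import numpy as np

def weighted_cross_entropy(y_true, probs, class_weights):
    """Loss_main = -sum_i sum_c w_c * y_{i,c} * log(p_{i,c}), where y_{i,c}
    is the one-hot true-label indicator and p_{i,c} the predicted probability."""
    y = np.eye(probs.shape[1])[y_true]  # one-hot indicators y_{i,c}
    return float(-(class_weights * y * np.log(probs)).sum())
```

<p>For a single sample of class 0 predicted with probability 0.5 under unit class weights, the loss equals −log(0.5) ≈ 0.693.</p>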
        <p>To obtain the auxiliary loss Loss_aux, we sum the losses of all auxiliary classification heads:</p>
        <p>Loss_aux = Loss_fully-human + Loss_mixed + Loss_polished + Loss_continued + Loss_humanized</p>
        <p>The final loss combines losses from all heads, weighted to prioritize the main multi-class head’s
prediction:</p>
        <p>Loss = 0.6 · Loss_main + 0.4 · Loss_aux</p>
        <p>During the evaluation phase in training and inference, the model generates logits for each classification
head. To improve prediction accuracy, we implement a confidence-based override mechanism. For each
sample, we compute softmax probabilities for all heads and apply class-specific confidence thresholds
presented in Table 2.</p>
        <p>The thresholds were assigned according to the assessed quality (F1 score) of each head. If a head's
maximum probability exceeds its threshold and is the highest among all heads, the corresponding class
is selected, overriding the main head's prediction by setting the other logits to a large negative value (-1e9).
This ensures that high-confidence predictions from specialized heads guide the final classification. The
final prediction is then determined by the argmax of the modified logits.</p>
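<p>The override mechanism can be sketched as follows (a NumPy illustration; the threshold values used here are hypothetical, the actual class-specific thresholds are those of Table 2):</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_with_override(main_logits, aux_logits, aux_classes, thresholds):
    """If an auxiliary head's maximum softmax probability exceeds its
    class-specific threshold and is the highest such confidence, its class
    overrides the main head: all other logits are masked with -1e9 and the
    final prediction is the argmax of the modified logits."""
    logits = np.asarray(main_logits, dtype=float).copy()
    best_conf, best_class = 0.0, None
    for head_logits, cls, thr in zip(aux_logits, aux_classes, thresholds):
        conf = softmax(np.asarray(head_logits, dtype=float)).max()
        if conf > thr and conf > best_conf:
            best_conf, best_class = conf, cls
    if best_class is not None:
        masked = np.full_like(logits, -1e9)   # suppress all other classes
        masked[best_class] = logits[best_class]
        logits = masked
    return int(np.argmax(logits))
```

<p>With a single confident binary head (probability ≈ 0.998 for its class) and a threshold of 0.9, the head's class wins even when the main head prefers another label; raising the threshold above the head's confidence restores the main head's argmax.</p>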
        <p>Initially, we conducted experiments with the RoBERTa-base model¹. The aim of these experiments
was to demonstrate that our approach can enhance the baseline and, consequently, be transferred
to stronger model architectures. After this, we fine-tuned the Gemma-2 2B model². This model was
chosen because of its size and its proven performance in classification tasks related to the detection of
AI-generated content, as demonstrated in several studies [7, 8, 9].</p>
        <p>All models were fine-tuned over 10 epochs. To prevent overfitting, we selected the best checkpoint
according to the weighted F1-score across all classification heads on the development set. This key metric
was chosen because it prioritizes performance on more frequent classes (e.g., fully human-written), which
are likely more common in real-world scenarios, while still evaluating performance on rare classes. This
ensures that the metric reflects practical utility. Hyperparameters are shown in Appendix A.</p>
        <p>¹ https://huggingface.co/FacebookAI/roberta-base
² https://huggingface.co/google/gemma-2-2b</p>
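<p>Checkpoint selection can be sketched as follows (an illustration; averaging the per-head weighted F1-scores is an assumption of this sketch, not a detail stated by the method):</p>

```python
from sklearn.metrics import f1_score

def checkpoint_score(head_targets, head_preds):
    """Average weighted F1 across all classification heads on the development
    set; the checkpoint with the highest score over the 10 epochs is kept.
    (Averaging across heads is an assumption of this sketch.)"""
    scores = [f1_score(t, p, average="weighted")
              for t, p in zip(head_targets, head_preds)]
    return sum(scores) / len(scores)
```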
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The evaluation results on the development set are presented in Table 3. For the development set,
we used Macro Recall, F1 Macro, F1 Micro, and Accuracy, as these are the metrics used in the shared task. As
can be seen in the table, adding additional heads increased all metrics for both RoBERTa-base and
Gemma-2 2B. Specifically, the main metric for the shared task, Macro Recall, increased from 74% to 80%
for RoBERTa-base, and from 76% to 83% for Gemma-2 2B.</p>
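<p>The four development-set metrics can be computed with scikit-learn (a minimal sketch with toy labels):</p>

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

def evaluate(y_true, y_pred):
    """The shared task's evaluation metrics; Macro Recall is the main one."""
    return {
        "macro_recall": recall_score(y_true, y_pred, average="macro"),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "f1_micro": f1_score(y_true, y_pred, average="micro"),
        "accuracy": accuracy_score(y_true, y_pred),
    }
```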
      <p>Table 4 compares our approach's performance with that of the other participants in the shared
task. As can be seen from the table, our team achieved 7th place, significantly improving on the baseline
of 46.32% Macro Recall to reach 56.11%.</p>
      <p>To compare the predictions obtained by a baseline model and a fine-tuned model, we created confusion
matrices for the baseline and fine-tuned RoBERTa-base models. The confusion matrices were obtained
by predicting the samples in the development set. Figure 2 presents these matrices. As can be seen from
the figure, significant improvements were made to machine-written, then humanized, machine-written,
then human-edited and deeply-mixed text labels. However, our approach failed to distinguish between
human-initiated, then machine-continued and human-written, then machine-polished.</p>
      <p>To further explore the quality of class differentiation, we obtained the final hidden states from
the fine-tuned multi-head RoBERTa-base model for each data sample in the training and development
sets. We then used the t-SNE algorithm [10] to visualize the embeddings, which are presented in
Figure 3.</p>
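<p>The projection can be reproduced with scikit-learn's t-SNE implementation (a sketch; hyperparameters such as the perplexity cap are assumptions):</p>

```python
import numpy as np
from sklearn.manifold import TSNE

def project_embeddings(embeddings: np.ndarray, seed: int = 0) -> np.ndarray:
    """Reduce final hidden states to 2-D points for plotting.
    Perplexity must be smaller than the number of samples."""
    tsne = TSNE(n_components=2,
                perplexity=min(30, len(embeddings) - 1),
                init="random", random_state=seed)
    return tsne.fit_transform(embeddings)
```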
      <p>The figure shows that the classifier accurately distinguishes between embeddings of different
classes in the training set. However, for the development set, there are many noisy points located close
to embeddings of different classes. This suggests that the classifier has overfitted to the training
set and struggles to generalize to unseen samples.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In conclusion, our approach demonstrates how text classification can be enhanced for the human-AI
collaborative classification task. Adding multiple heads to a pre-trained model and then fine-tuning the
architecture significantly improves classification performance. Our key contribution lies in decomposing
the complex classification problem into auxiliary binary tasks, thereby improving generalization and
achieving significantly better results than the provided baselines on the test and development sets. On
the test set leaderboard, we achieved a Macro Recall of 56.11% and came 7th out of 21 participants.</p>
      <p>A possible direction for future research could be to add contrastive training to our approach. The
detection of generated or collaborative texts could be defined as an authorship detection task, as has been
demonstrated in other studies [11, 12]. Some texts were generated by specific models, and considering
these models as authors, it may be possible to train a classifier contrastively to distinguish between
models that produced a text. Such signals could be important for the classification model as they
highlight texts produced by particular models.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepL in order to paraphrase and reword.
After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication's content.</p>
      <p>[5] E. Collins, J. Barral, Z. Ghahramani, R. Hadsell, D. Sculley, J. Banks, A. Dragan, S. Petrov, O. Vinyals,
J. Dean, D. Hassabis, K. Kavukcuoglu, C. Farabet, E. Buchatskaya, S. Borgeaud, N. Fiedel, A. Joulin,
K. Kenealy, R. Dadashi, A. Andreev, Gemma 2: Improving open language models at a practical
size, 2024. URL: https://arxiv.org/abs/2408.00118. arXiv:2408.00118.
[6] M. Siino, I. Tinnirello, M. La Cascia, Is text preprocessing still worth the time? A comparative survey
on the influence of popular preprocessing methods on transformers and traditional classifiers,
Information Systems 121 (2024) 102342. URL: https://www.sciencedirect.com/science/article/pii/S0306437923001783.
doi:10.1016/j.is.2023.102342.
[7] G. Mehak, A. Qasim, A. G. M. Meque, N. Hussain, G. Sidorov, A. Gelbukh, TechExperts(IPN)
at GenAI detection task 1: Detecting AI-generated text in English and multilingual contexts,
in: F. Alam, P. Nakov, N. Habash, I. Gurevych, S. Chowdhury, A. Shelmanov, Y. Wang,
E. Artemova, M. Kutlu, G. Mikros (Eds.), Proceedings of the 1st Workshop on GenAI Content Detection
(GenAIDetect), International Conference on Computational Linguistics, Abu Dhabi, UAE, 2025, pp.
161–165. URL: https://aclanthology.org/2025.genaidetect-1.14/.
[8] N. H. Doan, K. Inui, Grape at GenAI detection task 1: Leveraging compact models and linguistic
features for robust machine-generated text detection, in: F. Alam, P. Nakov, N. Habash, I. Gurevych,
S. Chowdhury, A. Shelmanov, Y. Wang, E. Artemova, M. Kutlu, G. Mikros (Eds.), Proceedings
of the 1st Workshop on GenAI Content Detection (GenAIDetect), International Conference on
Computational Linguistics, Abu Dhabi, UAE, 2025, pp. 209–217. URL: https://aclanthology.org/2025.genaidetect-1.22/.
[9] K. Kuznetsov, L. Kushnareva, P. Druzhinina, A. Razzhigaev, A. Voznyuk, I. Piontkovskaya,
E. Burnaev, S. Barannikov, Feature-level insights into artificial text detection with sparse autoencoders,
2025. URL: https://arxiv.org/abs/2503.03601. arXiv:2503.03601.
[10] L. van der Maaten, G. E. Hinton, Visualizing high-dimensional data using t-SNE, Journal of Machine
Learning Research 9 (2008) 2579–2605.
[11] S. Liu, X. Liu, Y. Wang, Z. Cheng, C. Li, Z. Zhang, Y. Lan, C. Shen, Does DetectGPT fully utilize
perturbation? Bridging selective perturbation to fine-tuned contrastive learning detector would be
better, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational
Linguistics, Bangkok, Thailand, 2024, pp. 1874–1889. URL: https://aclanthology.org/2024.acl-long.103/.
doi:10.18653/v1/2024.acl-long.103.
[12] X. Guo, Y. He, S. Zhang, T. Zhang, W. Feng, H. Huang, C. Ma, Detective: Detecting AI-generated
text via multi-level contrastive learning, in: The Thirty-eighth Annual Conference on Neural
Information Processing Systems, 2024. URL: https://openreview.net/forum?id=cdTTTJfJe3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          Curran Associates, Inc.,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>