                         Generative AI Authorship Verification based on
                         Contrastive Learning and Domain Adaptation
                         Notebook for the PAN Lab at CLEF 2024

                         Kaicheng Huang, Haoliang Qi* and Kai Yan
                         Foshan University, Foshan, Guangdong, China


                                      Abstract
                                       Generative AI Authorship Verification is the task of deciding, given two texts of which one is human-written
                                       and the other AI-generated, which of the two was written by a human. In this paper, we address this task
                                       with contrastive learning, domain adaptation, and pre-trained language models. In contrast to traditional
                                       machine learning methods, we use self-supervised contrastive learning and unsupervised domain adaptation
                                       to exploit labeled source-domain data and unlabeled target-domain data, extract features of human and AI
                                       texts, and use these features to classify the texts. Our experiments show that on the validation set we
                                       constructed ourselves, our model reached an average score of 0.994 over ROC-AUC, Brier, F1, c@1, and
                                       F0.5u, while its average score on the PAN test data was 0.480.

                                      Keywords
                                      PAN 2024, Generative AI Authorship Verification, Contrastive Learning, Domain Adaptation, Pre-trained Model




                         1. Introduction
                         As large language models (LLMs) improve and become more widely adopted, distinguishing between
                         human and machine-written text becomes increasingly challenging. Many classification methods exist
                         to help with this differentiation, but the fundamental feasibility of the task is rarely questioned. Drawing
                         on many years of experience in a related but broader field (authorship verification), we set out to answer
                         whether this task can be solved. The goal of the PAN@CLEF 2024 generative AI author verification
                         task [1] [2] is: given two texts, one written by a human and one written by a machine, pick out the
                         human.
                          In recent years, pre-trained language models have matured and can be adapted to more and more tasks,
                          one of which is authorship identification. Contrastive learning and domain adaptation methods have also
                          been developed steadily for identifying the source of a text. ConDA [3], for example, is a framework that
                          combines self-supervised contrastive learning with unsupervised domain adaptation to transfer knowledge
                          from a labeled dataset to unlabeled text and to detect AI-generated text from different generators. In this
                          paper, we use the ConDA framework together with the pre-trained language model RoBERTa [4] to address
                          the AI text recognition task and discuss its effectiveness.


                         2. Background
                          Generative text detection arose mainly because of the increasing capabilities of LLMs. Without specific
                          training or guidance, the truthfulness of the content generated by these LLMs is uncontrollable; they may,
                          for example, produce false news, patents, etc. Such socially harmful content can mislead readers who lack
                          sound subjective judgment. To verify whether a text is human-written or AI-generated, we explore this
                          problem within the Generative AI Authorship Verification task at PAN by exploiting the different features
                          exhibited by human writing and AI-generated text.
                          CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
                         *
                           Corresponding author.
                           teaslation@gmail.com (K. Huang); haoliang.qi@gmail.com (H. Qi); yankai@fosu.edu.cn (K. Yan)
                           0000-0002-3473-3355 (K. Huang); 0000-0003-1321-5820 (H. Qi); 0000-0002-4960-7108 (K. Yan)
                                   © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


   Since AI-generated texts are difficult to distinguish by manual comparison alone, many methods have
been proposed in recent years to detect whether a text is generated by AI. These range from simple
feature-based classifiers to fine-tuned language-model detectors that decide whether an input text is
human-written or AI-generated [5], including detection methods designed specifically for AI-generated
news [6]. A related research direction is authorship attribution (AA): although early AA methods focused
on human authors, recent studies have built models that identify the specific generator of an input
text [7]. There is also ConDA, the framework used in this paper, which is designed for AI text detection
based on self-supervised contrastive learning and unsupervised domain adaptation.
   Contrastive learning [8] focuses on comparing similar and dissimilar samples to learn data representations,
creating high-quality features without explicit labels. This enhances generalization, which is crucial for
transfer learning across various tasks and data distributions.
   Domain adaptation [9] helps machine learning models generalize from a source domain to a different
target domain, addressing performance degradation due to distribution differences. By fitting the model
to the target domain’s data distribution, domain adaptation improves the model’s performance on new
data.
   Self-supervised contrastive learning leverages the inherent structural information of data to learn high-
quality feature representations without labels. When combined with unsupervised domain adaptation,
this approach enhances the model’s ability to accurately detect AI-generated text across various
generators without relying on extensive labeled data. This combination allows the model to learn
robustly in both source and target domains, acquiring universal features that improve performance in
the target domain.


3. Model Framework
In this paper, we adopt the contrastive-learning domain-adaptation framework and use the RoBERTa
model to obtain text features. The framework involves a source domain (S) and a target domain (T).
During training, texts are fed into S and T respectively: the texts fed into S are labeled, whereas those
fed into T are unlabeled. The source dataset can be expressed as S = {x_i^S, y_i^S}, where x_i^S is an input
article and y_i^S is its label, marking whether x_i^S was written by a human or generated by AI. The target
dataset is unlabeled and is represented as T = {x_i^T}; the missing target labels are compensated for by
adapting the model trained on S so that it can also predict on T. The input articles from the source (x_i^S)
and target (x_i^T) domains undergo a text transformation that produces transformed samples x_j^S and x_j^T.
This transformation helps align the representations of source- and target-domain texts.
   To feed the original samples and the transformed samples at the same time while sharing the weights
of RoBERTa, we use a Siamese network. RoBERTa's [CLS] token is used to obtain hidden representations
of the input texts, h_{i[CLS]}^S and h_{j[CLS]}^S; these embeddings are passed through a projection layer
(an MLP with hidden layers), and the contrastive loss is computed in the resulting low-dimensional
projection space. The contrastive loss for the source is denoted by:

                  \mathcal{L}_{ctr}^{S} = -\sum_{(i,j)\in b} \log \frac{\exp\left(\mathrm{sim}(z_i^S, z_j^S)/t\right)}{\sum_{k=1}^{2|b|} \mathbb{1}_{[k\neq i]} \exp\left(\mathrm{sim}(z_i^S, z_k^S)/t\right)}                                   (1)

where z_i^S and z_j^S denote the projection-layer embeddings of the original and the transformed text, t is
the temperature, b is the current mini-batch, and sim(·, ·) is a similarity metric, which in our case is cosine
similarity. For the target domain, the contrastive loss is denoted by ℒ_ctr^T, and its equation is the same as (1).
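   As an illustration, the loss in Eq. (1) can be computed with the following minimal PyTorch sketch. The
function name, the default temperature, and the batching convention are our own assumptions for this
sketch rather than the exact ConDA implementation.

import torch
import torch.nn.functional as F

def contrastive_loss(z_orig, z_trans, t=0.07):
    """NT-Xent-style contrastive loss over a mini-batch.

    z_orig, z_trans: (b, d) projection-layer embeddings of the original and
    transformed texts; row i of both tensors forms a positive pair.
    """
    b = z_orig.size(0)
    z = F.normalize(torch.cat([z_orig, z_trans], dim=0), dim=1)   # (2b, d), unit norm
    sim = z @ z.t() / t                                           # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                             # exclude the k = i terms
    # the positive for row i is row i + b (and vice versa)
    pos = torch.cat([torch.arange(b, 2 * b), torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, pos)

Here z_orig and z_trans would be the outputs of the shared MLP projection head applied to the [CLS]
embeddings of the original and transformed texts; applying the same function to a target-domain batch
yields ℒ_ctr^T.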
   For both the original text and the transformed text in the source domain, the CE loss is computed to
train the model to correctly classify the text instances as either human-written or AI-generated. The
transformation performed on the original text preserves the semantics of the text and hence is label-
preserving. In this case, we hope the classifier can also detect text with such small, semantic-preserving
perturbations. This not only improves the robustness of the classifier but also increases the versatility
of the detector. This binary classification task helps in learning to differentiate between the two types
of text. The CE loss for the source is denoted by:
              \mathcal{L}_{CE}^{S} = -\frac{1}{b} \sum_{i=1}^{b} \left[ y_i \log p\left(y_i \mid h_{i[CLS]}^{S}\right) + (1 - y_i) \log\left(1 - p\left(y_i \mid h_{i[CLS]}^{S}\right)\right) \right]                 (2)

where ℒ_CE^S denotes the CE loss for the original text and b denotes the batch size. The CE loss of the
transformed text is denoted by ℒ_CE^{S′}, and its equation is the same as (2).
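   A minimal sketch of this CE loss, assuming a simple linear classification head over the 768-dimensional
[CLS] embedding (the head architecture is an assumption; the paper only specifies the loss of Eq. (2)):

import torch.nn as nn
import torch.nn.functional as F

# A linear classification head over the [CLS] embedding; 768 matches the
# hidden size of roberta-base, but the head design here is illustrative.
clf_head = nn.Linear(768, 1)

def ce_loss(h_cls, labels):
    """h_cls: (b, 768) [CLS] embeddings; labels: (b,) floats in {0., 1.}."""
    logits = clf_head(h_cls).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, labels)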
   MMD (Maximum Mean Discrepancy) [10] is utilized to align the distributions of text embeddings between
the source domain (labeled data) and the target domain (unlabeled data). By minimizing the MMD, the
model reduces the distributional discrepancy between the two domains, ensuring that the learned
representations are domain-invariant. This encourages the model to learn domain-invariant features that
effectively detect AI-generated text across different generators. The MMD is denoted by:

              \mathrm{MMD}(\mathcal{S}, \mathcal{T}) = \left\| \frac{1}{N^S} \sum_{i=1}^{N^S} \varphi\left(z_i^S\right) - \frac{1}{N^T} \sum_{i=1}^{N^T} \varphi\left(z_i^T\right) \right\|_{\mathcal{H}}                 (3)

where S = {x_1^S, x_2^S, x_3^S, . . . , x_{N^S}^S} and T = {x_1^T, x_2^T, x_3^T, . . . , x_{N^T}^T} are two samples drawn from the
distributions 𝒮 and 𝒯, φ : 𝒮 ↦ ℋ is a feature map, and ℋ is a reproducing kernel Hilbert space (RKHS) [11].
Mapping the samples into a common RKHS aligns the feature representations of the source and target
domains in a higher-dimensional space; minimizing the MMD in this space reduces the distributional
discrepancy between the domains and yields domain-invariant representations that are effective for
domain adaptation.
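   As an illustration, the following sketch estimates the MMD of Eq. (3) with an RBF kernel. The kernel
choice and bandwidth are assumptions made for this sketch; the paper only requires a mapping into an
RKHS.

import torch

def mmd_rbf(z_src, z_tgt, sigma=1.0):
    """Biased estimate of MMD^2 between source and target embeddings with an
    RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(z_src, z_src).mean() + k(z_tgt, z_tgt).mean() - 2 * k(z_src, z_tgt).mean()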
   The final training objective for our main framework is:

              \mathcal{L} = \frac{1-\lambda_1}{2} \left[ \mathcal{L}_{CE}^{S} + \mathcal{L}_{CE}^{S'} \right] + \frac{\lambda_1}{2} \left[ \mathcal{L}_{ctr}^{S} + \mathcal{L}_{ctr}^{T} \right] + \lambda_2 \, \mathrm{MMD}(S, T)                 (4)
where 𝜆1 and 𝜆2 are hyper-parameters.
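   For clarity, the objective of Eq. (4) can be assembled from the individual per-batch loss terms as in the
sketch below; the default values of lam1 and lam2 are placeholders, not the hyper-parameter values we
actually used.

def total_loss(ce_src, ce_src_trans, ctr_src, ctr_tgt, mmd_val, lam1=0.5, lam2=0.1):
    """Weighted combination of the source CE losses, the source/target
    contrastive losses, and the MMD term, following Eq. (4)."""
    return ((1 - lam1) / 2) * (ce_src + ce_src_trans) \
         + (lam1 / 2) * (ctr_src + ctr_tgt) \
         + lam2 * mmd_val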




Figure 1: Model framework. PLM refers to the pre-trained language model (in this paper, roberta-base); the PLM
and MLP weights are shared across all four instances. The weights of S and T are independent of each other to
adapt to the specific characteristics and distribution of each domain. During inference, the final classes are derived
by passing the extracted features to the classification head.
4. Experiment and Result
4.1. Experiment Setting
This paper uses RoBERTa-base as the encoder, with 12 layers, a hidden size of 768, 12 attention heads, and
110M parameters. The vocabulary size is 50,265, and the maximum input length of the encoder is set to 512.
We used the Adam optimizer with the learning rate set to 2e-5. Our experiments were conducted on an A800
server. The best performance is achieved after 10 epochs of training.
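   A hedged sketch of this encoder and optimizer setup, using the Hugging Face transformers and PyTorch
APIs; the exact training loop of our system is not shown.

import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")   # vocab size 50,265
encoder = RobertaModel.from_pretrained("roberta-base")         # 12 layers, 768 hidden, 12 heads

batch = tokenizer(["an example article"], truncation=True, max_length=512,
                  padding=True, return_tensors="pt")
h_cls = encoder(**batch).last_hidden_state[:, 0]               # [CLS] representation

optimizer = torch.optim.Adam(encoder.parameters(), lr=2e-5)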

4.2. Dataset
4.2.1. PAN Dataset
The PAN@CLEF 2024 generative AI authorship verification task provides a bootstrap dataset of real and
fake news articles built around multiple 2021 US news headlines. The test set, which includes contributions
from ELOQUENT participants, encompasses a variety of text types such as news articles, Wikipedia intro
texts, and fanfiction. The bootstrap dataset contains human texts and texts generated by multiple large
language models: there are 1,087 news topics, and each topic has a human text and texts generated by
various AIs, such as Alpaca, ChatGPT, LLaMA, etc.


4.2.2. External Dataset
To provide a second domain for contrastive learning, we also added an external dataset, TT-Grover-mega [12].
This dataset was built for fake news detection and contains texts and labels; the labels are either human or
grover_mega, where human marks human-written text and grover_mega marks text generated by the Grover
generation model. This dataset is strongly related to the AI authorship recognition task. The label types and
counts of its training, validation, and test sets are shown in Table 1.

Table 1
The label types and counts of TT-Grover-mega
                     Split                                Human number        Grover number
                     Train                                5964                5507
                     Validation                            975                 894
                     Test                                 1915                1763



4.3. Data Preprocessing
To allow the dataset to be used smoothly in our experiments, we split the bootstrap dataset into a training
set, a validation set, and a test set. Within both the human category and the AI category, the data are split
into training, validation, and test sets at a ratio of 9:1:2. For the TT-Grover-mega dataset, we retain its
original quantities and only convert its data format to the same JSONL format as the bootstrap dataset. A
minimal sketch of this preprocessing is given below.
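   A minimal sketch of this preprocessing, assuming each record is a JSON object with "text" and "label"
fields (the field names are illustrative) and that the split is applied separately to the human and the AI
records:

import json
import random

def split_and_write(records, prefix, seed=42):
    """Shuffle records and write 9:1:2 train/validation/test splits as JSONL files."""
    random.Random(seed).shuffle(records)
    n = len(records)
    train_end, val_end = n * 9 // 12, n * 10 // 12
    splits = {"train": records[:train_end],
              "validation": records[train_end:val_end],
              "test": records[val_end:]}
    for name, rows in splits.items():
        with open(f"{prefix}_{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:                      # e.g. {"text": ..., "label": ...}
                f.write(json.dumps(row, ensure_ascii=False) + "\n")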

4.4. Data Augmentation
To expand the dataset and improve the model's generalization ability, we performed data augmentation on
the bootstrap and TT-Grover-mega datasets. After sentence segmentation, we traversed each article in the
dataset, sampled 10% of the words in the original article, replaced them with synonyms, and recombined the
augmented sentences into an article. The augmented data were then combined with the original data and
their labels to form an enlarged dataset.
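   The following sketch illustrates this synonym replacement using NLTK's WordNet; the sampling and
recombination details are illustrative assumptions rather than our exact pipeline.

import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)   # one-time download of the synonym lexicon

def augment(text, ratio=0.1, seed=0):
    """Replace roughly `ratio` of the words in `text` with a WordNet synonym."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    n_replace = max(1, int(len(words) * ratio))
    for idx in rng.sample(range(len(words)), min(n_replace, len(words))):
        lemmas = [l.name().replace("_", " ")
                  for s in wordnet.synsets(words[idx]) for l in s.lemmas()
                  if l.name().lower() != words[idx].lower()]
        if lemmas:
            words[idx] = rng.choice(lemmas)
    return " ".join(words)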
4.5. Evaluation
When evaluating the model, we separate the two texts of each test pair and predict each of them individually.
If the model assigns the two texts to different classes, we output a score in the range (0, 1) as required; if
the two texts are recognized as the same class, indicating that the model is confused, we assign a score of
0.5 (a sketch of this decision rule follows the metric list below). To evaluate the performance of our model,
we used the evaluation platform provided by PAN, which includes the following metrics:
• ROC-AUC: the conventional area under the curve score.
• c@1: rewards systems that leave complicated problems unanswered [13].
• F_0.5u: focus on deciding same-author cases correctly [14].
• F1-score: harmonic way of combining the precision and recall of the model [15].
• Brier: Brier Score evaluates the accuracy of probabilistic predictions [16].
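   A minimal sketch of this decision rule, assuming a classifier that outputs, for each single text, the
probability that it is human-written (the function name and the 0.5 threshold are illustrative):

def pair_score(p_human_text1, p_human_text2):
    """Turn two single-text predictions into one pair score: > 0.5 means text1 is
    judged human-written, < 0.5 means text2 is, and 0.5 signals a non-decision."""
    pred1, pred2 = p_human_text1 >= 0.5, p_human_text2 >= 0.5
    if pred1 == pred2:                     # both texts assigned the same class
        return 0.5
    return p_human_text1 if pred1 else 1.0 - p_human_text2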


4.6. Results
Table 2 shows our experimental results. We conducted two experiments. Initially, we used TT-Grover-mega
as the source and the bootstrap dataset as the target, but this approach yielded suboptimal results.
Subsequently, we reversed the roles of the two datasets and observed a significant improvement. We
hypothesize that this phenomenon can be attributed to the minimal distributional disparity between our
test set and the provided training set, which hinders the model's domain adaptation capabilities in the first
configuration.
   In Table 2, the first row reports the results with the TT-Grover-mega dataset as source and the bootstrap
dataset as target, and the second row reports the results with the bootstrap dataset as source and the
TT-Grover-mega dataset as target. Using the bootstrap dataset as the source and the TT-Grover-mega
dataset as the target led to a significant improvement compared to the swapped configuration.
   Table 3 shows the performance of our model (PAN_Source, TT_Target) evaluated in the TIRA [17]
environment for PAN@CLEF 2024.
   Table 4 shows the final results obtained by our model in PAN@CLEF 2024.

Table 2
The results of the test set
               Approach                            ROC-AUC   Brier    C@1     F1      F0.5𝑢   Mean
               RoBERTa (TT_Source, PAN_Target)     0.658     0.778    1       0.359   0.41    0.641
               RoBERTa (PAN_Source, TT_Target)     1         0.994    1       0.987   0.99    0.994



Table 3
Results on pan24-authorship-verification-test
            Approach                             ROC-AUC Brier C@1                F1     F0.5𝑢 Mean
            current-boutique                       0.951     0.924   0.922    0.844 0.868          0.902
            Baseline Binoculars                    0.972     0.957   0.966    0.964      0.965     0.965
            Baseline Fast-DetectGPT (Mistral)      0.876      0.8    0.886    0.883      0.883     0.866
            Baseline PPMd                          0.795     0.798   0.754    0.753      0.749      0.77
            Baseline Unmasking                     0.697     0.774   0.691    0.658      0.666     0.697
            Baseline Fast-DetectGPT                0.668     0.776   0.695    0.69       0.691     0.704
Table 4
Final results on pan24-authorship-verification
                    Approach           ROC-AUC Brier C@1             F1   F0.5𝑢 Mean
                    current-boutique       0.645    0.798   0.325   0.307 0.323   0.480


5. Conclusion
We successfully completed the PAN@CLEF 2024 generative AI authorship verification task through
self-supervised contrastive learning and unsupervised domain adaptation. Our results surpassed four
baselines and achieved a mean score of 0.902 on the pan24-authorship-verification-test set, effectively
distinguishing most human-written texts from AI-generated texts. We found that when more text-source
generators (i.e., LLMs) are available, our method can capture the characteristics of the texts more accurately
and judge more texts of unknown origin, thereby determining whether a text is human-written or generated
by AI. In the future, we will take multilingual settings into consideration to achieve higher adaptability.


Acknowledgments
This work is supported by the National Natural Science Foundation of China (No.62276064).


References
 [1] A. A. Ayele, N. Babakov, J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag,
     M. Fröbe, D. Korenčić, M. Mayerl, D. Moskovskiy, A. Mukherjee, A. Panchenko, M. Potthast,
     F. Rangel, N. Rizwan, P. Rosso, F. Schneider, A. Smirnova, E. Stamatatos, E. Stakovskii, B. Stein,
     M. Taulé, D. Ustalov, X. Wang, M. Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024:
     Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking
     Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot,
     D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro
     (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of
     the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in
     Computer Science, Springer, Berlin Heidelberg New York, 2024.
 [2] J. Bevendorff, M. Wiegmann, J. Karlgren, L. Dürlich, E. Gogoulou, A. Talman, E. Stamatatos,
     M. Potthast, B. Stein, Overview of the “Voight-Kampff” Generative AI Authorship Verification
     Task at PAN and ELOQUENT 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera
     (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR
     Workshop Proceedings, CEUR-WS.org, 2024.
 [3] A. Bhattacharjee, T. Kumarage, R. Moraffah, H. Liu, Conda: Contrastive domain adaptation for
     ai-generated text detection, in: Proceedings of the 13th International Joint Conference on Natural
     Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for
     Computational Linguistics, Association for Computational Linguistics, Nusa Dua, Bali, 2023, pp.
     598–610. URL: https://aclanthology.org/2023.ijcnlp-long.40.
 [4] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
     Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
 [5] D. Ippolito, D. Duckworth, C. Callison-Burch, D. Eck, Automatic detection of generated text is
     easiest when humans are fooled, arXiv preprint arXiv:1911.00650 (2019).
 [6] T. Kumarage, J. Garland, A. Bhattacharjee, K. Trapeznikov, S. Ruston, H. Liu, Stylometric detection
     of ai-generated text in twitter timelines, arXiv preprint arXiv:2303.03697 (2023).
 [7] S. Munir, B. Batool, Z. Shafiq, P. Srinivasan, F. Zaffar, Through the looking glass: Learning to
     attribute synthetic text generated by language models, in: Proceedings of the 16th Conference of
     the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp.
     1811–1822.
 [8] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, J. Tang, Self-supervised learning: Generative
     or contrastive, IEEE transactions on knowledge and data engineering 35 (2021) 857–876.
 [9] M. Long, H. Zhu, J. Wang, M. I. Jordan, Unsupervised domain adaptation with residual transfer
     networks, Advances in neural information processing systems 29 (2016).
[10] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, A kernel two-sample test, The
     Journal of Machine Learning Research 13 (2012) 723–773.
[11] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal
     of machine learning research 2 (2001) 67–93.
[12] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, Y. Choi, Defending against
     neural fake news, Advances in neural information processing systems 32 (2019).
[13] A. Peñas, A. Rodrigo, A simple measure to assess non-response (2011).
[14] J. Burstein, C. Doran, T. Solorio, Proceedings of the 2019 conference of the north american chapter
     of the association for computational linguistics: Human language technologies, volume 1 (long
     and short papers), in: Proceedings of the 2019 Conference of the North American Chapter of the
     Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and
     Short Papers), 2019.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
     R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in python, the Journal of machine
     Learning research 12 (2011) 2825–2830.
[16] G. W. Brier, Verification of forecasts expressed in terms of probability, Monthly weather review
     78 (1950) 1–3.
[17] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
     Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
     F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
     in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
     in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/
     978-3-031-28241-6_20.