<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generative AI Authorship Verification based on Contrastive Learning and Domain Adaptation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kaicheng Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Yan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Generative AI Authorship Verification is the task of determining, given two texts, one human-written and one AI-generated, which of them is the human text. In this paper, we accomplish this task with contrastive learning, domain adaptation, and pre-trained language models. In contrast to traditional machine learning methods, we use self-supervised contrastive learning and unsupervised domain adaptation to effectively exploit labeled source-domain and unlabeled target-domain data, learn features of human texts and AI texts, and use these features to classify the text. Our experiments show that on the validation set we constructed ourselves, the average score of our model across ROC-AUC, Brier, F1, c@1, and F0.5U reached 0.994, while on the PAN test data the average score reached 0.480.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2024</kwd>
        <kwd>Generative AI Authorship Verification</kwd>
        <kwd>Contrastive Learning</kwd>
        <kwd>Domain Adaptation</kwd>
        <kwd>Pre-trained Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The birth of generative text detection is mainly due to the increasing capabilities of LLMs. Without
specific training or guidance, the authenticity of the content generated by these LLMs is uncontrollable;
they can produce false news, fake patents, and similar content. Content harmful to society may mislead
readers who lack sound subjective judgment, or may be politically harmful. To verify whether a text is
human-written or AI-generated, in the context of the Generative AI Authorship Verification @ PAN task,
we explore this problem through the different features produced by human writing and by AI.</p>
      <p>Since AI-generated texts are difficult to distinguish by manual comparison alone, many methods
have been proposed in recent years to detect whether text is generated by AI, ranging from simple
feature-based classifiers to fine-tuned language-model detectors that distinguish whether the input
text is written by humans or generated by AI [5], including detection methods specifically targeting
AI-generated news [6]. A related research direction is authorship attribution (AA): although early AA
methods focused on human authors, recent studies have built models that identify the specific generator
of an input text [7]. ConDA, the framework used in this paper, was designed for AI text detection
based on self-supervised contrastive learning and unsupervised domain adaptation.</p>
      <p>Contrastive learning [8] compares similar and dissimilar samples to learn data
representations, producing high-quality features without explicit labels. This enhances generalization, which is
crucial for transfer learning across diverse tasks and data distributions.</p>
      <p>Domain adaptation [9] helps machine learning models generalize from a source domain to a different
target domain, addressing the performance degradation caused by distribution differences. By fitting the
model to the target domain’s data distribution, domain adaptation improves the model’s performance on
new data.</p>
      <p>Self-supervised contrastive learning leverages the inherent structural information of data to learn
high-quality feature representations without labels. When combined with unsupervised domain adaptation,
this approach enhances the model’s ability to accurately detect AI-generated text across various
generators without relying on extensive labeled data. The combination allows the model to learn
robustly in both the source and target domains, acquiring universal features that improve performance in
the target domain.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Model Framework</title>
      <p>In this paper, we adopt the contrastive-learning domain-adaptation framework and use the RoBERTa
model to obtain text features. The framework involves a source domain (S) and a target domain (T).
During training, we input texts into S and T respectively: the texts input into S are labeled,
whereas the texts input into T are unlabeled. The source-domain data set can be expressed as
$D_S = \{x_i, y_i\}$, where $x_i$ is an input article and $y_i$ is its label, marking whether $x_i$
was written by a human or generated by AI. The target-domain data set is represented as
$D_T = \{x_i\}$. Since T carries no labels, the model adapts so that the classifier trained on S can
predict in T. The input articles from the source ($x^S$) and target ($x^T$) domains undergo a text
transformation to generate transformed samples $\tilde{x}^S$ and $\tilde{x}^T$. This transformation helps
in aligning the representations of source- and target-domain texts.</p>
      <p>To enable the original samples and the transformed samples to be input at the same time while sharing
the weights of RoBERTa, we use a Siamese network. Through RoBERTa, the [CLS] token is used to obtain
the hidden representations of the input texts, $h_{[CLS]}$ and $\tilde{h}_{[CLS]}$; these embeddings are
passed into a projection layer composed of an MLP with hidden layers, and the contrastive loss is
computed in their low-dimensional projection space. The contrastive loss for the source is denoted by:</p>
      <p>$$\mathcal{L}^S_{CL} = -\sum_{(i,j)\in B} \log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2|B|} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)} \quad (1)$$
where $z_i$ and $z_j$ denote the projection-layer embeddings of the original and the transformed text, $\tau$ is
the temperature, $B$ is the current mini-batch, and $\mathrm{sim}(\cdot, \cdot)$ is a similarity metric, which is cosine similarity in
our case. In the target domain, the contrastive loss is denoted $\mathcal{L}^T_{CL}$, and its equation is the same as (1).</p>
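      <p>As a minimal NumPy sketch (not the authors' code; the batch layout with positives at offset B and the default temperature are our assumptions), the loss of Eq. (1) can be computed as follows:</p>

```python
import numpy as np

def nt_xent_loss(z_orig, z_aug, tau=0.5):
    """Contrastive loss of Eq. (1) over a mini-batch.

    z_orig, z_aug: (B, d) projection-layer embeddings of the original and
    transformed texts; row i of z_orig and row i of z_aug form a positive pair.
    """
    z = np.concatenate([z_orig, z_aug], axis=0)        # (2B, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot = cosine sim
    sim = (z @ z.T) / tau                              # scaled pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # implements the 1[k != i] indicator
    B = z_orig.shape[0]
    pos = np.concatenate([np.arange(B, 2 * B), np.arange(B)])  # positive index of each row
    log_prob = sim[np.arange(2 * B), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

      <p>Averaging over the batch instead of summing only rescales Eq. (1) by a constant.</p>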
      <p>For both the original text and the transformed text in the source domain, the CE loss is computed to
train the model to classify text instances correctly as either human-written or AI-generated. The
transformation performed on the original text preserves the semantics of the text and hence is
label-preserving. We therefore want the classifier to also detect text carrying such small, semantics-preserving
perturbations. This not only improves the robustness of the classifier but also increases the versatility
of the detector. This binary classification task helps the model learn to differentiate between the two types
of text. The CE loss for the source is denoted by:</p>
      <p>$$\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p\left(y_i \mid h^i_{[CLS]}\right) + (1 - y_i)\log\left(1 - p\left(y_i \mid h^i_{[CLS]}\right)\right)\right] \quad (2)$$
where $\mathcal{L}_{CE}$ denotes the CE loss for the original text and $N$ denotes the batch size. The CE loss of the
transformed text is denoted $\mathcal{L}'_{CE}$, and its equation is the same as (2).</p>
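      <p>Eq. (2) is standard binary cross-entropy; a short NumPy sketch follows (the clipping term for numerical safety is our addition, not part of the paper):</p>

```python
import numpy as np

def ce_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy of Eq. (2) over a batch of N instances.

    y_true: (N,) gold labels in {0, 1}; p_pred: (N,) predicted
    probabilities p(y_i | h_[CLS]^i) from the classification head.
    """
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```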
      <p>
        MMD [10] is utilized to align the distributions of text embeddings between the source domain
(labeled data) and the target domain (unlabeled data). By minimizing the MMD, the model aims to reduce
the distributional discrepancy between the two domains, ensuring that the learned representations are
domain-invariant. This encourages the model to learn domain-invariant features that effectively detect
AI-generated text across different generators. The MMD is denoted by:
$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{n}\sum_{i=1}^{n} \phi\left(x_i^S\right) - \frac{1}{m}\sum_{j=1}^{m} \phi\left(x_j^T\right) \right\|_{\mathcal{H}} \quad (3)$$
where $X_S = \{x_1^S, x_2^S, \ldots, x_n^S\}$ and $X_T = \{x_1^T, x_2^T, \ldots, x_m^T\}$ are two samples drawn from the
distributions $S$ and $T$, and $\phi : \mathcal{X} \mapsto \mathcal{H}$, where $\mathcal{H}$ represents an RKHS [11]. The RKHS mapping helps in
aligning the feature representations of the source and target domains in a higher-dimensional space. By
mapping the samples to a common RKHS, the MMD calculation minimizes the distributional
discrepancy between the domains and learns domain-invariant representations that are effective for
domain adaptation.
      </p>
      <p>The final training objective for our main framework is:
$$\mathcal{L} = \frac{1 - \lambda_1}{2}\left[\mathcal{L}_{CE} + \mathcal{L}'_{CE}\right] + \frac{\lambda_1}{2}\left[\mathcal{L}^S_{CL} + \mathcal{L}^T_{CL}\right] + \lambda_2\,\mathrm{MMD}(X_S, X_T) \quad (4)$$
where $\lambda_1$ and $\lambda_2$ are hyper-parameters. During inference, predictions are obtained by passing the
extracted features to the classification head.</p>
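      <p>Eq. (3) leaves the feature map φ implicit; in practice MMD is usually estimated with a kernel. A minimal NumPy sketch with an RBF kernel (the kernel choice and bandwidth are assumptions, not stated in the paper):</p>

```python
import numpy as np

def mmd_rbf(xs, xt, gamma=1.0):
    """Squared MMD between samples X_S (n, d) and X_T (m, d), as in Eq. (3),
    with the RKHS norm instantiated by an RBF kernel k(a, b) = exp(-gamma * |a - b|^2)."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-gamma * d2)
    # Standard kernel expansion of the squared mean-embedding distance.
    return gram(xs, xs).mean() + gram(xt, xt).mean() - 2 * gram(xs, xt).mean()
```

      <p>Identical samples give an MMD of zero, and the value grows as the two distributions drift apart.</p>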
    </sec>
    <sec id="sec-4">
      <title>4. Experiment and Result</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment Setting</title>
        <p>This paper uses RoBERTa-base as the encoder, with 12 layers, 768 hidden units, 12 heads, and 110M
parameters. The vocabulary size is 50,265, and the maximum input length of the encoder is set to 512. We used the Adam
optimizer with the learning rate set to 2e-5. Our experiments were conducted on an A800 server. The
best performance was achieved after 10 training epochs.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Dataset</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. PAN Dataset</title>
          <p>The PAN@CLEF 2024 Generative AI Authorship Verification task provides a bootstrap dataset of real and fake
news articles covering multiple 2021 US news headlines. The test set, which includes contributions
from ELOQUENT participants, encompasses a variety of text types such as news articles, Wikipedia
intro texts, and fanfiction. The bootstrap dataset contains human texts and texts generated by multiple
large language models: there are 1,087 news topics, and each topic has a human text and texts generated
by various AIs, such as Alpaca, ChatGPT, and LLaMA.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. External Dataset</title>
          <p>For contrastive learning, we also added an external dataset, TT-Grover-mega [12]. This dataset was made
for fake news detection and contains texts with labels, divided into human and grover_mega:
human corresponds to human-written text, and grover_mega to text generated by the Grover generation model.
This dataset is strongly related to the AI authorship verification task. The label types and counts
for the training, validation, and test sets of this dataset are shown in Table 1.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data Preprocessing</title>
        <p>To allow the dataset to be used smoothly in the experiments, we split the bootstrap dataset into a
training set, a validation set, and a test set. Within both the human category and the AI category, we
split the data into training, validation, and test sets at a ratio of 9:1:2. For
the TT-Grover-mega dataset, we retain its original size and convert its data format to the same
JSONL format as the bootstrap dataset.</p>
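        <p>A minimal sketch of the 9:1:2 split described above, applied per class (the shuffling seed is an assumption, since the paper does not specify one):</p>

```python
import random

def split_9_1_2(items, seed=42):
    """Split one class's articles into train/validation/test at a 9:1:2 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)         # shuffle before splitting
    n = len(items)
    n_train = round(n * 9 / 12)                # 9 parts of 12
    n_val = round(n * 1 / 12)                  # 1 part of 12
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])           # remaining 2 parts
```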
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Data Augmentation</title>
        <p>To expand the dataset and improve the model’s generalization ability, we performed data augmentation
on the bootstrap dataset and the TT-Grover-mega dataset. After sentence segmentation, we traversed
each article in the data set, extracted 10% of the words of the original article, replaced them with
synonyms, and then recombined the augmented sentences into articles. The augmented data is combined
with the original data and their labels to produce an enlarged dataset.</p>
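        <p>The augmentation step can be sketched as follows; the synonym table here is a toy stand-in, since the paper does not name its synonym resource:</p>

```python
import random

# Toy synonym table; the actual synonym resource used in the paper is not specified.
SYNONYMS = {"fast": "quick", "big": "large", "said": "stated"}

def augment(text, frac=0.10, seed=0):
    """Replace roughly 10% of the words of an article with synonyms (Section 4.4 sketch)."""
    words = text.split()
    rng = random.Random(seed)
    k = max(1, int(len(words) * frac))          # about 10% of the words
    for i in rng.sample(range(len(words)), k):  # positions chosen at random
        words[i] = SYNONYMS.get(words[i].lower(), words[i])
    return " ".join(words)
```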
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Evaluation</title>
        <p>
          When evaluating the model, we take the two texts of each test pair and predict each separately. If the
model accurately identifies the types of both texts, we output a label in the range (0, 1) as required.
If the two texts are assigned the same category, indicating that the model is confused, we assign a
label of 0.5. To evaluate the performance of our model, we used the evaluation platform provided by
PAN, which includes the following metrics:
• ROC-AUC: the conventional area-under-the-curve score.
• c@1: rewards systems that leave complicated problems unanswered [13].
• F_0.5u: focuses on deciding same-author cases correctly [14].
• F1-score: the harmonic combination of the precision and recall of the model [15].
• Brier: the Brier score evaluates the accuracy of probabilistic predictions [16].
        </p>
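        <p>The pairing rule and the c@1 metric can be sketched as follows (the mapping of per-text probabilities to a single pair score is our assumption; c@1 follows the definition in [13]):</p>

```python
def pair_decision(p_a, p_b, thresh=0.5):
    """p_a, p_b: probabilities that texts A and B are human-written.
    Emit 0.5 (a non-answer) when both texts land in the same class."""
    a_human, b_human = p_a >= thresh, p_b >= thresh
    if a_human == b_human:
        return 0.5                              # model is confused
    return p_a if a_human else 1 - p_b          # hypothetical score toward the human text

def c_at_1(answers, truths):
    """c@1 [13]: (n_c + n_u * n_c / n) / n, rewarding non-answers (0.5)."""
    n = len(answers)
    n_u = sum(1 for a in answers if a == 0.5)               # unanswered pairs
    n_c = sum(1 for a, t in zip(answers, truths)
              if a != 0.5 and (a > 0.5) == (t == 1))        # correctly answered pairs
    return (n_c + n_u * n_c / n) / n
```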
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Results</title>
        <p>Table 2 shows our experimental results. We conducted two experiments. Initially, we used
TT-Grover-mega as the source and the bootstrap dataset as the target; however, this approach yielded suboptimal
results, so we subsequently swapped the two datasets. We hypothesize that the initial result can be
attributed to the minimal distributional disparity between the experimental test set and
the provided training set, which hindered our model’s domain-adaptation capabilities. We observed a
significant improvement in our results after switching the datasets.</p>
        <p>The first row reports the bootstrap dataset as source and the TT-Grover-mega dataset as target; the
second row reports the TT-Grover-mega dataset as source and the bootstrap dataset as target. Using
the bootstrap dataset as the source and the TT-Grover-mega dataset as the target led to a significant
improvement over the swapped configuration.</p>
        <p>Table 3 demonstrates the performance of our model (PAN as source, TT-Grover-mega as target) evaluated in the TIRA [17]
environment for PAN@CLEF 2024.</p>
        <p>Table 4 demonstrates the final results obtained by our model at PAN@CLEF 2024.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We completed the PAN@CLEF 2024 Generative AI Authorship Verification benchmark task using
self-supervised contrastive learning and unsupervised domain adaptation.
Our results surpassed four baselines and achieved a mean score of 0.902, effectively distinguishing most
human-written texts from AI-generated texts. We found that when there are more text-source generators
(i.e., LLMs), our method can capture the characteristics of the texts more accurately and can
judge more texts of unknown origin, thereby determining whether a text is human-written or
AI-generated. In the future, we will take multilingual settings into consideration to achieve higher
adaptability.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 62276064).</p>
      <p>[8] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, J. Tang, Self-supervised learning: Generative
or contrastive, IEEE Transactions on Knowledge and Data Engineering 35 (2021) 857–876.
[9] M. Long, H. Zhu, J. Wang, M. I. Jordan, Unsupervised domain adaptation with residual transfer
networks, Advances in Neural Information Processing Systems 29 (2016).
[10] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, A kernel two-sample test, The
Journal of Machine Learning Research 13 (2012) 723–773.
[11] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal
of Machine Learning Research 2 (2001) 67–93.
[12] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, Y. Choi, Defending against
neural fake news, Advances in Neural Information Processing Systems 32 (2019).
[13] A. Peñas, A. Rodrigo, A simple measure to assess non-response (2011).
[14] J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1
(Long and Short Papers), 2019.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine
Learning Research 12 (2011) 2825–2830.
[16] G. W. Brier, Verification of forecasts expressed in terms of probability, Monthly Weather Review
78 (1950) 1–3.
[17] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/978-3-031-28241-6_20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dürlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gogoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Talman</surname>
          </string-name>
          , E. Stamatatos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN</article-title>
          and
          <article-title>ELOQUENT 2024</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumarage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morafah</surname>
          </string-name>
          , H. Liu, Conda:
          <article-title>Contrastive domain adaptation for ai-generated text detection</article-title>
          ,
          <source>in: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd</source>
          <article-title>Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Nusa Dua, Bali,
          <year>2023</year>
          , pp.
          <fpage>598</fpage>
          -
          <lpage>610</lpage>
          . URL: https://aclanthology.org/2023.ijcnlp-long.40
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          , arXiv preprint arXiv:
          <year>1907</year>
          .
          <volume>11692</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Duckworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <article-title>Automatic detection of generated text is easiest when humans are fooled</article-title>
          , arXiv preprint arXiv:
          <year>1911</year>
          .
          <volume>00650</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumarage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Trapeznikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruston</surname>
          </string-name>
          , H. Liu,
          <article-title>Stylometric detection of ai-generated text in twitter timelines</article-title>
          ,
          <source>arXiv preprint arXiv:2303.03697</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Munir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Batool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shafiq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <article-title>Through the looking glass: Learning to attribute synthetic text generated by language models</article-title>
          ,
          <source>in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 1811–1822</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>