                         Meta-Contrastive Learning for Generative AI Authorship
                         Verification
                         Notebook for the PAN Lab at CLEF 2024

                         Jiajun Lv, Yong Han and Leilei Kong
                         Foshan University, Foshan, China


                                      Abstract
                                      This paper proposes a method that combines meta-learning and contrastive learning to address the task of
                                      Generative AI Authorship Verification. Our motivation is to leverage supervised contrastive learning to enhance
                                      the model’s discriminative ability by optimizing the relationships between samples. Additionally, we employ the
                                      meta-learning algorithm Reptile to improve the generalization ability on out-of-domain data. Finally, we select
                                      the model weights that achieve the best performance on the validation set. We obtained an average score of 0.949
                                      on the test set.

                                      Keywords
                                      Authorship Verification, Contrastive Learning, Meta-learning




                         1. Introduction
With the widespread application of generative AI and large language models (LLMs), complex issues
have emerged, such as the spread of misinformation[1] and the facilitation of plagiarism[2], particularly
in academic writing with LLMs[1]. This creates an urgent need to develop detectors capable of identifying
LLM-generated text. Since LLMs are trained on extensive datasets of text and code, they can produce
content that closely resembles human-written text[3]. As a result, distinguishing between human-written
and machine-written text has become increasingly challenging.
   In this study, we propose a method that combines contrastive learning and the Reptile[5] meta-learning
algorithm to address the PAN: Voight-Kampff Generative AI Authorship Verification task at CLEF 2024[4][6].
The task requires identifying the human-written text from a given pair of texts.


                         2. Related work
Since 2011, the PAN organization has been continuously organizing authorship verification tasks[7].
Unlike previous editions, which focused on cross-discourse-type authorship verification, the PAN 2024
Authorship Verification task[4] asks whether generative AI authorship verification can be solved[8]. The
task requires participants to design classification methods to distinguish between human-written and
machine-written texts.
                            In recent work on generative AI detectors, fine-tuning language models and zero-shot learning
                         methods are predominant [3]. Zero-shot detectors do not require additional training through supervised
                         signals. Major methods include perplexity (PPL) [9], probability curvature [10], and likelihood ratio
                         ranking (LRR) [11]. Currently, supervised fine-tuning of pre-trained language models is very powerful
                         in natural language understanding [12]. Recent works [3][12][13] further confirm that fine-tuning
                         with pre-trained language models from the BERT family can outperform zero-shot methods in-domain.

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
lvjiajun.96@gmail.com (J. Lv); hanyong2005@fosu.edu.cn (Y. Han); kongleilei@fosu.edu.cn (L. Kong)
ORCID: 0000-0002-8755-5310 (J. Lv); 0000-0002-9416-2398 (Y. Han); 0000-0002-4636-3507 (L. Kong)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


To further improve detection capability on texts from unseen models, contrastive learning has also been
applied to LLM-generated text detection. ConDA [13] proposed a contrastive domain adaptation framework
that combines domain adaptation with contrastive representation learning, enhancing the detector’s
performance on out-of-domain data. Reviewing last year’s authorship verification task, the first-place
team of Ibrahim et al.[14] and the second-place team of Guo et al.[15] both adopted feature encoding
and contrastive learning concepts. From these methods, it is evident that contrastive learning might be
key to the authorship verification task.
   Inspired by [13][16][17], we propose a method that combines contrastive learning and Reptile meta-
learning[18]. Contrastive learning, by learning the relative distances between samples, avoids mapping
texts to a single label. Unlike conventional fine-tuning methods, we use Reptile meta-learning to help
the model learn better feature representations, enhancing its generalization ability.


3. Method
The goal of our model is to learn the relative distance between samples that share the same topic but
have different authors. Feeding a text 𝑥 into the model yields a soft label 𝑦 that scores the text: the
larger the value, the more likely the text is judged to be human-authored, and the smaller the value, the
more likely it is judged to be AI-generated.
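To make this concrete, the following is a minimal sketch (not the authors' code) of how such a scorer could be applied to the pairwise task of picking the human-written text out of two candidates, assuming the convention implied by the margin ranking loss in Section 3.1 that the human-written text receives the higher score; the score function stands in for the model defined there.

    def pick_human(score, text_a, text_b):
        # Score both candidate texts and return the index (0 or 1) of the one
        # judged to be human-written; assumes a higher score means "more human".
        s_a, s_b = score(text_a), score(text_b)
        return 0 if s_a > s_b else 1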

3.1. Contrastive Learning
Our method revolves around constructing a training task $\tau$, where $\tau_n$ is represented as a collection of
texts on the same topic written by different authors, denoted as $\{x_0^+, x_1^-, x_2^-, \ldots, x_n^-\}$. In this collection,
$x_0^+$ is the only positive example, representing a human author, while $x_1^-, \ldots, x_n^-$ are negative examples,
representing AI-generated authors.
   The text $x_i$ is fed into the encoder, and the output vector of the $[CLS]$ token from the last encoder
layer is taken as the representation $E_i$ of the text. We then pass $E_i$ through a linear layer followed by
the $ReLU$ activation to obtain the soft label $\hat{y}_i$ of the input text $x_i$:

$$E_i = encoder(x_i) \tag{1}$$

$$\hat{y}_i = \sigma(E_i W_h^T + b_h) \tag{2}$$

where $E_i \in \mathbb{R}^{batch\_size \times h}$, $W_h \in \mathbb{R}^{h \times 1}$, $h$ is the dimension of the hidden layer of the encoder, and $b_h$ is
the bias of the fully connected layer. $\sigma(\cdot)$ is the nonlinear activation function $ReLU$. We compute the
margin ranking loss (MarginRankingLoss) between the numerical labels:

$$loss = \max(0, margin - (\hat{y}_i^+ - \hat{y}_i^-)) \tag{3}$$

where $\hat{y}_i^+$ is the soft label of a positive example, $\hat{y}_i^-$ is the soft label of a negative example, and $margin$
is the spacing boundary that specifies the minimum required gap between the two scores; the larger its
value, the further apart $\hat{y}_i^+$ and $\hat{y}_i^-$ are expected to be.
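To illustrate Equations (1)–(3), the following is a minimal PyTorch sketch (an assumption, not the authors' released code) of the DeBERTa encoder, the linear scoring head with ReLU, and the margin ranking loss; the class and function names (Scorer, margin_ranking_loss) are ours.

    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class Scorer(nn.Module):
        def __init__(self, model_name="microsoft/deberta-base", hidden=768):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(model_name)  # Eq. (1): E_i = encoder(x_i)
            self.head = nn.Linear(hidden, 1)                      # W_h, b_h
            self.act = nn.ReLU()                                  # sigma

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]                     # [CLS] representation E_i
            return self.act(self.head(cls)).squeeze(-1)           # Eq. (2): soft label

    def margin_ranking_loss(y_pos, y_neg, margin=0.5):
        # Eq. (3): max(0, margin - (y_pos - y_neg)), averaged over the negatives of a task
        return torch.clamp(margin - (y_pos - y_neg), min=0).mean()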


3.2. Reptile Meta-Learning
We use the batched version of the Reptile algorithm. Let the slow weights be $\varphi$. In each iteration we
first copy the model parameters $\varphi$ into fast weights $\theta$, sample $n$ training tasks from the training set,
and update the fast weights on these tasks to obtain $\theta'$. The difference between $\theta'$ and $\varphi$ is then used
as the gradient direction for updating $\varphi$, and $\varphi$ is updated accordingly; repeating this procedure yields
the final parameters. During training, we adjust the parameter weights of both DeBERTa and the linear
classification layer. The procedure is summarized in Algorithm 1.
Algorithm 1 Reptile training algorithm
Input: dataset $\tau$, margin $m$, model parameters $\varphi$, number of AI author categories $N$
Output: model parameters $\varphi'$
  1: Initialise model parameters $\varphi$
  2: for iteration = 1, 2, ..., $t$ do
  3:     Copy model parameters $\varphi$ to $\theta$
  4:     Sample tasks $\tau_1, \tau_2, \ldots, \tau_n$ from $\tau$
  5:     for $i$ = 1, 2, ..., $n$ do
  6:         $\hat{y}_i = \theta(\tau_i)$
  7:         $L_n = \frac{1}{N}\sum_{j=1}^{N}\max(0, m - (\hat{y}_0^+ - \hat{y}_j^-))$
  8:         Update the fast weights with a gradient step on $L_n$ to obtain $\theta'$
  9:     end for
 10:     $\varphi \leftarrow \varphi + \eta(\theta' - \varphi)$
 11:     Discard $\theta'$
 12: end for
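As a complement to Algorithm 1, the following is a minimal sketch of one outer Reptile iteration, under the assumption that an AdamW inner optimizer performs the task-level updates; the tasks argument and its dictionary keys are illustrative, not the authors' implementation.

    import torch

    def reptile_step(model, tasks, inner_lr=2e-5, outer_lr=1.0, margin=0.5):
        # One outer iteration of the batched Reptile loop (lines 3-11 of Algorithm 1).
        slow = {n: p.detach().clone() for n, p in model.named_parameters()}  # phi
        inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)

        for task in tasks:                                    # sampled tasks tau_1 ... tau_n
            y_pos = model(task["pos_ids"], task["pos_mask"])  # score of the human-written text
            y_neg = model(task["neg_ids"], task["neg_mask"])  # scores of the AI-generated texts
            loss = torch.clamp(margin - (y_pos - y_neg), min=0).mean()  # line 7
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()                                  # fast weights theta (line 8)

        # Line 10: move the slow weights toward the adapted fast weights.
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.copy_(slow[n] + outer_lr * (p - slow[n]))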



4. Experiments
4.1. Dataset statistics
We compute sequence-length statistics for each author’s data in the training dataset, as shown in Figure 1.




[Figure 1: box plots of text sequence length per author in the training data, covering human and the LLM
authors alpaca-7b, bloomz-7b1, alpaca-13b, gemini-pro, gpt-3.5-turbo, gpt-4-turbo, llama-2-7b, llama-2-70b,
mistral-7b, mixtral-8x7b, qwen1.5-72b, vicgalle-gpt2, and text-bison.]

Figure 1: Dataset statistics: analysis of the length of text sequences per author in the training dataset.
  From the chart, it can be seen that the typical sequence length in the training dataset is around 500.
The sequence lengths of the alpaca-7b, chavinlo-alpaca-13b, and bigscience-bloomz-7b subsets are
significantly below the average.
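As an aside, a minimal sketch of how such per-author length statistics could be computed; the file name and the column names (author, text) are assumptions about the dataset layout rather than the official field names.

    import pandas as pd

    # df is assumed to hold one training text per row, with columns "author" and "text".
    df = pd.read_json("pan24_train.jsonl", lines=True)       # hypothetical path
    df["length"] = df["text"].str.split().str.len()          # crude whitespace token count
    print(df.groupby("author")["length"].describe())         # per-author length statistics
    df.boxplot(column="length", by="author")                 # box plots as in Figure 1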

4.2. Experimental setup
In this study, we chose the DeBERTa-base[19] model as our pre-trained base model. We set the
hyperparameters as follows: the batch size is set to 16, the maximum sequence length is set to 512 (with
sequences longer than this being truncated), and the margin is set to 0.5. The initial learning rate is
set to 2e-5, and we train for 3 epochs. We use AdamW for optimization during each training session.
During the training phase, we use the officially provided labeled dataset to train the model. To evaluate
the model’s performance across different domains, we use the HC3 dataset [20] during the validation
phase. The results of our model on the validation set are shown in Table 1.

Table 1
Results of our model on the validation set. We report ROC-AUC, Brier, C@1, F1, F0.5𝑢, and their mean.
                             ROC-AUC Brier C@1              F1   F0.5𝑢 Mean
                             0.998         0.972   0.991   0.974 0.973   0.981
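For reference, the following is a minimal sketch of the fine-tuning configuration described in Section 4.2 (batch size 16, maximum length 512, margin 0.5, AdamW with an initial learning rate of 2e-5, 3 epochs); the wiring below, including the Scorer class from the sketch in Section 3.1, is an illustrative assumption rather than the authors' training script.

    import torch
    from transformers import AutoTokenizer

    MODEL_NAME = "microsoft/deberta-base"
    BATCH_SIZE, MAX_LEN, MARGIN, LR, EPOCHS = 16, 512, 0.5, 2e-5, 3

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = Scorer(MODEL_NAME)                                # scoring model sketched in Section 3.1
    optimizer = torch.optim.AdamW(model.parameters(), lr=LR)  # AdamW, initial LR 2e-5

    def encode(texts):
        # Pad/truncate each text to at most MAX_LEN (512) tokens.
        return tokenizer(texts, padding=True, truncation=True,
                         max_length=MAX_LEN, return_tensors="pt")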



4.3. Results
We selected the model with the best performance on the validation set, tested it on TIRA [7], and scored all
test tasks separately. The combined results for the test dataset are presented in Table 2 and Table 3.

Table 2
Overview of the accuracy in detecting if a text is written by a human in task 4 on PAN 2024 (Voight-Kampff
Generative AI Authorship Verification). We report ROC-AUC, Brier, C@1, F1, F0.5𝑢, and their mean.
           Approach                            ROC-AUC Brier C@1              F1     F0.5𝑢 Mean
           merciless-broth                         0.98      0.945   0.954   0.932 0.935     0.949
           Baseline Binoculars                     0.972     0.957   0.966   0.964   0.965   0.965
           Baseline Fast-DetectGPT (Mistral)       0.876      0.8    0.886   0.883   0.883   0.866
           Baseline PPMd                           0.795     0.798   0.754   0.753   0.749    0.77
           Baseline Unmasking                      0.697     0.774   0.691   0.658   0.666   0.697
           Baseline Fast-DetectGPT                 0.668     0.776   0.695   0.69    0.691   0.704
           95-th quantile                          0.994     0.987   0.989   0.989   0.989   0.990
           75-th quantile                          0.969     0.925   0.950   0.933   0.939   0.941
           Median                                  0.909     0.890   0.887   0.871   0.867   0.889
           25-th quantile                          0.701     0.768   0.683   0.657   0.670   0.689
           Min                                     0.131     0.265   0.005   0.006   0.007   0.224


   Table 2 shows the results, initially pre-filled with the official baselines provided by the PAN organizers
and summary statistics of all submissions to the task (i.e., the maximum, median, minimum, and 95-th,
75-th, and 25-th percentiles over all submissions to the task).
   Table 3 shows the summarized results averaged (arithmetic mean) over the variants of the test
dataset. Each dataset variant applies one potential technique to measure the robustness of authorship
verification approaches, e.g., switching the text encoding, translating the text, switching the domain, or
manual obfuscation by humans. A detailed description of all dataset variants will be available in the task
overview notebook; here, we focus our discussion on the results for the main dataset (Table 2).
Table 3
Overview of the mean accuracy over 9 variants of the test set. We report the minimum, the 25-th quantile,
the median, the 75-th quantile, and the maximum of the mean over the 9 dataset variants.
    Approach                            Minimum 25-th Quantile Median 75-th Quantile Max
    merciless-broth                        0.601           0.859         0.945         0.978        0.987
    Baseline Binoculars                    0.342           0.818         0.844         0.965        0.996
    Baseline Fast-DetectGPT (Mistral)      0.095           0.793         0.842         0.931        0.958
    Baseline PPMd                          0.270           0.546         0.750         0.770        0.863
    Baseline Unmasking                     0.250           0.662         0.696         0.697        0.762
    Baseline Fast-DetectGPT                0.159           0.579         0.704         0.719        0.982
    95-th quantile                         0.863           0.971         0.978         0.990        1.000
    75-th quantile                         0.758           0.865         0.933         0.959        0.991
    Median                                 0.605           0.645         0.875         0.889        0.936
    25-th quantile                         0.353           0.496         0.658         0.675        0.711
    Min                                    0.015           0.038         0.231         0.244        0.252


5. Conclusions
In this paper, we propose a method combining contrastive learning and meta-learning to address the
PAN: Voight-Kampff Generative AI Authorship Verification task. Our proposed method achieved scores of
ROC-AUC: 0.98, Brier: 0.945, C@1: 0.954, F1: 0.932, F0.5u: 0.935, and Mean: 0.949 on the leaderboard.
These results validate the effectiveness of our proposed method for the task of Generative AI Authorship
Verification.


Acknowledgments
This research was supported by the Natural Science Platforms and Projects of Guangdong Province
Ordinary Universities (Key Field Special Projects) (No. 2023ZDZX1023).


References
 [1] A. Extance, Chatgpt has entered the classroom: how llms could transform education, Nature 623
     (2023) 474–477.
 [2] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle,
     A. Kasirzadeh, et al., Ethical and social risks of harm from language models, arXiv preprint
     arXiv:2112.04359 (2021).
 [3] J. Wu, S. Yang, R. Zhan, Y. Yuan, D. F. Wong, L. S. Chao, A survey on llm-generated text detection:
     Necessity, methods, and future directions, arXiv preprint arXiv:2310.14724 (2023).
 [4] J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D. Ko-
     renčić, M. Mayerl, A. Mukherjee, A. Panchenko, M. Potthast, F. Rangel, P. Rosso, A. Smirnova,
     E. Stamatatos, B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024:
     Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking
     Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot,
     D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro
     (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of
     the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in
     Computer Science, Springer, Berlin Heidelberg New York, 2024.
 [5] T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in neural networks: A survey,
     IEEE transactions on pattern analysis and machine intelligence 44 (2021) 5149–5169.
 [6] J. Bevendorff, M. Wiegmann, E. Stamatatos, M. Potthast, B. Stein, Overview of the Voight-Kampff
     Generative AI Authorship Verification Task at PAN 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S.
     de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum,
     CEUR-WS.org, 2024.
 [7] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
     Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
     F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
     in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
     in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/
     978-3-031-28241-6_20.
 [8] A. A. Ayele, N. Babakov, J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag,
     M. Fröbe, D. Korenčić, M. Mayerl, D. Moskovskiy, A. Mukherjee, A. Panchenko, M. Potthast,
     F. Rangel, N. Rizwan, P. Rosso, F. Schneider, A. Smirnova, E. Stamatatos, E. Stakovskii, B. Stein,
     M. Taulé, D. Ustalov, X. Wang, M. Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024:
     Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking
     Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot,
     D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro
     (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of
     the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in
     Computer Science, Springer, Berlin Heidelberg New York, 2024.
 [9] Y. Arase, M. Zhou, Machine translation detection from monolingual web-text, in: Proceedings
     of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
     Papers), 2013, pp. 1597–1607.
[10] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, C. Finn, Detectgpt: Zero-shot machine-generated
     text detection using probability curvature, in: International Conference on Machine Learning,
     PMLR, 2023, pp. 24950–24962.
[11] J. Su, T. Y. Zhuo, D. Wang, P. Nakov, Detectllm: Leveraging log rank information for zero-shot
     detection of machine-generated text, arXiv preprint arXiv:2306.05540 (2023).
[12] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are
     unsupervised multitask learners, OpenAI blog 1 (2019) 9.
[13] A. Bhattacharjee, T. Kumarage, R. Moraffah, H. Liu, Conda: Contrastive domain adaptation for
     ai-generated text detection, arXiv preprint arXiv:2309.03992 (2023).
[14] M. Ibrahim, A. Akram, M. Radwan, R. Ayman, M. Abd-El-Hameed, N. El-Makky, M. Torki, En-
     hancing Authorship Verification using Sentence-Transformers, in: M. Aliannejadi, G. Faggioli,
     N. Ferro, M. Vlachos (Eds.), Working Notes of CLEF 2023 - Conference and Labs of the Evaluation
     Forum, CEUR-WS.org, 2023, pp. 2640–2651. URL: https://ceur-ws.org/Vol-3497/paper-216.pdf.
[15] M. Guo, Z. Han, H. Chen, H. Qi, A contrastive learning of sample pairs for authorship verification,
     Working Notes of CLEF (2023).
[16] M. Boudiaf, J. Rony, I. M. Ziko, E. Granger, M. Pedersoli, P. Piantanida, I. B. Ayed, A unifying
     mutual information view of metric learning: cross-entropy vs. pairwise losses, in: European
     conference on computer vision, Springer, 2020, pp. 548–564.
[17] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of
     visual representations, 2020. arXiv:2002.05709.
[18] A. Nichol, J. Achiam, J. Schulman, On first-order meta-learning algorithms, arXiv preprint
     arXiv:1803.02999 (2018).
[19] P. He, X. Liu, J. Gao, W. Chen, Deberta: Decoding-enhanced bert with disentangled attention,
     arXiv preprint arXiv:2006.03654 (2020).
[20] B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, Y. Wu, How close is chatgpt to human
     experts? comparison corpus, evaluation, and detection, arXiv preprint arXiv:2301.07597 (2023).