
Enhancing Human-Machine Authorship Discrimination in Generative AI Verification Task with BERT and Augmented Data

Haojie Cao

caohaojie0322@163.com

Zhongyuan Han

Jingyan Ye

Biao Liu

Yong Han

Foshan University, Foshan, China

2024

Voight-Kampff Generative AI Authorship Verification, a task proposed jointly by PAN and the ELOQUENT Lab [1], aims to differentiate between human and machine authors by analyzing text features, in response to the growing text generation abilities of large language models. This paper fine-tunes the pre-trained BERT model for binary classification so that it learns the distinctive features of human and machine writing. We also introduce an augmented dataset to improve the model's recognition capability. In the final evaluation, our system achieved a score of 0.778 across all test sets and ranked 11th.

Keywords: PAN 2024, Voight-Kampff Generative AI Authorship Verification, BERT, Text classification

1. Introduction

2. Approach

[Figure: Overview of the approach. Input: the Balanced Dataset and the Augmented Dataset, built from generative AI texts (alpaca-7b, gpt-3.5, gemini-pro, ...) and human-written texts. Model: BERT followed by a network with input, hidden, and output layers, in which neurons are deactivated by dropout. Output: human or machine. Balanced Dataset: one-thirteenth of the generative AI data plus the human-written texts. Augmented Dataset: the remaining AI-generated texts plus AuTexTification 2023 human-written texts.]

2.1. Building Dataset

The dataset provided by PAN comprises over a thousand human-written texts, alongside texts generated by 13 distinct AI models. Each AI model generated as many texts as there are human-written ones. To create a balanced dataset, one-thirteenth of the texts were randomly sampled from each of the 13 sets of generative AI texts. The resulting dataset contains as many generative AI texts as human-written texts, giving a 1:1 ratio between human and machine-generated content.

The remaining generative AI texts were not discarded. Instead, they were paired with an equal number of human-written texts collected from the AuTexTification 2023 dataset on Hugging Face [4]. Together, these texts form the augmented dataset, which covers a broader range of language patterns and helps the model generalize.

Finally, the balanced dataset and the augmented dataset were each randomly divided into a training set (70% of the data) and a validation set (30%).
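The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in our experiments: the function names, the label convention (0 = human, 1 = machine), and the fixed random seed are assumptions made for the example.

```python
import random

def build_datasets(human_texts, ai_texts_by_model, extra_human_texts, seed=0):
    """Return (balanced, augmented) lists of (text, label) pairs.

    Assumed label convention for this sketch: 0 = human, 1 = machine.
    """
    rng = random.Random(seed)
    n_models = len(ai_texts_by_model)
    balanced = [(t, 0) for t in human_texts]
    remaining_ai = []
    for model_name in sorted(ai_texts_by_model):
        texts = list(ai_texts_by_model[model_name])
        rng.shuffle(texts)
        # One-thirteenth of each model's texts when there are 13 models.
        # Integer division means the 1:1 ratio is only approximate when the
        # number of texts is not a multiple of the number of models.
        k = len(texts) // n_models
        balanced += [(t, 1) for t in texts[:k]]
        remaining_ai += [(t, 1) for t in texts[k:]]
    # Pair the leftover AI texts with an equal number of external human texts.
    extra = rng.sample(extra_human_texts, len(remaining_ai))
    augmented = remaining_ai + [(t, 0) for t in extra]
    return balanced, augmented

def split_70_30(examples, seed=0):
    """Shuffle and split into 70% training and 30% validation examples."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Sorting the model names before sampling only serves to make the sketch reproducible for a given seed.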

2.2. Fine-tuning the BERT Model

During fine-tuning, no parameters of the base BERT model were frozen; all parameters were updated to adapt to the binary classification task. On top of the pre-trained BERT model, a fully connected (Dense) layer was added to map the model’s output to the label space of the binary classification task. This Dense layer has two output units.
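As an illustration of the added layer, the following sketch maps a single encoder output vector to two logits. It is a toy stand-in under stated assumptions: the hidden size of 768 corresponds to BERT-base, which is not specified above, and the randomly drawn vector and weights merely replace real BERT outputs and trained parameters.

```python
import random

def dense_head(pooled, weights, bias):
    """Fully connected layer: logits[j] = sum_i pooled[i] * weights[i][j] + bias[j]."""
    return [
        sum(x * row[j] for x, row in zip(pooled, weights)) + bias[j]
        for j in range(len(bias))
    ]

HIDDEN_SIZE = 768  # assumption: BERT-base hidden size
rng = random.Random(0)
# Placeholder parameters; in practice these are learned during fine-tuning.
weights = [[rng.gauss(0.0, 0.02) for _ in range(2)] for _ in range(HIDDEN_SIZE)]
bias = [0.0, 0.0]
pooled = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]  # stand-in for the BERT output
logits = dense_head(pooled, weights, bias)  # two values, one per class
```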

To prevent overfitting, Dropout was applied to this fully connected layer, randomly setting a portion of the neurons’ outputs to zero during each training iteration. This step helps increase the model’s generalization capability. The Dropout rate was set to 0.1, meaning that 10% of the neurons are dropped out. Additionally, a softmax activation function was used to convert the model’s outputs into class probabilities.

To train the model, the loss function, optimizer, and evaluation metric were configured. Because the labels of the task are integer class identifiers, sparse categorical cross-entropy was selected as the loss function. The optimizer was Adam [5], with a learning rate of 2e-5. This relatively small learning rate helps stabilize the training process and prevent gradient explosion. The evaluation metric was accuracy, which measures the model’s performance in the classification task.

To prevent catastrophic forgetting, the model was first trained on the augmented dataset for ten epochs. After each epoch, its performance was evaluated on the validation set to select the best-performing model. Subsequently, the selected model was trained on the balanced dataset.
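The dropout, softmax, and loss settings described above can be made concrete with a small numeric sketch. It uses only the Python standard library and is not the training code itself; the function names are chosen for this example, and the rescaling of kept values by 1/(1 - rate) is the common "inverted dropout" convention, assumed here.

```python
import math
import random

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def softmax(logits):
    """Convert logits into class probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(probs, label):
    """Loss for an integer class label: negative log-probability of the true class."""
    return -math.log(probs[label])

# Example: class probabilities and loss for one text whose true label is 0.
probs = softmax([2.0, -1.0])
loss = sparse_categorical_cross_entropy(probs, label=0)
```

With a rate of 0.1, roughly one value in ten is zeroed during training, while at inference time (`training=False`) the values pass through unchanged.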

2.3. Build Classifier

In TIRA [6], the prediction task involves evaluating pairs of sentences, where the model needs to determine which sentence is written by a human. To address this task, a simple classifier was developed to compare the probabilities of the two sentences being written by a human. The classifier outputs 0 if the probability of the first sentence being human-written is greater, and outputs 1 if the probability of the second sentence being human-written is greater. This straightforward approach enables the model to make binary predictions based on the relative likelihood of human authorship for each sentence pair.
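The comparison rule amounts to a single conditional, sketched below. The function name is hypothetical, and the handling of ties (returning 1 when both probabilities are equal) is an assumption, since the case is not specified above.

```python
def pick_human(p_human_first, p_human_second):
    """Return 0 if the first text is more likely human-written, otherwise 1.

    Ties fall through to 1; this tie-breaking choice is an assumption.
    """
    return 0 if p_human_first > p_human_second else 1

# Example: the fine-tuned model assigns each text a probability of being human-written.
prediction = pick_human(0.91, 0.12)
```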

3. Result

Based on the aforementioned methods, we classified the texts in the competition and uploaded our results. Table 1 presents the final scores of the Voight-Kampff Generative AI Authorship Verification 2024 shared task, where the individual validity score is an aggregate across all test datasets, corrected by half a standard deviation to penalize unstable classification performance. Rankings are based on the mean of all individual scores. Our team secured 11th place out of 30, with a score of 0.778 across all test datasets.

Table 2 provides an overview of the accuracy in detecting whether a text is written by a human in Task 4 of PAN 2024 (Voight-Kampff Generative AI Authorship Verification). Our model achieved a mean score of 0.906, surpassing most of the published baselines.

In addition to the primary test dataset, the PAN organizers ran the Voight-Kampff Generative AI Authorship Verification evaluation on nine additional variants of the test set. Table 3 gives an overview of the mean accuracy across these nine variants. Across the nine variants, our model’s lowest accuracy was 0.361, its 75th percentile score was 0.959, and its highest accuracy was 1. Our model outperformed the baselines on most of these variant datasets, which suggests that it is robust and effective across diverse scenarios.

[Table: columns System, ROC-AUC, Brier, C@1, F0.5, Mean; rows canary-paint, Baseline Binoculars (Falcon-7B), Baseline DetectLLM-LRR (Mistral-7B), Baseline Fast-DetectGPT (Mistral-7B), Baseline Text Length]

4. Conclusion

In this study, we addressed the challenge of distinguishing between human-written and machine-generated texts in the Voight-Kampff Generative AI Authorship Verification 2024 task. By leveraging a BERT-based model and incorporating data augmentation techniques, we enhanced the model’s ability to accurately classify texts. Our approach involved constructing a balanced dataset, fine-tuning the BERT model, and implementing a simple classifier for binary predictions.

The evaluation results demonstrated the feasibility and effectiveness of our approach, which achieved a mean score of 0.778 and an overall ranking of 11th, outperforming all baseline models. These findings suggest that our approach is a promising solution for human-machine authorship discrimination, contributing to the broader field of AI authorship verification.

Acknowledgements

This work is supported by the Natural Science Foundation of Guangdong Province, China (No. 2022A1515011544).

[1] A. A. Ayele, Babakov, Bevendorff, X. B. Casals, Chulvi, Dementieva, Elnagar, Freitag, Fröbe, Korenčić, Mayerl, Moskovskiy, Mukherjee, Panchenko, Potthast, Rangel, Rizwan, Rosso, Schneider, Smirnova, Stamatatos, Stakovskii, Stein, Taulé, Ustalov, Wang, Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024.

[2] Bevendorff, Wiegmann, Karlgren, Dürlich, Gogoulou, Talman, E. Stamatatos, Potthast, Stein, Overview of the "Voight-Kampff" Generative AI Authorship Verification Task at PAN and ELOQUENT 2024, in: G. Faggioli, Ferro, Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.

[3] Fabien, Villatoro-Tello, Motlicek, Parida, BertAA: BERT fine-tuning for authorship attribution, in: Proceedings of the 17th International Conference on Natural Language Processing (ICON), 2020, pp. 127-137.

[4] A. M. Sarvazyan, J. Á. González, Franco-Salvador, Rangel, Chulvi, Rosso, Overview of AuTexTification at IberLEF 2023: Detection and attribution of machine-generated text in multiple domains, in: Procesamiento del Lenguaje Natural, Jaén, Spain, 2023.

[5] D. P. Kingma, Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).

[6] Fröbe, Wiegmann, Kolyada, Grahm, Elstner, Loebe, Hagen, Stein, Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.