<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">BertT: A Hybrid Neural Network Model for Generative AI Authorship Verification Notebook for PAN at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zepeng</forename><surname>Wu</surname></persName>
						</author>
						<author role="corresp">
							<persName><forename type="first">Wenyin</forename><surname>Yang</surname></persName>
							<email>cswyyang@fosu.edu.cn</email>
						</author>
						<author>
							<persName><forename type="first">Ma</forename><surname>Li</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Zikai</forename><surname>Zhao</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<address>
									<addrLine>September 09-12</addrLine>
									<postCode>2024</postCode>
									<settlement>Grenoble</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">BertT: A Hybrid Neural Network Model for Generative AI Authorship Verification Notebook for PAN at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">51A993EC33D5F34F640D792E5680DB5C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>PAN 2024</term>
					<term>Generative AI Authorship Verification</term>
					<term>BERT</term>
					<term>Transformer</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the rapid development and widespread adoption of Large Language Models (LLMs), distinguishing between human-authored and machine-generated texts has become increasingly complex. Although various classification methods have been devised to help identify the origins of texts, they often fail to address the fundamental feasibility and inherent challenges of the task. Building on extensive experience in the field of authorship verification, this study introduces BertT, a novel hybrid model that combines BERT and Transformer technologies, specifically designed for the Generative AI Authorship Verification Task organized in collaboration with PAN and ELOQUENT Labs. This task requires identifying the human-authored text in pairs where one text is written by a human and the other is generated by a machine. Leveraging the deep semantic understanding capabilities of BERT and the efficient sequence processing power of Transformers, our model, BertT, significantly outperforms existing baseline models such as Fast-DetectGPT.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Text classification is a cornerstone of Natural Language Processing (NLP), with authorship verification serving as a pivotal application in this domain. This process is crucial for validating the authenticity of documents, detecting plagiarism, and identifying the origins of articles, thereby preserving the integrity of written content across various fields. The Generative AI Authorship Verification Task at PAN@CLEF 2024 <ref type="bibr" target="#b0">[1]</ref>, which builds upon previous challenges, aims specifically to differentiate between human-authored and machine-generated texts. This task is increasingly pertinent as Large Language Models (LLMs) like GPTs now produce high-quality text that closely mimics human writing, thereby presenting substantial challenges in differentiation.</p><p>The utility of authorship verification has been demonstrated in various contexts, underscoring its adaptability and critical importance. For example, Halvani et al. explore the use of compression models for authorship verification, highlighting their effectiveness in digital text forensics without relying on complex machine learning algorithms or extensive feature engineering <ref type="bibr" target="#b1">[2]</ref>. This approach is well-aligned with our need for efficient and scalable solutions to manage the vast amounts of text generated by LLMs. Similarly, Bevendorff et al. have adapted the unmasking method to short texts, significantly reducing the amount of material required for effective authorship verification, thereby making it applicable to more practical scenarios <ref type="bibr" target="#b2">[3]</ref>. Additionally, the challenge of distinguishing between machine-generated and human-authored content is accentuated in the work by Bao et al., who developed Fast-DetectGPT. 
This model improves the efficiency of detecting machine-generated text through the innovative use of conditional probability curvature, thereby reducing computational costs while maintaining high accuracy <ref type="bibr" target="#b3">[4]</ref>.</p><p>This advancement is particularly relevant to our study as it addresses similar challenges concerning processing efficiency and accuracy. Moreover, the broad application of neural networks in text classification tasks is exemplified by <ref type="bibr">Yang et al. and Yuan et al.</ref> in their respective studies on profiling irony and stereotype spreaders on Twitter. These studies employ RNN and CNN models to classify complex social media content, offering insights into the adaptability of neural network architectures for varied NLP tasks <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b5">[6]</ref>.</p><p>In response to the complexities of authorship verification in the era of LLMs, we developed Bert_T, a novel model that marries the deep semantic understanding capabilities of BERT with the efficient sequential data processing power of the Transformer architecture. This model employs a contrastive learning approach with an advanced loss function, aimed at enhancing the discrimination between human and machine text. It operates on a dataset formatted in pairs "(text1, text2, label)", training to detect subtle nuances that signify distinct authorship styles. Prior to evaluating its effectiveness, we submitted our model to the TIRA.io platform <ref type="bibr" target="#b6">[7]</ref>, which provides a stringent and controlled testing environment for fair and transparent benchmarking against established baselines. This preliminary submission was crucial for assessing the model's real-world applicability and refining its performance based on unbiased feedback. 
Bert_T demonstrated superior performance across several key metrics, achieving a ROC-AUC of 0.967, a Brier score of 0.903, a C@1 of 0.869, an F1 score of 0.869, and an F0.5u of 0.872, culminating in an overall mean score of 0.896. These results significantly surpassed those of baseline models such as Fast-DetectGPT (Mistral), PPMd, Unmasking, and Fast-DetectGPT, underscoring Bert_T's enhanced ability to discern between human and machine-generated texts. This success highlights the efficacy of our approach in tackling the complexities of generative AI authorship verification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Dataset</head><p>The dataset for the Generative AI Authorship Verification Task at PAN@CLEF 2024 plays a crucial role in training and validating the efficacy of our Bert_T model. This year, the dataset comprises a diverse array of text genres, reflecting a mix of both real and synthetically generated content. The primary sources of data include news articles, Wikipedia introduction texts, and pieces of fanfiction, which provide a rich variety in style, structure, and complexity. Additionally, PAN participants receive a bootstrap dataset that includes real and fabricated news articles covering various 2021 U.S. news headlines, designed to simulate scenarios that models might encounter in practical applications.</p><p>The data, sourced from contributions by ELOQUENT participants, is meticulously curated to ensure a balanced representation of human and machine-authored texts. The bootstrap dataset is formatted as newline-delimited JSON files, where each file contains a list of articles. These articles are authored either by one or more human authors or entirely by an AI, specifically Google's Gemini Pro model. The dataset structure is pivotal for the task, as it contains pairs of texts where each pair is written on the same topic but by different authors: one human and one machine. The file format for these pairs is demonstrated below: {"id": "gemini-pro/news-2021-01-01-2021-12-31-kabulairportattack/art-081", "text": "..."} {"id": "gemini-pro/news-2021-01-01-2021-12-31-capitolriot/art-050", "text": "..."} Each text pair in the dataset is labeled with `0` or `1`, indicating whether the texts are from the same author, thereby facilitating supervised learning. The test dataset is provided in a slightly altered format to challenge the model's ability to generalize. Instead of individual files, it is delivered as a single JSONL file where each line contains a pair of texts. 
The content of this file is arranged such that the identities of the authors are anonymized, and the order of texts scrambled:</p><p>{"id": "iixcWBmKWQqLAwVXxXGBGg", "text1": "...", "text2": "..."} {"id": "y12zUebGVHSN9yiL8oRZ8Q", "text1": "...", "text2": "..."} Participants are tasked with predicting which of the two texts in each pair is human-authored. This setup tests the model's ability to discern subtle linguistic and stylistic nuances that typically distinguish human writing from its AI-generated counterpart. Access to the dataset is regulated via Zenodo, where participants must register and request access using their Tira-registered email, ensuring that the use of this data remains confined to research purposes and that no redistribution occurs. This controlled distribution ensures compliance with copyright regulations and maintains the integrity of the data for academic and developmental uses.</p></div>
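For illustration, the JSONL test format shown above can be parsed with a few lines of standard-library Python. The helper `load_pairs` and the sample ids below simply mirror the examples in the text; they are not part of the official task tooling.

```python
import json

def load_pairs(jsonl_text):
    """Parse a newline-delimited JSON file of text pairs.

    Each non-empty line holds one record of the form
    {"id": ..., "text1": ..., "text2": ...}; returns (id, text1, text2) tuples.
    """
    pairs = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        pairs.append((record["id"], record["text1"], record["text2"]))
    return pairs

# Two toy records in the same shape as the anonymized test file.
sample = "\n".join([
    json.dumps({"id": "pair-001", "text1": "A human-written report.", "text2": "A generated report."}),
    json.dumps({"id": "pair-002", "text1": "Another text.", "text2": "Its counterpart."}),
])
pairs = load_pairs(sample)
```

Participants would then score `text1` and `text2` of each tuple and emit one prediction per pair id.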
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset Preprocessing</head><p>Effective data preprocessing is essential for the robust performance of machine learning models, particularly in tasks involving natural language processing such as authorship verification. For the Generative AI Authorship Verification Task at PAN@CLEF 2024, our preprocessing routine involved several critical steps to enhance the quality and consistency of model inputs.</p><p>Initially, the text normalization process involved removing all punctuation and converting text to lowercase to reduce variability and focus the model's learning on substantive content. This was coupled with the removal of non-alphabetic characters and numerals to ensure that the model trained strictly on textual elements. Following normalization, stopwords, common words that typically do not contribute to the identification of authorship, were removed to minimize data noise and enhance focus on more distinctive text features. After cleaning the texts, the corpus was tokenized into individual words or tokens, which is essential for structuring raw text into a format suitable for machine learning models. The texts were then vectorized using a pre-trained BERT tokenizer, which also standardized the length of tokens through padding and truncation to optimize computational efficiency. To address the challenge of limited training data, we implemented data augmentation techniques to artificially expand the dataset, creating new text pairs from existing ones by subtly modifying texts while preserving their key attributes. This approach helped improve the model's generalization capabilities from training scenarios to real-world applications.</p><p>Throughout the preprocessing stages, we meticulously ensured that the alterations did not compromise the semantic integrity or the stylistic attributes of the texts, which are crucial for authorship identification. 
This comprehensive preprocessing not only prepared the dataset for effective training of our Bert_T model but also enhanced the model's accuracy in distinguishing between human and machine-generated texts, a critical aspect of the verification task.</p></div>
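A minimal sketch of the normalization steps described in this section (lowercasing, stripping punctuation and digits, dropping stopwords) is shown below. The stopword list here is a tiny illustrative subset, and in the actual pipeline the cleaned text was vectorized with a pre-trained BERT tokenizer rather than a whitespace split.

```python
import re

# Tiny illustrative stopword subset; a real pipeline would use a full list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in"}

def normalize(text):
    """Apply the Section 3.1 cleaning steps and return content tokens."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace
    # (punctuation, digits, symbols) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [t for t in text.split() if t not in STOPWORDS]

tokens = normalize("The 2 models are compared, and BERT wins!")
# tokens == ["models", "compared", "bert", "wins"]
```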
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Network Architecture</head><p>In this study, we introduced Bert_T, a hybrid neural network model that integrates BERT-base for robust feature extraction with a Transformer encoder to handle attention-based dynamics, specifically tailored for distinguishing between human-written and machine-generated texts. We employ the bert-base-uncased model from Hugging Face's Transformers library as our foundational pre-trained BERT layer, leveraging its well-established capabilities in natural language understanding. This layer focuses on the CLS token embedding to capture comprehensive textual context, which is then processed through a Dropout layer to prevent overfitting and enhance generalizability. The Transformer Encoder, equipped with a multi-head attention mechanism, dynamically integrates information across text segments, crucial for identifying subtle linguistic and stylistic nuances. During testing, Bert_T processes each text in a pair independently in a JSONL format, evaluating the likelihood of each text being human-written and comparing these scores to classify texts; the text with the higher score is deemed human-authored. Optimization of model parameters such as learning rate and batch size, along with the use of Binary Cross-Entropy Loss, fine-tunes the model's accuracy, ensuring it performs effectively on metrics such as ROC-AUC and Brier scores. This configuration enables Bert_T to meet the specific challenges of the Generative AI Authorship Verification Task at PAN@CLEF 2024, demonstrating both innovative theoretical approaches and practical discriminative capabilities, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>: Bert_T Architecture. </p></div>
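The pairwise decision rule described above (score each text independently, then label the higher-scoring text as human) can be sketched separately from the network itself. Here `score_human` is a caller-supplied stand-in for the Bert_T forward pass, and `toy_scorer` is a purely illustrative heuristic, not the real model.

```python
def pick_human(text1, text2, score_human):
    """Return 1 if text1 is judged human-written, 2 otherwise.

    `score_human` maps a text to a probability-like score that it is
    human-authored; ties go to text1.
    """
    return 1 if score_human(text1) >= score_human(text2) else 2

def toy_scorer(text):
    # Illustrative stand-in: mean word length, NOT a real human-ness signal.
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

choice = pick_human("short words here", "substantially elaborate vocabulary", toy_scorer)
```

In the actual system the two scores would come from one sigmoid-activated forward pass per text, trained with binary cross-entropy as stated above.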
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments and Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Setting</head><p>In our experimental setup for evaluating the Bert_T model's ability to distinguish between human-authored and machine-generated texts, we preprocessed the dataset and divided it into training and testing sets with a 7:3 ratio. The model integrates a pretrained BERT base model with a Transformer layer tailored for sequence classification, featuring 768 hidden units, four attention heads, and a linear classifier. Training parameters were set with a batch size of 8 and a learning rate of 1e-6 over 300 epochs using the AdamW optimizer on CUDA-capable GPUs to balance computational efficiency and learning depth.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Metrics</head><p>Our evaluation framework was designed to rigorously assess the performance of the Bert_T model across several metrics that reflect its effectiveness in distinguishing between human-authored and machine-generated texts. The model was evaluated using a standard set of metrics commonly employed in authorship verification tasks, including ROC-AUC, Brier score, C@1, F1, and F0.5u, along with the arithmetic mean of these metrics to provide a comprehensive overview of performance.</p><p>Performance Metrics: ROC-AUC measures the area under the receiver operating characteristic curve, providing insight into the model's ability to discriminate between classes across all thresholds <ref type="bibr" target="#b7">[8]</ref>. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The formula is given by:</p><formula xml:id="formula_0">ROC-AUC = ∫₀¹ TPR(𝑡) 𝑑(FPR(𝑡))<label>(1)</label></formula><p>Brier Score evaluates the mean squared error of the assigned probabilities, indicating the accuracy of probability predictions <ref type="bibr" target="#b8">[9]</ref>. The lower the raw Brier score, the better, as it reflects a closer proximity to the true outcome; the PAN leaderboard reports its complement (1 − Brier), so the reported values are higher-is-better. It is calculated as:</p><formula xml:id="formula_1">Brier Score = (1/𝑁) ∑_{𝑖=1}^{𝑁} (predicted probability_𝑖 − actual outcome_𝑖)²<label>(2)</label></formula><p>C@1 represents a modified accuracy that treats non-answers (predictions with a confidence score of 0.5) by averaging in the accuracy of the remaining cases, thus penalizing uncertainty <ref type="bibr" target="#b9">[10]</ref>. This metric is particularly useful in situations where making no prediction is preferable to making an incorrect prediction. 
The formula is: </p><formula xml:id="formula_2">𝐶@1 = (1/𝑛) ⋅ (𝑛_𝑐 + 𝑛_𝑢 ⋅ 𝑛_𝑐 / 𝑛)<label>(3)</label></formula><p>where 𝑛 is the total number of cases, 𝑛_𝑐 the number of correct answers, and 𝑛_𝑢 the number of non-answers. F1 Score is the harmonic mean of precision and recall, offering a balance between the precision of the classifier and its recall capability <ref type="bibr" target="#b10">[11]</ref>. It is particularly useful where precision and recall matter equally. The formula is:</p><formula xml:id="formula_4">𝐹1 = 2 ⋅ (Precision × Recall) / (Precision + Recall)<label>(4)</label></formula><p>where Precision = TP / (TP + FP) and Recall = TP / (TP + FN). F0.5u is a variant of the F-measure that weights precision more heavily than recall, suitable for scenarios where false positives are more costly than false negatives <ref type="bibr" target="#b11">[12]</ref>. It is calculated using the formula:</p><formula xml:id="formula_5">𝐹0.5𝑢 = (1 + 0.5²) ⋅ (Precision × Recall) / (0.5² ⋅ Precision + Recall)<label>(5)</label></formula><p>These metrics collectively provided a robust framework for evaluating our model, enabling us to effectively measure its ability to perform authorship verification across different dimensions of accuracy and reliability.</p></div>
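The non-integral metrics above follow directly from their definitions. The sketch below implements the Brier score, C@1, and the generic F-beta (F1 at beta = 1, the precision-weighted F0.5 variant at beta = 0.5) as the formulas are given in the text; it is an illustration, not the official PAN evaluator.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def c_at_1(predictions, labels):
    """C@1: non-answers (prediction exactly 0.5) are credited at the
    accuracy rate achieved on the answered cases."""
    n = len(predictions)
    answered = [(p, y) for p, y in zip(predictions, labels) if p != 0.5]
    n_u = n - len(answered)                                   # non-answers
    n_c = sum(1 for p, y in answered if (p > 0.5) == (y == 1))  # correct answers
    return (n_c + n_u * n_c / n) / n

def f_beta(tp, fp, fn, beta):
    """Generic F-beta from confusion counts; beta=1 gives F1,
    beta=0.5 weights precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `c_at_1([0.9, 0.1, 0.5], [1, 0, 1])` has two answered cases, both correct, plus one non-answer, giving (2 + 1 · 2/3) / 3 = 8/9.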
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Results</head><p>Our Bert_T model demonstrated robust performance in the PAN 2024 Voight-Kampff Generative AI Authorship Verification task, showcasing substantial effectiveness across several critical metrics. As evidenced in Table <ref type="table" target="#tab_1">1</ref>, Bert_T achieved a ROC-AUC of 0.967, which, while slightly lower than the top-performing Baseline Binoculars at 0.972, reflects a high level of discriminative capability. The Brier score for Bert_T was 0.903, indicating reliable probability predictions of class membership, although it did not surpass the Baseline Binoculars, which scored 0.957. Regarding precision-related metrics, Bert_T recorded scores of 0.869 for both C@1 and F1, and 0.872 for F0.5u, remaining competitive although below the near-perfect scores around the 95th percentile.</p><p>Table <ref type="table" target="#tab_2">2</ref> presents an overview of Bert_T's mean accuracy across nine test set variants, showing considerable stability and less variability in performance compared to other models which displayed more significant fluctuations. Bert_T maintained a minimum accuracy of 0.354 and a maximum of 0.980, with a notable median performance of 0.892, and the 25th and 75th percentiles at 0.864 and 0.896, respectively. These figures underscore Bert_T's robust performance across different testing scenarios, highlighting its efficacy in handling the complex demands of the verification task.</p><p>In terms of competition standings, our submission ranked 20th out of 30 participants on the official PAN 2024 leaderboard. Notably, Bert_T outperformed all but one baseline with a ranking score over all test datasets of 0.608, as detailed on the PAN 2024 leaderboard. 
This ranking underscores our model's competitive edge and its significant discriminative power in a challenging environment filled with diverse and sophisticated entries.</p><p>These results affirm that Bert_T not only embodies theoretical innovation but also exhibits significant practical capabilities in the authorship verification domain. The model's ability to effectively discern between human and machine-generated texts makes it a valuable tool for complex text analysis tasks. Future work will focus on further optimizing model parameters, enhancing feature engineering techniques, and expanding the diversity of the training dataset to boost the model's generalizability and performance across varied textual contexts. This continuous improvement aims to refine Bert_T's capabilities for higher detection accuracy and broader application scope in real-world scenarios. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This paper details the development and evaluation of the Bert_T model, our innovative contribution to the PAN 2024 Voight-Kampff Generative AI Authorship Verification task.</p><p>Combining BERT-based feature extraction with a Transformer encoder for attention processing, Bert_T effectively differentiates between human-written and machine-generated texts. It demonstrated strong performance across various metrics, achieving a ROC-AUC of 0.967 and a Brier score of 0.903, which confirms its reliability in predictions. Despite stiff competition from established baselines, Bert_T maintained consistent performance across different test set variants, with accuracies ranging from a minimum of 0.354 to a maximum of 0.980. This showcases its capability to handle diverse and complex textual scenarios effectively. Moving forward, we plan to further refine Bert_T by optimizing its parameters, enhancing its feature engineering techniques, and expanding its training dataset to cover a broader spectrum of text types and genres. These efforts will not only improve the model's performance in authorship verification tasks but also extend its applicability to a wider range of natural language processing challenges, aiming for higher detection accuracy and broader operational scope.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Bert_T Architecture</figDesc><graphic coords="4,72.00,97.79,451.00,66.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>The final performance of our submission on PAN 2024 (Voight-Kampff Generative AI Authorship Verification)</figDesc><table><row><cell>Approach</cell><cell>ROC-AUC</cell><cell>Brier</cell><cell>C@1</cell><cell>F1</cell><cell>F0.5u</cell><cell>Mean</cell></row><row><cell>Bert_T</cell><cell>0.967</cell><cell>0.903</cell><cell>0.869</cell><cell>0.869</cell><cell>0.872</cell><cell>0.896</cell></row><row><cell>Baseline Binoculars</cell><cell>0.972</cell><cell>0.957</cell><cell>0.966</cell><cell>0.964</cell><cell>0.965</cell><cell>0.965</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.876</cell><cell>0.8</cell><cell>0.886</cell><cell>0.883</cell><cell>0.883</cell><cell>0.866</cell></row><row><cell>Baseline PPMd</cell><cell>0.795</cell><cell>0.798</cell><cell>0.754</cell><cell>0.753</cell><cell>0.749</cell><cell>0.77</cell></row><row><cell>Baseline Unmasking</cell><cell>0.697</cell><cell>0.774</cell><cell>0.691</cell><cell>0.658</cell><cell>0.666</cell><cell>0.697</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.668</cell><cell>0.776</cell><cell>0.695</cell><cell>0.69</cell><cell>0.691</cell><cell>0.704</cell></row><row><cell>95-th quantile</cell><cell>0.994</cell><cell>0.987</cell><cell>0.989</cell><cell>0.989</cell><cell>0.989</cell><cell>0.990</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Overview of the mean accuracy over 9 variants of the test set</figDesc><table><row><cell>Approach</cell><cell>Minimum</cell><cell>25-th Quantile</cell><cell>Median</cell><cell>75-th Quantile</cell><cell>Max</cell></row><row><cell>Bert_T</cell><cell>0.354</cell><cell>0.864</cell><cell>0.892</cell><cell>0.896</cell><cell>0.980</cell></row><row><cell>Baseline Binoculars</cell><cell>0.342</cell><cell>0.818</cell><cell>0.844</cell><cell>0.965</cell><cell>0.996</cell></row><row><cell>Baseline Fast-DetectGPT (Mistral)</cell><cell>0.095</cell><cell>0.793</cell><cell>0.842</cell><cell>0.931</cell><cell>0.958</cell></row><row><cell>Baseline PPMd</cell><cell>0.270</cell><cell>0.546</cell><cell>0.750</cell><cell>0.770</cell><cell>0.863</cell></row><row><cell>Baseline Unmasking</cell><cell>0.250</cell><cell>0.662</cell><cell>0.696</cell><cell>0.697</cell><cell>0.762</cell></row><row><cell>Baseline Fast-DetectGPT</cell><cell>0.159</cell><cell>0.579</cell><cell>0.704</cell><cell>0.719</cell><cell>0.982</cell></row><row><cell>95-th quantile</cell><cell>0.863</cell><cell>0.971</cell><cell>0.978</cell><cell>0.990</cell><cell>1.000</cell></row><row><cell>75-th quantile</cell><cell>0.758</cell><cell>0.865</cell><cell>0.933</cell><cell>0.959</cell><cell>0.991</cell></row><row><cell>Median</cell><cell>0.605</cell><cell>0.645</cell><cell>0.875</cell><cell>0.889</cell><cell>0.936</cell></row><row><cell>25-th quantile</cell><cell>0.353</cell><cell>0.496</cell><cell>0.658</cell><cell>0.675</cell><cell>0.711</cell></row><row><cell>Min</cell><cell>0.015</cell><cell>0.038</cell><cell>0.231</cell><cell>0.244</cell><cell>0.252</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>This work was supported by grants from the Guangdong-Foshan Joint Fund Project (No. 2022A1515140096) and Open Fund for Key Laboratory of Food Intelligent Manufacturing in Guangdong Province (No. GPKLIFM-KF-202305).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">On the usefulness of compression models for authorship verification</title>
		<author>
			<persName><forename type="first">O</forename><surname>Halvani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Graner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th international conference on availability, reliability and security</title>
				<meeting>the 12th international conference on availability, reliability and security</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Generalizing unmasking for short texts</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
	<note>Long and Short Papers</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Fast-detectgpt: Efficient zero-shot detection of machine-generated text via conditional probability curvature</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">B</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">Y</forename><surname>Teng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.05130</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Intelligent Detection Method for Irony and Stereotype Based on Hybird Neural Networks</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Yang</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2022 Labs and Workshops, Notebook Papers</title>
		<editor>
			<persName><forename type="first">Guglielmo</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2022-09">September 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Analysis of Irony and Stereotype Spreaders Based On Convolutional Neural Networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2022 Labs and Workshops, Notebook Papers</title>
		<editor>
			<persName><forename type="first">Guglielmo</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2022-09">September 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Continuous Integration for Reproducible Shared Tasks with TIRA</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kolyada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Loebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28241-6_20</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-28241-6_20" />
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maistro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Davis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Caputo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="236" to="241" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Deep ROC analysis and AUC as balanced average accuracy, for improved classifier selection, audit and explanation</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Carrington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Manuel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Fieguth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="329" to="341" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Modified Brier score for evaluating prediction accuracy for binary outcomes</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Schnellinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Statistical Methods in Medical Research</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="2287" to="2296" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">A simple measure to assess non-response</title>
		<author>
			<persName><forename type="first">A</forename><surname>Peñas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodrigo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Generalizing unmasking for short texts</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="654" to="659" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
