<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AI-Writing Detection Using an Ensemble of Transformers and Stylometric Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Mikros</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Athanasios Koursaris</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitrios Bilianos</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Markopoulos</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hamad Bin Khalifa University</institution>
          ,
          <addr-line>LAS Building, Education City, Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National and Kapodistrian University of Athens, Department of Informatics and Telecommunication</institution>
          ,
          <addr-line>Zografou</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National and Kapodistrian University of Athens, Department of Italian Language and Literature</institution>
          ,
          <addr-line>Zografou</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National and Kapodistrian University of Athens, Department of Philology</institution>
          ,
          <addr-line>Zografou</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>This study aims to develop an effective and precise methodology for detecting AI-generated text, leveraging the synergistic combination of transformer learning and stylometric features. The research utilized two datasets provided by the AuTexTification: Automated Text Identification shared task, a component of IberLEF 2023, the 5th Workshop on Iberian Languages Evaluation Forum held at the SEPLN 2023 Conference. Our team engaged in both English language subtasks, which included binary classification of texts as either human or AI-generated and multiclass classification to predict the specific AI writing model employed from a selection of six. Our main approach was to experiment with multiple Transformer models and, at the same time, to use an extensive stylometric feature engineering workflow. Each method (transformers and stylometric features) was first applied separately, and then we explored various ways to combine them. The most efficient method was based on ensemble learning with majority voting, employing the two most accurate transformer models in our training data and a concatenation of many different stylometric feature groups. The macro-F1 scores on the test sets on subtasks 1 and 2 were 60.78 and 55.87, respectively, positioning our group above the median of the competing teams. This study underscores the potential of combining transformer learning and stylometric features to enhance the accuracy of AI-generated text detection.</p>
      </abstract>
      <kwd-group>
        <kwd>AI-writing detection</kwd>
        <kwd>stylometry</kwd>
        <kwd>transformers</kwd>
        <kwd>ensemble learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        During the last few years, the rise of Large Language Models (LLMs) has disrupted most Natural
Language Understanding (NLU) and Language Generation (LG) tasks, offering high-quality,
coherent, and content-specific text [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In most cases, the output is indistinguishable from a
human author’s production. Moreover, most approaches for detecting whether AI has written
a text or not have failed, or, at best, they show some promising results that, however, are not
reliable enough to be used in real-world AI writing detection applications [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. LLM
writing detection has become even more challenging with the release of ChatGPT, the newest
LLM from OpenAI. ChatGPT is based on the GPT-3 and, more recently, the GPT-4 LLM family
of OpenAI and can generate even more human-like and coherent text than its predecessors.
This has led to a growing concern about the potential for malicious use of these models, such as
creating fake news or impersonating individuals online.
      </p>
      <p>
        Given the rapidly emerging malevolent uses, there is an urgent need to develop methods to
discriminate AI-written from human-written texts. For this reason, the AuTexTification [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: Automated
Text Identification shared task was organized in the framework of the 5th Workshop on Iberian
Languages Evaluation Forum at the SEPLN 2023 Conference [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The AuTexTification shared
task provided two subtasks with related research objectives. In subtask 1, participants received a
corpus and were asked to identify whether each text was produced by an AI model or a human.
To stimulate the models’ ability to adapt to diverse writing styles, the training dataset in this
subtask was based on three separate domains, while the testing datasets contained
texts from two different domains. In subtask 2, participants were handed a series of texts, all
of which were produced by six distinct AI models, each representing a stepwise increase in
the neural parameters of the generative model, spanning from 2 billion to 175 billion. Both subtasks
provided texts written in English and Spanish. However, our group focused on the English
datasets, and all experiments reported here were carried out on the English text collections.
      </p>
      <p>In subtask 1, AI writing detection can be considered a binary classification task with
two classes (AI-generated vs. Human), which can be accommodated in the standard supervised
machine learning classification paradigm. In this sense, the task is analogous to the authorship
attribution task, in which AI and Human represent two different authors with separate
stylometric profiles that can be distinguished by their different frequency profiles over a
common set of stylometric features. In subtask 2, each generative model has been pretrained
with different neural parameters and produces linguistic output using a unique generative
probabilistic method. For this reason, we can expect each generative model to produce a distinct
stylometric profile and behave like a distinct author. Again, in this subtask, we can employ a
standard authorship attribution model based on a multiclass supervised classification model.</p>
      <p>The present paper is organized in five sections. In the next section (Section 2), we will delve
into a literature review, focusing on the area of AI-generated text detection, while showcasing
the latest studies in this field. Section 3 will outline the methodology we implemented in this
shared task, detailing the various stylometric characteristics and machine learning techniques
we utilized. In Section 4, we will highlight our range of experiments and the final iterations
submitted for the shared task. Finally, in Section 5, we will explore the impact of our research,
discuss some limitations, and consider potential future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        The recent advancements in automatic text generation facilitated by powerful language models
have opened up new possibilities for research and development in various domains. Language
models such as Generative Pre-trained Transformer (GPT) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Pathways Language Model
(PaLM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and ChatGPT have demonstrated impressive capabilities in producing contextually
coherent and fluent text [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Most of these models’ architecture is transformer-based [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Models such as GPT-2 and CTRL
have billions of parameters and are trained on very large amounts of raw text from different
sources (e.g., Wikipedia, Reddit). Such models can also be fine-tuned for a domain-specific task.
      </p>
      <p>
        Text generative models have been utilized successfully in applications that range from code
auto-completion to question-answering systems and even story generation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, the
malicious use of these models to spread fake news, fake product reviews, and other misuses
by adversaries (e.g., spamming, phishing) has become a pressing concern [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The task of
guessing whether a text is produced by a human or a robot has been famous since the Turing
test [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] but recently has emerged as a pressing AI safety task. Crucially, humans exhibit
limited capability in accurately discerning fake news and comments generated by these models,
performing no better than chance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] show that human raters, even expert, trained
ones, have consistently worse accuracy than automatic discriminators, while the performance gap is
expected to grow as researchers train bigger and better models.
      </p>
      <p>
        The approaches to identifying machine-generated text and telling it apart from human text
mainly fall into three categories: simple classifier approaches, zero-shot detection, and fine-tuning-based
detection [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        In simple classifier approaches, the classifier is trained from scratch. Some of these approaches
employ classical machine learning methods. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use a simple baseline model with tf-idf unigram
and bigram features on top of a logistic regression model. Then, the authors test a zero-shot
approach. In zero-shot approaches, a pretrained model is employed to detect generated text
from itself or similar models. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] thus present a baseline consisting of a threshold on the total
probability of the text sequence. The text is predicted as machine-generated if its likelihood,
according to GPT-2, is closer to the mean likelihood over all machine-generated sequences than
to the mean of human-generated sequences.
      </p>
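      <p>The likelihood-threshold baseline described above can be illustrated with a minimal sketch. It assumes that a log-likelihood score has already been computed for each text by some language model; the scores below are hypothetical placeholders, and the function names are ours, not those of the cited work.</p>
      <preformat><![CDATA[
```python
from statistics import mean


def fit_likelihood_baseline(machine_scores, human_scores):
    """Store the mean log-likelihood observed for each class."""
    return mean(machine_scores), mean(human_scores)


def predict_machine(score, machine_mean, human_mean):
    """Return True when the score is closer to the machine-generated mean."""
    return abs(score - machine_mean) < abs(score - human_mean)


# Hypothetical per-text log-likelihoods (not taken from any real model run).
machine_scores = [-40.0, -42.0, -38.0]
human_scores = [-75.0, -80.0, -70.0]
machine_mean, human_mean = fit_likelihood_baseline(machine_scores, human_scores)
```
]]></preformat>
      <p>With two class means, this rule is equivalent to placing a single threshold at their midpoint.</p>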
      <p>
        [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] provide GLTR (Giant Language Model Test Room), a tool of baseline statistical methods
that can highlight the distributional differences between text generated by the GPT-2 model and
human-written text. The generated text is sampled word by word from a next-token distribution.
This distribution usually differs from the one that humans subconsciously use when they write
or speak [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        The authors of GROVER [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] follow the fine-tuning-based approach, using BERT,
GPT-2, and GROVER as pre-trained language models. They find that GROVER outperforms the
other models, concluding that the best models for generating disinformation are also the best
at detecting their own generations. However, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] found that fine-tuning a RoBERTa-based
detector achieved higher accuracy on GPT-2 generated texts than fine-tuning a GPT-2 detector
with equivalent capacity. This result might be due to the superior quality of the bidirectional
representations employed by the RoBERTa language model compared to the unidirectional
representation learning of GPT-2 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. RoBERTa-based detectors also performed better than
other detectors in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Finally, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] experimented with several models: the
Grover-based detector, GLTR, a RoBERTa-based detector, and a simple ensemble that fused these detectors
using logistic regression. The RoBERTa-based detector again outperformed the Grover-based
detector and GLTR.
      </p>
      <p>
        More recently [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] proposed a novel algorithm using stylometric signals to detect
AI-generated tweets. The authors formulated two main tasks: first, detecting whether tweets in a
timeline are human-written or AI-generated, and second, identifying the point in a mixed
timeline where the authorship changes from human to AI. To achieve this, they proposed using
stylometric features such as phraseology, punctuation, and linguistic diversity to capture writing
style. These features are used in conjunction with pretrained language model embeddings. The
authors created an in-house dataset of human-written and AI-generated tweets for evaluation
and also used an existing dataset called TweepFake. They tested two models: a fusion model
that combines stylometric features and language model embeddings to detect human vs. AI
tweets, and a change point detection model (StyloCPA) that uses stylometric time series to
detect when authorship changes from human to AI. The experiments showed that stylometric
features improve the performance of pretrained language model-based detectors, especially
when limited training data is available or the input text is short. The change point detection
model also outperformed the baselines in localizing human-AI authorship changes.
The analysis showed that punctuation and phraseology features are the most useful, while linguistic
features require longer texts to be effective.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>AuTexTification provided two English corpora (one for each subtask) for training purposes.
Detailed descriptive statistics of both corpora per class can be found in Table 1 and Table 2
respectively:</p>
        <p>After the training period, two separate unlabeled testing datasets (one for each subtask) were
provided to be used for class prediction using the classification models which were developed
in the training phase of the shared task. Descriptive statistics of both testing corpora can be
found in Table 3 below:</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Features and classification algorithms</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Transformers</title>
          <p>
            Ever since the development of the BERT model by the team of Google Brain as well as the
publication of the revolutionary “Attention Is All You Need” paper [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], the transformer
architecture has remained the backbone of some of the most prominent Deep Learning-based classification
algorithms. The use of the Attention Mechanism has given transformers a significant advantage
over RNN-based architectures (LSTM, GRU), and, at the same time, Transfer Learning has
allowed for the fine-tuning of already existing Transformer architectures in accordance with the
data at hand. In order to obtain optimal results, it is also possible to pretrain certain architectures
on various tasks, such as Masked Language Modelling (MLM) and Next Sentence Prediction
(NSP). For the purposes of the two different subtasks of the competition, a Transfer Learning
approach was adopted, which involved the pretraining, customization and hyperparameter
tuning of four different architectures:
• BERT: The original BERT model developed by Google, in its base-cased form [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], was
the first model that was used for the purposes of the competition. The tokenization
process allows for the conversion of words and subword units into an ID dictated by
the vocabulary of the tokenizer. The IDs are then converted to 768-dimensional word
embeddings based on the context. The texts are also padded, with padding itself being
ignored due to the creation of an attention mask.
• RoBERTa: A more robustly optimized pretraining/training approach for BERT [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. The
roberta-base-openai-detector model was utilized, having been trained on a corpus of texts
generated by the GPT and GPT-2 models. Again, tokenization is essentially identical,
yielding 768-dimensional word embeddings.
• ELECTRA: A “discriminator” approach in training transformers [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ]. Pretraining involves
replacing certain words with plausible alternatives to allow for the distinction between
the original text and the reconstructed version. The contextual embeddings of the
electra-base-discriminator model used for the task have a dimensionality of 1024.
• XLNet: Yet another pretraining approach, this time favoring a permutation-based
objective instead of predicting masked words in sequential order [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. All words in a
sentence can be predicted given the context of all other words in any given permutation.
The xlnet-base-cased model used for the tasks at hand entails contextual embeddings not
too dissimilar to those of the BERT transformer, with a dimensionality of 768.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Stylometric features</title>
          <p>
            Over the past few decades, computational stylistics has developed into a robust and reliable field
within Natural Language Processing (NLP). Its aim is to dissect an author’s writing style into a
variety of linguistic features. When combined, these features create a multi-faceted stylometric
profile that uniquely correlates with the author’s identity. The linguistic features that have been
suggested as useful in authorship attribution tasks are numerous and cover all linguistic levels.
Given that an author’s style is a multi-layered phenomenon involving a complex and dynamic
interaction of various linguistic levels, the most effective strategies tend to combine features
that represent different and complementary facets of linguistic structure. For the needs of the
shared task, we decided to use several feature groups that cover a wide range of linguistic
phenomena and capture complementary linguistic functions. More specifically, we calculated
the following feature groups in both training datasets:
• Author’s Multilevel N-gram Profiles (AMNP): The Author’s Multilevel N-gram Profile
(AMNP) was first proposed by [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] and has since established its effectiveness in
author attribution and profiling tasks, as supported by multiple studies [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ] [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ] [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ]
[
            <xref ref-type="bibr" rid="ref28">28</xref>
            ]. AMNP is a method of representing a document that utilizes expanding n-gram sizes
in both character and word units. This dual construction approach allows it to cover
various linguistic levels. For the purposes of the shared task, we selected the 500 most
frequent character 2-grams, the 500 most frequent word 2-grams, and the 500 most frequent words, yielding a total
feature vector of 1,500 elements. The resulting vector constitutes the AMNP, a document
representation that simultaneously captures sequences of both characters and words. All
frequencies were converted to relative frequencies to avoid bias from differing text sizes.
• Stylometric and Language complexity indices (Stylo): In stylometric research, there is
a long tradition of stylometric features that have been used for authorship attribution,
including word and sentence length, lexical diversity indices, and a variety of features that
capture quantitative properties of the text (e.g., distribution parameters of word
frequencies, word entropy, etc.). Although these indices are not as efficient as the combination of
word and n-gram frequencies, when they supplement other more robust document
representations, they can increase the overall accuracy of any statistical model that predicts
author identity. In this shared task we computed 74 stylometric features organized in the
following wider thematic areas: a) Word and Sentence Lengths and standard deviations
(measured in characters, words and syllables) b) Word Frequency distribution indices
such as hapax legomena, dis legomena c) Quantitative aspects of text such as entropy,
perplexity, and the corresponding standard deviations d) Lexical diversity indices such as
various text size neutralized TTR measures (log-transformed TTR, root-transformed TTR,
Mass TTR, Mean Segmental TTR, Moving Average TTR), Measure of Textual Lexical
Diversity (MTLD), Hypergeometric Distribution D measure, functional diversity (ratio
of content to functional words) e) Readability indices including Flesch-Kincaid reading
ease and grade level, SMOG, Automated Readability Index, Dale-Chall, Linsear Write
formula, Gunning-Fog score, Coleman-Liau index f) A large variety of metrics related to
coherence, text quality, syntactic complexity and PoS tags frequencies. For more details
about the indices involved in this feature group see [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ].
• LIWC features (LIWC): LIWC (Linguistic Inquiry and Word Count) is a dictionary-based tool which
has been mainly used to measure aspects related to the psychological properties of a
given text segment. It was created in the 1990s by James Pennebaker [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ] as an attempt
to automatically analyze essays and determine whether people who talk or write about
their distressing personal experiences could show improvement in their physical and
psychological health status. Although the LIWC tool was initially employed primarily
in psychological research, it later proved to be a successful “prediction tool” for several
NLP (Natural Language Processing) tasks including authorship attribution [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] and author
profiling tasks [
            <xref ref-type="bibr" rid="ref32">32</xref>
            ] [33]. Moreover, in recent experimentation [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] LIWC features were
found to be the most efficient in predicting whether a text has been written by ChatGPT
(GPT-3 model) or a human, outperforming even the GPT-3 embeddings. In this shared
task we utilized the LIWC-22 tool [34] and its related English dictionary, which comprises 120
linguistic features.
• GPT-2 word embeddings (GPT-2): Word embeddings are amongst the most popular
representations of a text’s vocabulary. Words or phrases from the lexicon are transformed
into vectors of real numbers and, subsequently, can be used in language modelling and
feature learning approaches. These numerical vectors can capture text representations in
an n-dimensional space, where words with the same meaning are represented similarly.
This indicates that two comparable words are located extremely closely together in
the vector space and thus have almost identical vector representations. Therefore, the
objective of creating a word embedding space is to record some form of relationship in
that space, be it a relationship based on meaning, morphology, context, or any other
type. Since language produced by different authors exhibits systematic differences across
all levels of its linguistic organization [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ], we foresee that word embeddings can be
used effectively for discriminating AI from Human writing. In this study we used the
pretrained GPT-2 word embeddings [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. OpenAI GPT-2 is a large autoregressive language
model, built on a transformer-based architecture that encompasses 1.5 billion parameters.
It was trained on a comprehensive dataset named WebText, composed of 8 million web
pages, equating to 40 GB of internet text. The primary training objective of GPT-2 was to
predict subsequent words in a sequence. Notably, GPT-2 demonstrated its remarkable
capability for zero-shot task transfer, exhibiting proficiency in diverse tasks like machine
translation and reading comprehension. For each text we calculated the average of word
embeddings across the words it contains resulting in 768 dimensions for each text.
          </p>
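          <p>As an illustration of the AMNP representation described above, the following minimal sketch builds relative-frequency vectors from character 2-grams, word 2-grams and single words. It is not the implementation used in the shared task: whitespace tokenization, lowercasing and the function names are assumptions made for the example, and on a small corpus each level may contribute fewer than k features.</p>
          <preformat><![CDATA[
```python
from collections import Counter


def ngrams(items, n):
    """Return all consecutive n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def top_k(docs_units, k):
    """The k most frequent units across all training documents."""
    totals = Counter()
    for units in docs_units:
        totals.update(units)
    return [unit for unit, _ in totals.most_common(k)]


def relative_freqs(counter, vocabulary):
    """Relative frequency of each vocabulary unit within one document."""
    total = sum(counter.values())
    return [counter[unit] / total if total else 0.0 for unit in vocabulary]


def amnp_vectors(texts, k=500):
    """Concatenate character 2-gram, word 2-gram and word profiles."""
    # Assumption: lowercased whitespace tokens stand in for a real tokenizer.
    tokens = [text.lower().split() for text in texts]
    levels = (
        [ngrams(list(text), 2) for text in texts],  # character 2-grams
        [ngrams(tok, 2) for tok in tokens],         # word 2-grams
        [ngrams(tok, 1) for tok in tokens],         # single words
    )
    vectors = [[] for _ in texts]
    for level in levels:
        vocabulary = top_k(level, k)
        for vector, units in zip(vectors, level):
            vector.extend(relative_freqs(Counter(units), vocabulary))
    return vectors
```
]]></preformat>
          <p>With k=500 and a sufficiently large corpus, each document is mapped to a vector of 1,500 relative frequencies, matching the dimensionality reported above.</p>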
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Subtask 1 (English texts)</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. Experiments with Transformers</title>
          <p>All experiments involving transformer-based classification algorithms were conducted using
the PyTorch library, an open-source Deep Learning framework developed by Meta AI, allowing
for the development of neural network architectures with an Object-Oriented Programming
approach. PyTorch exhibits great interoperability with the Hugging Face Transformers library,
thus making it an excellent choice for Transfer Learning approaches.</p>
          <p>BERT and RoBERTa models were initially pretrained using MLM (Masked Language
Modeling), where 15% of the words in each sentence of every given text were masked while the
model attempted to restore all missing tokens, and NSP, where, given a sentence, the model
attempted to guess the next sentence in the text. Experiments were made with and without
pretraining the models, but the resulting models (bert-base-autextification and
roberta-base-autextification) did not seem to impact the results in any meaningful way.</p>
          <p>For the purposes of the binary classification task, performance was better when data
preprocessing was minimal to nonexistent. Words and subword units were then converted to IDs,
while padding to the maximum text length (depending on the vocabulary of the tokenizer and
the subword units included) was applied. Padding was ignored during the training process due
to the creation of an attention mask for every given entry.</p>
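          <p>The padding and attention-mask step can be sketched in plain Python as follows. The token IDs and the padding ID are hypothetical; in practice a pretrained tokenizer produces both the IDs and the mask.</p>
          <preformat><![CDATA[
```python
def pad_batch(id_sequences, pad_id=0, max_length=None):
    """Pad token-ID sequences to one length and build matching attention masks."""
    if max_length is None:
        max_length = max(len(seq) for seq in id_sequences)
    input_ids, attention_mask = [], []
    for seq in id_sequences:
        seq = seq[:max_length]  # truncate inputs longer than max_length
        n_pad = max_length - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Hypothetical token IDs; a real tokenizer would derive these from text.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
input_ids, attention_mask = pad_batch(batch)
```
]]></preformat>
          <p>Every row of the padded batch has the same length, and the zeros in the mask identify the positions that carry no information.</p>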
          <p>The classifier architectures were created as classes, each with a constructor containing the
various layers of the architecture, and a forward function for performing forward propagation.
In every case, the first layer involved is that of the transformer architecture, followed by a
dropout layer with a probability of 0.3, followed by a linear classification head with an input
corresponding to the dimensionality of the transformer embeddings (e.g., 768 for BERT). The
linear head input is taken either in the form of the pooled output (BERT/RoBERTa), the last
timestep (ELECTRA), or the logits (XLNet). Since we are dealing with a binary classification
task, the classification head is activated by a sigmoid function, which converts the raw logits into a
probability between 0 and 1 that is then rounded. 0 represents generated texts, while 1 represents
human-made ones.</p>
          <p>Training was done in mini-batches with a batch size of 64 for the duration of 2 epochs total.
Loss was calculated by the BCELoss function (Binary Cross-Entropy) provided by PyTorch, while
the AdamW optimizer (Adam with weight decay) was used for optimization purposes, with an
initial learning rate of 5e-5. Evaluation metrics were computed on a validation set (extracted by
the train_test_split function provided by the scikit-learn framework and comprising 20% of the
original dataframe, with all entries selected at random). Performance was evaluated by using Accuracy,
Precision, Recall and F1-score. The ranking of the models in the subtask 1 dataset is displayed
in Table 4:</p>
          <p>The best results were yielded by the ELECTRA transformer, even though it is evident that
transformer architectures seldom fall beneath the 0.9 threshold.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Experiments with Stylometric Features</title>
          <p>Experiments using the stylometric features were performed using the PyCaret library [35].
PyCaret is a Python-based, open-source machine learning library that simplifies and automates
machine learning workflows with its low-code approach. As an all-encompassing machine
learning and model management tool, PyCaret accelerates the experimental process and fosters a
versatile workspace conducive to efficient machine learning training. PyCaret compares several
state-of-the-art machine learning algorithms and provides a dashboard where the trained models
are ranked by their classification metrics.</p>
          <p>In the data preprocessing phase, we converted all values into z-scores to offset the impact of
different scales across various features. Additionally, we eliminated any features demonstrating
perfect collinearity to prevent bias in the training of multiple classification models. We then
evaluated the models using a 5-fold cross-validation technique and ranked them according
to their macro F1. Table 5 presents the ranking of models that were trained individually using
each feature group, as well as a model trained using a concatenation of all features utilized in
this study.</p>
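          <p>The two preprocessing steps can be sketched as follows. This is a simplified illustration and not the PyCaret pipeline itself: it standardizes each feature column and then removes columns that are perfectly collinear with an earlier column, which after z-scoring means identical up to sign. Only pairwise collinearity is handled, and the function names are ours.</p>
          <preformat><![CDATA[
```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    scaled = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        # A constant column has sigma == 0; map it to zeros instead of dividing.
        scaled.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return [list(row) for row in zip(*scaled)]


def drop_collinear_columns(rows, tol=1e-9):
    """Keep only the first of any z-scored columns that match up to sign."""
    kept = []
    for col in zip(*rows):
        is_copy = any(
            all(abs(a - b) < tol for a, b in zip(col, other))
            or all(abs(a + b) < tol for a, b in zip(col, other))
            for other in kept
        )
        if not is_copy:
            kept.append(col)
    return [list(row) for row in zip(*kept)]
```
]]></preformat>
          <p>For example, a feature measured in characters and the same feature rescaled to another unit collapse into a single column after these two steps.</p>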
          <p>The highest F1 score (0.89) among the various feature groups is obtained by combining all
the feature groups and using the XGBoost algorithm. The most single efective feature groups
used are the GPT 2 embeddings and the LIWC features that yielded a F1 score of 0.86 and
0.85 respectively. To further enhance the classification eficiency of the employed models we
experimented with various methods to combine the feature groups tabular datasets with the
text only transformer models. We applied a process known as data prepending to our dataset.
This involved appending a sequence of feature measurements to the end of each text item,
with each measurement separated by the token [SEP]. To manage the resulting increase in
sequence length, we used the Longformer and BigBird transformer models. These models are
specifically designed to handle longer sequences than traditional transformer models
and can accommodate input sequences of up to 4096 tokens. However, this method did not work as
expected and did not increase the F1 on the training dataset. A much simpler approach was finally
adopted: an ensemble method based on majority voting. We combined the predictions of
the All-features group with the predictions of the best-scoring transformers (ELECTRA and
RoBERTa) and obtained the highest F1 (0.95) of all the methods we tried. Given that it
performed best on the training data, we decided to adopt it as one of our methods in the final
run of the AuTexTification shared task. The final results on the testing dataset (Macro-F1: 60.78,
CI: 60.14-61.49) justified our decision, since our ensemble method scored better than
our other runs in both subtasks and reached the 32nd position out of 76 runs.</p>
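          <p>The majority-voting step can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the submitted runs: the function name majority_vote, the variable names, and the toy predictions are hypothetical. With three voters on a binary task no ties can arise; the tie rule noted in the docstring only matters for other configurations.</p>
          <preformat>
```python
from collections import Counter


def majority_vote(*model_predictions):
    """Return, for each text, the label predicted by most models.

    Each argument is one model's list of labels, aligned by text index.
    On a tie, the label encountered first among the models wins.
    """
    lengths = {len(p) for p in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of texts")
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*model_predictions)
    ]


# Hypothetical predictions of the three ensemble members on four texts.
all_features = ["human", "generated", "generated", "human"]
electra = ["human", "human", "generated", "generated"]
roberta = ["generated", "human", "generated", "human"]

ensemble = majority_vote(all_features, electra, roberta)
print(ensemble)  # prints ['human', 'human', 'generated', 'human']
```
          </preformat>
          <p>Each ensemble member contributes one vote per text, so a single model's error is overruled whenever the other two agree.</p>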
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Subtask 2 (English texts)</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. Experiments with Transformers</title>
          <p>Procedures for the multiclass classification task were largely identical, with the
main differences being the encoding of the given labels (A to F) into numbers (0 to 5), the
use of the LogSoftmax function as the activation of the linear classification head, and
regular cross-entropy as the loss function (CrossEntropyLoss as provided by PyTorch). For evaluating
performance, the weighted-average method of computing Accuracy, Precision, Recall and
F1-score was preferred. The results are shown in Table 6.</p>
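          <p>As an illustration of the label encoding and of weighted averaging, consider the following pure-Python sketch. It does not reproduce the training code of the study: the names LABEL_TO_ID, encode and weighted_f1 and the toy labels are assumptions. The weighted average is taken here to mean that each class's F1 is weighted by its support (the number of true instances of that class).</p>
          <preformat>
```python
from collections import Counter

# Map the six class letters A..F to the integers 0..5.
LABEL_TO_ID = {label: index for index, label in enumerate("ABCDEF")}


def encode(labels):
    """Convert letter labels to integer class ids."""
    return [LABEL_TO_ID[label] for label in labels]


def weighted_f1(y_true, y_pred):
    """Support-weighted average of the per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += f1 * n_true / total
    return score


# Hypothetical gold labels and predictions for four texts.
gold = encode(["A", "A", "B", "C"])
pred = encode(["A", "B", "B", "C"])
print(gold)                               # prints [0, 0, 1, 2]
print(round(weighted_f1(gold, pred), 2))  # prints 0.75
```
          </preformat>
          <p>Unlike the macro average, this weighting lets frequent classes contribute more to the final score.</p>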
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Experiments with Features</title>
          <p>In subtask 2 we followed the same methodology as in subtask 1, described in Section
4.1.2. We used the PyCaret library with the four feature groups (GPT 2, LIWC, Stylo, AMNP) and
their combined version. Table 7 presents the ranking of models that were trained individually
on each feature group, as well as a model trained on a concatenation of all features used
in this study.</p>
          <p>The ranking reveals that the detection of different generative AI models relies on different
features than the AI vs. human text classification. In this subtask, the most standard authorship
attribution features, i.e., the stylometric features (Stylo) and the multilevel n-gram profiles
(AMNP), seem to work better and offer increased discriminatory power.</p>
          <p>Again, in this subtask the best F1 was obtained through simple majority voting among the
two most accurate transformer models (ELECTRA and RoBERTa) and the All-features group.
This ensemble approach was the best performing of our runs on the testing dataset of the
AuTexTification shared task, obtaining the 21st position among 38 runs with a Macro-F1 of 55.87
and a CI ranging from 54.86 to 56.81.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This study explored various stylometric features and transformer-based models for detecting
AI-generated texts. We experimented with different methods to improve classification accuracy,
including preprocessing techniques, hyperparameter tuning, and ensemble methods. Our results
demonstrate that combining multiple complementary approaches yields the most effective
solution.</p>
      <p>On the training dataset of subtask 1, discriminating between human-written and
AI-generated texts, our highest F1 score (0.95) was achieved using an ensemble method
based on majority voting that fused an XGBoost model trained on the combined features from
our feature groups with predictions from the ELECTRA and RoBERTa transformer models. In
subtask 2, identifying the specific AI model employed, an ensemble of a CatBoost model
trained on the same combined features and the ELECTRA and RoBERTa transformer models
achieved the top F1 score of 0.60.</p>
      <p>
        These findings highlight the benefits of fusing stylometric features and neural language
models to detect AI writing. Stylometric features have a long history of effectiveness in authorship
analysis and capture nuanced properties of writing style. In comparison, transformer models
can detect more subtle patterns in the data. Combining these two approaches leverages their
complementary strengths, confirming related previous research [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>Our study has some limitations. We only experimented with a subset of possible stylometric
features and transformer architectures. Testing additional features and models may further
enhance accuracy. Furthermore, our work focused on English texts; exploring other languages
could reveal cross-linguistic patterns. Finally, continued progress in neural language generation
will likely require continual advancement in detection techniques.</p>
      <p>In conclusion, as concerns grow over misuse of AI writing, improved methods for
discriminating machine- from human-generated text are increasingly important. Our work suggests
that hybrid approaches uniting stylometric features and neural networks show promise for
addressing this challenging problem. We hope our findings inspire continued progress in this
critical area of AI safety.</p>
      <p>(EISIC), IEEE, 2017, pp. 54–60.</p>
      <p>[33] P. Panicheva, T. Litvinova, Matching LIWC with Russian thesauri: an exploratory study,
in: Artificial Intelligence and Natural Language: 9th Conference, AINL 2020, Helsinki,
Finland, October 7–9, 2020, Proceedings 9, Springer, 2020, pp. 181–195.</p>
      <p>[34] R. L. Boyd, A. Ashokkumar, S. Seraj, J. W. Pennebaker, The development and psychometric
properties of LIWC-22, Austin, TX: University of Texas at Austin (2022) 1–47.</p>
      <p>[35] PyCaret: An open source, low-code machine learning library in Python, Online, 2023.
Available at: https://www.pycaret.org.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fröhling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Feature-based detection of automated language models: tackling gpt-2, gpt-3 and grover</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <elocation-id>e443</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Keskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <article-title>Limits of detecting text generated by large-scale language models</article-title>
          ,
          <source>in: 2020 Information Theory and Applications Workshop (ITA)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Solaiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brundage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kreps</surname>
          </string-name>
          , et al.,
          <article-title>Release strategies and the social impacts of language models</article-title>
          , arXiv preprint arXiv:1908.09203 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Sarvazyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of AuTexTification at IberLEF 2023: Detection and attribution of machine-generated text in multiple domains</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural</source>
          , Jaén, Spain,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsvyashchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maynez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Austin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gur-Ari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Duke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Michalewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spiridonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sepassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Omernick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Pillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pellat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lewkowycz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Polozov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Saeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meier-Hellstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          ,
          <article-title>PaLM: Scaling language modeling with pathways</article-title>
          ,
          <year>2022</year>
          . arXiv:2204.02311.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
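          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>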
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <article-title>Hierarchical neural story generation</article-title>
          , arXiv preprint arXiv:1805.04833 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jawahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdul-Mageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          ,
          <article-title>Automatic detection of machine generated text: A critical survey</article-title>
          , arXiv preprint arXiv:2011.01314 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Turing</surname>
          </string-name>
          ,
          <article-title>Computing machinery and intelligence</article-title>
          ,
          <source>Mind</source>
          <volume>59</volume>
          (
          <year>1950</year>
          )
          <fpage>433</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Duckworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <article-title>Automatic detection of generated text is easiest when humans are fooled</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>1808</fpage>
          -
          <lpage>1822</lpage>
          . URL: https://aclanthology.org/2020.acl-main.164. doi:10.18653/v1/2020.acl-main.164.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Falchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gambini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          ,
          <article-title>TweepFake: About detecting deepfake tweets</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          <elocation-id>e0251415</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Strobelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>GLTR: Statistical detection and visualization of generated text</article-title>
          , arXiv preprint arXiv:1906.04043 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zellers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roesner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Defending against neural fake news</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Uchendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Authorship attribution for neural text generation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8384</fpage>
          -
          <lpage>8395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. I.</given-names>
            <surname>Adelani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yamagishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Echizen</surname>
          </string-name>
          ,
          <article-title>Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection</article-title>
          ,
          <source>in: Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Networking and Applications (AINA-2020)</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>1341</fpage>
          -
          <lpage>1354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumarage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Trapeznikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Stylometric detection of AI-generated text in Twitter timelines</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.03697.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          , arXiv preprint arXiv:2003.10555 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Perifanos</surname>
          </string-name>
          ,
          <article-title>Authorship attribution in Greek tweets using author's multilevel n-gram profiles</article-title>
          ,
          <source>in: AAAI Spring Symposium: Analyzing Microtext</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <article-title>Authorship attribution and gender identification in Greek blogs</article-title>
          ,
          <source>Methods and Applications of Quantitative Linguistics</source>
          <volume>21</volume>
          (
          <year>2012</year>
          )
          <fpage>21</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <article-title>Systematic stylometric differences in men and women authors: A corpus-based study</article-title>
          , in: R. Köhler, G. Altmann (Eds.),
          <source>Issues in Quantitative Linguistics 3: Dedicated to Karl-Heinz Best on the Occasion of His 70th Birthday, number 13 in Studies in Quantitative Linguistics</source>
          , RAM-Verlag, Lüdenscheid,
          <year>2013</year>
          , pp.
          <fpage>206</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <article-title>Blended authorship attribution: Unmasking Elena Ferrante combining different author profiling methods</article-title>
          , in: A. Tuzzi, M. A. Cortelazzo (Eds.),
          <source>Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 2017</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <article-title>Detection of AI-generated texts and quantitative analysis of large language model outputs</article-title>
          ,
          <source>in: Quantitative Linguistics Conference 2023 (QUALICO 2023)</source>
          ,
          <year>2023</year>
          . To be presented at the conference, June 28-30, 2023, Lausanne, Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Cortelazzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Mikros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzzi</surname>
          </string-name>
          ,
          <article-title>Profiling Elena Ferrante: a look beyond novels</article-title>
          ,
          <source>in: JADT 2018: Proceedings of the 14th International Conference on Statistical Analysis of Textual Data</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Olsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Enevoldsen</surname>
          </string-name>
          ,
          <article-title>TextDescriptives: A Python package for calculating a large variety of metrics from text</article-title>
          ,
          <source>Journal of Open Source Software</source>
          <volume>8</volume>
          (
          <year>2023</year>
          )
          <fpage>5153</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <article-title>Writing about emotional experiences as a therapeutic process</article-title>
          ,
          <source>Psychological Science</source>
          <volume>8</volume>
          (
          <year>1997</year>
          )
          <fpage>162</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gaston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dozier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Cothran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Arms-Chavez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Authorship attribution via evolutionary hybridization of sentiment analysis, LIWC, and topic modeling features</article-title>
          ,
          <source>in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>933</fpage>
          -
          <lpage>940</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>T.</given-names>
            <surname>Isbister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>Gender classification with data independent features in multiple languages</article-title>
          ,
          <source>in: 2017 European Intelligence and Security Informatics Conference</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>