<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Human or Not? Light-Weight and Interpretable Detection of AI-Generated Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maximilian Seeliger</string-name>
          <email>maximilian.seeliger@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Styll</string-name>
          <email>patrick.styll@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Staudinger</string-name>
          <email>moritz.staudinger@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allan Hanbury</string-name>
          <email>allan.hanbury@tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Wien Informatics</institution>
          ,
          <addr-line>Favoritenstraße 9-11, 1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Text generated by Large Language Models (LLMs) is becoming less distinguishable from its human-written counterparts. Reliable detection of the differences between the two is increasingly important to limit the spread of fake content, plagiarism, and the manipulation of public opinion. We study the binary classification problem of distinguishing human-written from AI-generated text. We propose a two-step learning algorithm. In the first step, it calculates the correlation between the rows of the binary term-document matrix (TDM) and the binary labels associated with the documents. This step runs in O(n·ℓ_max + d·n) time, where n is the number of texts, ℓ_max is the maximum text length, and d is the vocabulary size. In the second step, it uses these values to map any text to a sequence of correlations, which can be interpreted as a signal. This can be done in linear time O(ℓ), where ℓ is the size of the text. Together with other statistical measurements, this signal serves as a feature for standard machine learning algorithms. Furthermore, we give a perspective on the interpretability of our proposed approach for global and local (instance-level) explanations. Our work demonstrates that while large language models like RoBERTa remain state-of-the-art in terms of raw accuracy for AI-text identification, our interpretable and computationally efficient approach offers a competitive alternative, particularly in scenarios where interpretability is important. We evaluate our approach within the Voight-Kampff Generative AI Detection task, which is part of the PAN lab at CLEF 2025.</p>
      </abstract>
      <kwd-group>
        <kwd>AI-Generated Text</kwd>
        <kwd>Explainability</kwd>
        <kwd>Signal Processing</kwd>
        <kwd>PAN 2025</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        To demonstrate the effectiveness of our approach, we use data from the Voight-Kampff Generative AI
Detection challenge, which is part of the PAN lab at CLEF 2025 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. The challenge is divided
into two tasks: (1) binary classification of texts as either human- or AI-generated, and (2) multi-class
classification estimating the degree of human or machine authorship in mixed-authorship texts. Each
task comes with its own dataset.
      </p>
      <p>Our contributions include:
• A novel two-step learning algorithm that transforms text into a sequence of correlation values,
interpretable as a signal.
• A collection of global and local interpretations based on the output of the learning algorithm.
• A simple approach to use hand-crafted linguistic features together with correlation signals, fed into
a standard machine learning algorithm, to achieve competitive performance for distinguishing
human-written from AI-generated text.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Main Method</title>
      <sec id="sec-2-1">
        <title>2.1. Problem Setting</title>
        <p>We formally introduce the problem setting and propose the concept of correlation signals as well as a
simple way to use them for classification. We use binary term-document matrices and the Phi-coefficient
as fundamental building blocks to obtain a word-correlation value for each word. We map the words in
a given text to their respective word-correlations and call this sequence a correlation signal.
We study the problem of distinguishing human-written from AI-generated text in a supervised binary
classification setting. Let X be the instance space, containing all possible texts, and let Y = {0, 1}
denote the binary label space, where label 0 represents human-written text and label 1 AI-generated
text. For training, we get a set of n labeled training instances {(t_i, y_i)}_{i=1}^n ⊆ X × Y and try to find a
function f : X → Y that correctly classifies unseen instances.</p>
        <p>Let T = {t_1, t_2, . . . , t_n} be the set of texts from the training data. We consider each text t as a
sequence of word tokens (w_1, w_2, . . . , w_ℓ), resulting from tokenization (cf. Section 4.3), and expand
the notation of set inclusion to allow w ∈ t to denote that the word w is contained at any position in
the text t. We define the vocabulary of T as Vocab(T) = {w | w ∈ t for some t ∈ T} and say that
d = |Vocab(T)| is the number of distinct words in the text corpus.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Correlation Signals</title>
        <p>We construct a binary term-document matrix B from T, where each row represents, for a specific word,
the inclusion relation to each text from the training dataset.</p>
        <p>Definition 1. A binary term-document matrix B ∈ {0, 1}^{d×n} indicates at position B_{i,j} whether a
word w_i ∈ Vocab(T) is contained in document t_j, for 1 ≤ i ≤ d and 1 ≤ j ≤ n:</p>
        <p>B_{i,j} = 1 if w_i ∈ t_j, and B_{i,j} = 0 otherwise.</p>
        <p>Given the i'th row B_{i,·} ∈ {0, 1}^n and the label vector y = (y_1, y_2, . . . , y_n), we are interested in
quantifying the predictive power that the occurrence of the word w_i has (i.e. which label is more likely,
after knowing that w_i occurs in the text). For this, we calculate the correlation between these two
vectors.</p>
        <p>Definition 2. The Phi-coefficient [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (also known as Matthews correlation coefficient) is a special case
of the Pearson correlation coefficient for binary vectors. Given two binary vectors x, y ∈ {0, 1}^n, it is
defined as</p>
        <p>φ(x, y) = ( (1/n) Σ_{i=1}^n x_i y_i − x̄·ȳ ) / √( x̄(1 − x̄) · ȳ(1 − ȳ) ),</p>
        <p>where x̄ = (1/n) Σ_{i=1}^n x_i and ȳ = (1/n) Σ_{i=1}^n y_i.</p>
        <p>This leads to the definition of word-correlations. For a word w_i, represented in the i'th row of the
term-document matrix, we denote its word-correlation with the function c(w_i) = φ(B_{i,·}, y), where y
is the label vector. We further extend this notation to texts and say that text t = (w_1, w_2, . . . , w_ℓ) is
mapped to its correlation signal with</p>
        <p>c(t) = ( c(w_1), c(w_2), . . . , c(w_ℓ) ).</p>
        <p>For the given corpus T of size |T| = n with a vocabulary of size |Vocab(T)| = d, let ℓ_max be the length of
the longest text. We do preprocessing of the training corpus in O(n·ℓ_max + d·n) time. Constructing
the binary term-document matrix takes O(n·ℓ_max) time by reading through each text in O(ℓ_max) time
and updating entries in the matrix corresponding to occurring words in O(1) time. The subsequent
calculation of the Phi-coefficient for each word individually takes O(n) time and is done in cumulative
O(d·n) time. The preprocessing results in an associative data structure of size O(d) that maps each
word to its word-correlation. Given constant-time lookup in this data structure (e.g. a hash table), we only
need O(ℓ) time to construct a correlation signal for a query text t of size |t| = ℓ.</p>
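        <p>To make these two preprocessing steps concrete, the following is a minimal Python sketch, assuming a corpus of pre-tokenized texts with binary labels. The names word_correlations and correlation_signal are illustrative, not taken from our released code; instead of materializing B explicitly, the sketch accumulates per word the row sum and the overlap with y, which is all the Phi-coefficient requires.</p>
        <preformat>
import math
from collections import defaultdict

def word_correlations(texts, labels):
    """Map each word to its Phi-coefficient with the label vector.

    texts:  list of token lists (the corpus T)
    labels: list of 0/1 labels (the vector y)
    Runs in O(n * l_max + d * n) time overall.
    """
    n = len(labels)
    y_bar = sum(labels) / n
    occurs = defaultdict(int)      # row sum:  sum_j B[i, j]
    occurs_pos = defaultdict(int)  # overlap:  sum_j B[i, j] * y[j]
    for tokens, label in zip(texts, labels):
        for w in set(tokens):      # binary occurrence per document
            occurs[w] += 1
            occurs_pos[w] += label
    corr = {}
    for w, row_sum in occurs.items():
        x_bar = row_sum / n
        denom = math.sqrt(x_bar * (1 - x_bar) * y_bar * (1 - y_bar))
        corr[w] = 0.0 if denom == 0 else (occurs_pos[w] / n - x_bar * y_bar) / denom
    return corr

def correlation_signal(tokens, corr):
    """Map a tokenized text to its correlation signal in O(l) time;
    words unseen during training contribute a neutral 0."""
    return [corr.get(w, 0.0) for w in tokens]
        </preformat>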
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Classifier</title>
        <p>Given the mapping c from a text to its correlation signal, we define a classifier</p>
        <p>f_τ(t) = 1 if (1/|t|) Σ_{w ∈ t} c(w) &gt; τ, and f_τ(t) = 0 otherwise,</p>
        <p>for a given parameter τ. Intuitively, the average correlation signal acts as a soft decision boundary: if
a text contains more words that tend to appear in AI-generated texts, its average correlation will be
positive, and vice versa. The threshold τ determines the decision boundary in this latent correlation
space. In practice, the optimal decision threshold τ is chosen to minimize classification error for the
given distribution of the training data (see Figure 1).</p>
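        <p>A minimal sketch of this classifier follows, reusing the hypothetical correlation_signal helper from above; the scan over midpoints between sorted training averages is one simple way to minimize training error, not necessarily the exact search we used.</p>
        <preformat>
def average_signal(tokens, corr):
    # Mean of the correlation signal; empty texts map to 0.
    sig = [corr.get(w, 0.0) for w in tokens]
    return sum(sig) / len(sig) if sig else 0.0

def fit_threshold(texts, labels, corr):
    """Pick tau minimizing training error; only midpoints between
    neighboring sorted averages can change the decision."""
    avgs = sorted(average_signal(t, corr) for t in texts)
    candidates = [(a + b) / 2 for a, b in zip(avgs, avgs[1:])] or [0.0]
    def errors(tau):
        return sum((average_signal(t, corr) > tau) != y
                   for t, y in zip(texts, labels))
    return min(candidates, key=errors)

def classify(tokens, corr, tau):
    return 1 if average_signal(tokens, corr) > tau else 0
        </preformat>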
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Interpretability</title>
      <sec id="sec-3-1">
        <title>3.1. Correlation Signals</title>
        <p>This section gives a perspective on the interpretability of the proposed approach. Correlation signals
are based on the word-correlations assigned to each individual word. This word-level contribution
offers ways to analyze the underlying model on a global and local (instance-level) scale to explain the
final predictions.</p>
        <p>Globally, we can look at the magnitude of the correlations and see that AI models appear to avoid
specific words more (strong negative correlation, min_{w ∈ Vocab(T)} c(w) = −0.4849) than they seem to
favor specific words (positive correlation, max_{w ∈ Vocab(T)} c(w) = 0.3338). A list of tokens with the
largest/smallest correlation scores is given in Table 7 in Appendix B. Furthermore, interpreting text as a
correlation signal opens the door to more advanced analyses, such as spectral methods to investigate
global patterns and structural trends (see Appendix C).</p>
        <p>[Figure: distribution of average signal values for human-written and AI-generated texts around the
decision threshold (91.7% of human-written texts fall below it, 93.1% of AI-generated texts above it).]</p>
        <p>On the local scale, these scores can be used to explain individual instances, as the final output sum can
be traced back to the specific token-level contributions at each point in the sequence. Predictions are
constructed sequentially from the individual word-correlations in a text. This allows us to pinpoint exactly
the word or sub-sentence structure that led to either predicted class. Given an appropriate threshold τ,
we can see in Figure 2 how the model's prediction changes from one class to the other as a result of
words with an opposing word-correlation occurring.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. n-gram extension</title>
        <p>We generalize our approach to n-grams by treating them the same as simple word tokens. We calculate
an n-gram-correlation score analogous to word-correlations and build the final correlation signal as a
sequence of such n-gram-correlations.</p>
        <p>Intuitively, we can capture more nuanced language interactions from the text by using n-grams, as they
capture local contextual dependencies. However, n-grams for n &gt; 1 are sparse. There is a total of 56,987
tokens contained in the text corpus of the training data. Only 0.3% of the tokens in the validation set
are not present during training. However, about 34% of the 2-grams and 84% of the 3-grams in the
validation set have not been seen during training. This leads to poor generalization to unseen data,
while the ability to find n-gram-correlations that fit the training dataset improves with larger n. (This
effect explains the reduced performance of the corsig-2gram and corsig-3gram runs in Table 3.)</p>
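        <p>A minimal sketch of this extension, reusing the hypothetical helpers from Section 2: re-tokenizing each text into n-gram tuples makes the unigram pipeline apply unchanged.</p>
        <preformat>
def to_ngrams(tokens, n):
    """Re-tokenize a text as a sequence of n-grams (tuples of n tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Reuse of the unigram pipeline, e.g. for 2-grams:
# corr_2 = word_correlations([to_ngrams(t, 2) for t in texts], labels)
# signal = correlation_signal(to_ngrams(query_tokens, 2), corr_2)
        </preformat>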
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>
        We evaluate the performance of correlation signal classifiers. There are two main objectives in our
experiments: (1) determine the ability of our approach to generalize to new instances and (2) identify
whether correlation signals contain additional predictive information not contained in simple linguistic
measures. We evaluate our approach on the dataset provided in the PAN Lab’s Voight-Kampff
Generative AI Detection challenge [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This challenge is split into two tasks. Task 1 consists of training
and validation data for the binary classification setting presented in Section 2.1. Task 2 is a variation
with 6 classes for different human-AI collaboration schemes (cf. Table 1).
      </p>
      <p>The experiments are implemented in Python and the code is available on GitHub (https://github.com/max-seeli/steely).</p>
      <sec id="sec-4-1">
        <title>4.1. Exploratory Data Analysis</title>
        <p>For both tasks of the Voight-Kampff Generative AI Detection challenge, separate datasets are provided.
As shown in Table 1, the class distributions in the training and validation sets are relatively balanced
for task 1. In contrast, task 2 shows significant imbalances, both across individual classes and between
the training and validation splits. Specifically, in the training set, classes 3–5 together account for
less than 10% of the data. This is even more prominent in the validation set, where classes 4–5
collectively represent only 1.01% of samples. The most significant inconsistency appears in class 3:
while it comprises just 3.72% of the training data, it dominates the validation set with 51.16%. Such
inconsistencies between training and validation distributions can severely impair controlled evaluation
of model performance, as they lead to incorrect representations of the target data distribution during
training.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baselines</title>
        <p>
          We introduce a simple baseline classifier that takes several hand-crafted features into account. For
task 1, simple classification based on the respective optimal threshold τ of said features already achieves
a high performance that translates well from training to validation data (see Table 2). The features are
calculated separately for the train and validation set and then fed into any standard machine learning
algorithm (Random Forest, RF, in our case) to serve as a baseline. Additionally, we employ Facebook’s
RoBERTa base model [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] (roberta-base via Hugging Face, https://huggingface.co/FacebookAI/roberta-base) as a Language Model (LM) baseline
classifier, which has proven beneficial in previous studies [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We fine-tune RoBERTa using a maximum
input sequence length of 500 tokens, running for three epochs on a T4 GPU provided by Google Colab.
The selected hyperparameters are based on default values and were chosen to establish a reasonable
initial baseline for comparison.
        </p>
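        <p>A minimal sketch of such a fine-tuning setup with the Hugging Face transformers Trainer is given below; the placeholder datasets and the default training arguments are illustrative assumptions, not our exact configuration.</p>
        <preformat>
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Placeholder data; the real train/validation splits go here.
train_ds = Dataset.from_dict({"text": ["example text"], "label": [0]})
val_ds = Dataset.from_dict({"text": ["example text"], "label": [1]})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

def tokenize(batch):
    # Truncate to the 500-token maximum input sequence length.
    return tokenizer(batch["text"], truncation=True, max_length=500)

args = TrainingArguments(output_dir="roberta-baseline", num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds.map(tokenize, batched=True),
                  eval_dataset=val_ds.map(tokenize, batched=True),
                  tokenizer=tokenizer)  # default collator pads dynamically
trainer.train()
        </preformat>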
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data Preprocessing</title>
        <p>
          To prepare the input data for processing into correlation signals, we first use a word-tokenizer that
is sensitive to punctuation for the English language. Subsequently, we employ the Porter stemming
algorithm [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and remove English stopwords.
        </p>
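        <p>A minimal sketch of this preprocessing pipeline; NLTK is one plausible choice of tooling, and the tokenizer, stemmer, and stopword list named below are assumptions, as the exact libraries are not specified here.</p>
        <preformat>
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")      # punctuation-sensitive word tokenizer
nltk.download("stopwords")  # English stopword list

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def preprocess(text):
    """Tokenize, drop English stopwords, and Porter-stem the rest."""
    tokens = word_tokenize(text, language="english")
    return [stemmer.stem(w) for w in tokens if w.lower() not in stop]
        </preformat>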
        <p>For the RoBERTa baseline, we use the model-specific tokenizer and do not preprocess the inputs further.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Task 1: Binary Classification</title>
        <p>For task 1, we analyze six systems and present the evaluation metrics in Table 3. We run the statistical
baseline with the name stats and the RoBERTa baseline as roberta. The systems corsig-&lt;n&gt;gram
for n ∈ {1, 2, 3} use our main approach as presented in Section 2 as well as the extension to n-grams
from Section 3.2. Finally, the system stats-corsig is an adaptation of the statistical baseline that
uses the average correlation signal (1/|t|) Σ_{w ∈ t} c(w) for each text t ∈ T as an additional feature.
We can clearly see the negative effect n-grams with n &gt; 1 have on the discriminative power of
correlation signals, as we witness a slight decline in the performance metrics from corsig-1gram
to corsig-2gram and a significantly more pronounced drop in performance when looking at
corsig-3gram. The reason for this behaviour is the sparsity of n-grams, as explained in Section 3.2.
Furthermore, system stats-corsig displays a substantial increase over the stats baseline. This
indicates that correlation signals contain statistical information not available from simple linguistic
features. stats-corsig also shows that, combined with correlation signals, a simple statistical baseline
is sufficient for performance levels competitive with the roberta baseline.</p>
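        <p>A minimal sketch of the stats-corsig feature combination, assuming X_stats holds the hand-crafted linguistic features as a NumPy array (one row per text) and reusing the hypothetical average_signal helper from Section 2.3.</p>
        <preformat>
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stats_corsig_features(texts, X_stats, corr):
    """Append the average correlation signal as one extra column."""
    avg = np.array([[average_signal(t, corr)] for t in texts])
    return np.hstack([X_stats, avg])

# clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(stats_corsig_features(train_texts, X_stats, corr), labels)
        </preformat>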
        <p>[Table 3: Evaluation metrics (including F1) for the six task 1 systems.]</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Task 2: Multi-Class Classification</title>
        <p>For task 2, it is important to note that we are no longer dealing with binary classification, but rather a
multi-class setting with six distinct classes. Consequently, our approach for creating correlation signals
via a binary label vector y and classifying the summed-up signals via f_τ, as introduced in Sections 2.2
and 2.3, no longer works. We define y = (y_1, y_2, . . . , y_n) ∈ {0, 1, 2, 3, 4, 5}^n and build the correlation
signals according to c(w_i) = φ(B_{i,·}, y). Instead of using a threshold τ for classification, we use the
RF classifier as described in Section 4.2, both with and without the normalization (1/|t|) Σ_{w ∈ t} c(w)
for each t ∈ T.</p>
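        <p>Since the binary Phi-coefficient formula does not apply directly to a multi-class label vector, one plausible reading of c(w_i) = φ(B_{i,·}, y) in this setting is the plain Pearson correlation between the binary occurrence row and the integer labels, as in this sketch.</p>
        <preformat>
import numpy as np

def multiclass_word_correlation(row, y):
    """Pearson correlation between a binary occurrence row of B and
    the integer label vector y in {0, ..., 5}."""
    return float(np.corrcoef(row, y)[0, 1])
        </preformat>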
        <p>The results of our experiments on the validation set can be seen in Table 4. The RoBERTa baseline
(roberta) clearly outperformed the RF classifiers, both with (stats-corsig) and without (stats)
the correlation signal, which only slightly outperform guessing levels.</p>
        <p>We hypothesize that the lack of performance can be attributed to an inconsistent class distribution
between the training and validation sets, as described in Section 4.1. To verify this, we combined the
original training and validation data and performed a new stratified split. The results on the new
validation set confirm our assumptions, as we receive an F1-score of 96% via the RoBERTa baseline
(roberta-strat). Additionally, we can now observe a clear performance gain when using the
correlation signal as a feature in the RF classifier (stats-corsig-strat) compared to using baseline
features alone (stats-strat). Nonetheless, the RF classifier still underperforms relative to the LM
baseline, suggesting that our feature-based approach may be less effective for multi-class classification
tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we presented a lightweight and interpretable approach for distinguishing human-written
from AI-generated text. Our method leverages the statistical correlation between individual words
and class labels, encoding texts as correlation signals that can be processed efficiently and explained
both globally and locally. We demonstrated that this signal-based representation achieves strong
performance in the binary classification setting and adds complementary value when combined with
standard statistical features.</p>
      <p>In the multi-class classification setting, we observed that correlation signals alone may not capture
the full complexity of mixed-authorship scenarios. However, they still offer predictive gains when
incorporated into classical models, provided that the data distribution is properly balanced. While
language models like RoBERTa remain state-of-the-art in terms of raw accuracy, our findings show
that interpretable, transparent, and computationally efficient methods can provide competitive
alternatives, particularly when interpretability is a key concern.</p>
      <p>In future work, we plan to introduce a relevance weight (e.g. tf-idf) for each word to calculate a weighted
correlation signal, ensuring that more significant words impact the overall signal more. When removing
stopwords, we already saw a performance improvement, which indicates that less relevant terms
primarily add noise, hindering the prediction. Future work also includes extending correlation-based
features to more fine-grained signals over richer linguistic representations (e.g., syntactic or semantic
structures), and exploring hybrid models that combine the interpretability of correlation signals with
the expressiveness of neural networks.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, we used ChatGPT to paraphrase and reword. After using this
service, we reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Further Results</title>
      <p>The PAN Lab’s challenge organizers evaluated the submitted models from Task 1 on additional datasets.
The test set is a previously unknown part of the original dataset, held out for competition purposes, and the
Eloquent dataset comes from a related competition in which participants are asked to generate text such
that it is indistinguishable from human text. We present the results in Tables 5 and 6.</p>
      <p>[Tables 5 and 6: Results on the test set and the Eloquent dataset.]</p>
    </sec>
    <sec id="sec-8">
      <title>B. Significant Word-Correlations</title>
      <p>[Table 7: Tokens with the largest and smallest word-correlation scores.]</p>
    </sec>
    <sec id="sec-9">
      <title>C. Spectral Analysis of Correlation-Signals</title>
      <p>Since we are looking at texts in the form of signals (see Section 2.2), we hypothesize that there are
certain structural differences between human-written and AI-generated texts that can be uncovered by
analyzing their frequency components. Specifically, let c_t(j) denote the real-valued correlation signal
of the word at position j of text t. We interpret c_t(j) as a discrete-time process, which encodes some
sort of evidence towards AI- or human-authorship. Our goal is to examine the power spectral density
(PSD) for a text t via the periodogram I_t, which serves as a basic estimator for the PSD. I_t is defined
as the squared magnitude of the discrete Fourier transform of the signal,</p>
      <p>I_t(k) = (1/ℓ) | Σ_{j=0}^{ℓ−1} c_t(j) e^{−i 2π k j / ℓ} |²,</p>
      <p>where k ∈ {0, 1, . . . , ℓ − 1} is the discrete frequency index and ℓ is the length of document t.</p>
      <p>
        To conduct spectral analysis, we will use Welch’s method, which segments the signal into overlapping
windows, applies a tapering function, and finally averages the resulting periodograms. This method,
however, assumes stationarity of the signal, which means that the mean and variance do not change
over time; this is non-trivial for natural language. Similarly to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we applied the Augmented
Dickey-Fuller (ADF) test [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to examine this property. Our null hypothesis H_0 of the ADF test is non-stationarity,
meaning that p &lt; .05 test results would reject H_0 and hence accept the alternative hypothesis H_1 of
stationarity in the signals. For the training set of task 1, we see that 99.92% of texts accept H_1, which is
also why we assume that Welch’s method can be applied to this kind of correlation signal. An example
of a resulting PSD for an AI-generated text can be seen in Figure 3.
      </p>
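      <p>A minimal sketch of this analysis using SciPy and statsmodels; the window length nperseg is an illustrative choice, not necessarily the segmentation we used.</p>
      <preformat>
import numpy as np
from scipy.signal import welch
from statsmodels.tsa.stattools import adfuller

def psd_welch(signal, nperseg=64):
    """Welch PSD estimate of a 1-D correlation signal: overlapping,
    tapered segments whose periodograms are averaged."""
    x = np.asarray(signal, dtype=float)
    return welch(x, nperseg=min(nperseg, len(x)))  # (freqs, psd)

def is_stationary(signal, alpha=0.05):
    """ADF test: p &lt; alpha rejects the non-stationarity null H_0."""
    p_value = adfuller(np.asarray(signal, dtype=float))[1]
    return p_value &lt; alpha
      </preformat>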
      <p>After calculating I_t for all t ∈ T, we average the values of these periodograms within each individual
class; Figure 4 shows that there are indeed distinct differences in the mean power density spectra of the
correlation scores.</p>
      <p>For task 1, we can see that both classes have a peak in the low-frequency range, which means that
occurring patterns change slowly across the texts. In our context, this would indicate that the correlation
scores remain mostly positive or negative over many words. This aligns with our expectation that
human- and machine-authored segments typically span full sentences or paragraphs rather than just
single words.</p>
      <p>We see a similar trend in task 2. There are two large low-frequency peaks for human-initiated and
machine-continued text as well as deeply-mixed texts, suggesting that machine- and human-authored
parts are interleaved on the sentence- or paragraph-level. As expected, such a peak does not exist
for fully human-written texts. Interestingly, we can see a minor peak at higher frequencies for the
machine-written, then human-edited category. This could indicate that human editors made small local
changes, such as modifying individual words or short phrases, rather than rewriting entire segments.
Such finer-grained edits introduce higher-frequency peaks in the correlation signal.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. M. Sarvazyan, J. Ángel González, M. Franco-Salvador, F. Rangel, B. Chulvi, P. Rosso, Overview of AuTexTification at IberLEF 2023: Detection and attribution of machine-generated text in multiple domains, 2023. URL: https://arxiv.org/abs/2309.11285. arXiv:2309.11285.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] G. Cabanac, C. Labbé, Prevalence of nonsensical algorithmically generated papers in the scientific literature, Journal of the Association for Information Science and Technology 72 (2021) 1461-1476. doi:10.1002/asi.24495.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. D. Rodriguez, T. Hay, D. Gros, Z. Shamsi, R. Srinivasan, Cross-domain detection of GPT-2-generated technical text, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 1213-1233. URL: https://aclanthology.org/2022.naacl-main.88/. doi:10.18653/v1/2022.naacl-main.88.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Wadden, S. Lin, K. Lo, L. L. Wang, M. van Zuylen, A. Cohan, H. Hajishirzi, Fact or fiction: Verifying scientific claims, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 7534-7550. URL: https://aclanthology.org/2020.emnlp-main.609/. doi:10.18653/v1/2020.emnlp-main.609.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Gehrmann, H. Strobelt, A. Rush, GLTR: Statistical detection and visualization of generated text, in: M. R. Costa-jussà, E. Alfonseca (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Florence, Italy, 2019, pp. 111-116. URL: https://aclanthology.org/P19-3019/. doi:10.18653/v1/P19-3019.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, C. Finn, DetectGPT: Zero-shot machine-generated text detection using probability curvature, 2023. URL: https://arxiv.org/abs/2301.11305. arXiv:2301.11305.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] G. Bao, Y. Zhao, Z. Teng, L. Yang, Y. Zhang, Fast-DetectGPT: Efficient zero-shot detection of machine-generated text via conditional probability curvature, 2024. URL: https://arxiv.org/abs/2310.05130. arXiv:2310.05130.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Y. Xu, Y. Wang, H. An, Z. Liu, Y. Li, Detecting subtle differences between human and model languages using spectrum of relative likelihood, 2024. URL: https://arxiv.org/abs/2406.19874. arXiv:2406.19874.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Z. Yang, Y. Yuan, Y. Xu, S. Zhan, H. Bai, K. Chen, FACE: Evaluating natural language generation with Fourier analysis of cross-entropy, 2023. URL: https://arxiv.org/abs/2305.10307. arXiv:2305.10307.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Bevendorff, D. Dementieva, M. Fröbe, B. Gipp, A. Greiner-Petter, J. Karlgren, M. Mayerl, P. Nakov, A. Panchenko, M. Potthast, A. Shelmanov, E. Stamatatos, B. Stein, Y. Wang, M. Wiegmann, E. Zangerle, Overview of PAN 2025: Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection, in: J. C. de Albornoz, J. Gonzalo, L. Plaza, A. G. S. de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2025.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Bevendorff, Y. Wang, J. Karlgren, M. Wiegmann, M. Fröbe, A. Tsivgun, J. Su, Z. Xie, M. Abassy, J. Mansurov, R. Xing, M. N. Ta, K. A. Elozeiri, T. Gu, R. V. Tomar, J. Geng, E. Artemova, A. Shelmanov, N. Habash, E. Stamatatos, I. Gurevych, P. Nakov, M. Potthast, B. Stein, Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN and ELOQUENT 2025, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2025.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] B. W. Matthews, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA) - Protein Structure 405 (1975) 442-451.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] M. F. Porter, An algorithm for suffix stripping, Program 14 (1980) 130-137.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] D. Dickey, W. Fuller, Distribution of the estimators for autoregressive time series with a unit root, Journal of the American Statistical Association 74 (1979). doi:10.2307/2286348.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>