<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Texts⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kateryna Antipova</string-name>
          <email>antipova.katerina@chmnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hlib Horban</string-name>
          <email>hlib.horban@chmnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Technology and Implementation, IT&amp;I-2025</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Petro Mohyla Black Sea National University</institution>
          ,
          <addr-line>10, 68-Desantnykyv St., Mykolaiv, 54003</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>51</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>The rapid development of large language models has raised serious concerns about the reliability of detecting content created by artificial intelligence. This article compares the stylistic metrics of texts generated using a multimodal model and an autoregressive model. The results show that the generated text is very similar to human-written text in terms of lexical diversity and semantic coherence. In terms of perplexity and burstiness, the rephrased texts are practically indistinguishable from the original human-written texts, which leads to a high level of false negatives in autoregressive detectors. Our analysis highlights the need for new detection methods and suggests further directions, including more specific stylometric signatures. Relying solely on a single stylometric metric leads to unreliable differentiation between generated and human-written text.</p>
      </abstract>
      <kwd-group>
        <kwd>academic abstracts</kwd>
        <kwd>ai-generated texts</kwd>
        <kwd>detectors</kwd>
        <kwd>large language models</kwd>
        <kwd>natural language processing</kwd>
        <kwd>stylometric analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid development of large language models (LLMs) has brought about a new era in text
generation, opening up a wide range of applications, from automatic content creation to dialogue
agents. However, this progress is causing concern in fields that have traditionally relied on
human-generated text, and raises important ethical questions. In practice, it is not uncommon to
simply request an essay from an AI and copy the result verbatim. Fortunately,
there are several tools available to assess the likelihood that a text was created using AI. These
detection methods mainly target the results of traditional autoregressive models (ARMs).</p>
      <p>Most existing AI-based text recognition tools, such as DetectGPT, GPTZero, and RADAR, are
designed and tuned to detect the output of autoregressive architectures (GPT-4, LLaMA, etc.). They
exploit artifacts of sequential token prediction, such as local peaks in log-probability and
perplexity profiles, to distinguish machine-generated text. This reliance on ARM signatures raises
the question of whether these detectors can also identify AI-generated content that was created by
models with other generation mechanisms. For our research, we focused on multimodal LLMs
(MLLMs).</p>
      <p>MLLMs are trained on datasets that pair images with descriptive captions or user/assistant
interactions. MLLMs learn to describe pictures and answer questions about visual content, which
often demands conciseness and step-by-step logic. When applied to a purely text prompt, the
model may avoid complex or long clauses, use simpler syntax, and gravitate towards more
template-like responses.</p>
      <p>Despite the growing number of detection methods, there has been no systematic comparison of
detectability across ARMs and MLLMs. To address this gap, we generated samples from an ARM and an
MLLM for two tasks, rephrasing and text generation. Instead of testing the performance of existing
detectors, we calculated a chosen set of metrics: perplexity, burstiness, lexical diversity, semantic
coherence, and BLEU and ROUGE scores. These stylometric and linguistic metrics were used to assess
how distinguishable the different types of generated texts are in practice.</p>
      <p>In this work, we:</p>
      <p>• Introduce a new dataset of 2,000 samples (500 ARM and 500 MLLM for both rephrasing and generation tasks).</p>
      <p>• Compare MLLM- vs. ARM-style outputs and human-written text based on the stylometric and linguistic metrics.</p>
      <p>• Use these metrics to measure detection effectiveness, discussing implications for the performance of current autoregressive-focused detection tools when faced with outputs from a MLLM.</p>
    </sec>
    <sec id="sec-2">
      <title>2. AI-text Detection</title>
      <sec id="sec-2-1">
        <title>2.1. Detection Methods</title>
        <p>
          A supervised learning approach, fine-tuning the language model with or without adding a
classification module, was used by OpenAI for their RoBERTa-based classifier [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This approach
entails fine-tuning language models on a mixture of human-authored and LLM-generated texts,
enabling the implicit capture of textual distinctions. Despite the strong performance, obtaining
annotations for detection data can be challenging in real-world applications, making the supervised
paradigms inapplicable in some cases. While deep learning approaches often yield superior
detection outcomes, their black-box nature restricts interpretability [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          DetectGPT [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is a zero-shot detection method that does not require training a separate
classifier on human- or AI-written data; instead it relies on the source model's own
probability estimates. The main idea is that AI-generated sequences leave a characteristic
«signature» in the probability space of the specific model that generated them. DetectGPT assumes
that machine-generated text tends to lie in regions of negative curvature of the model's
log-probability function. Simply put, the model assigns higher probability to the text it generated than
to adjacent alternative fragments. Based on this hypothesis, DetectGPT perturbs the input text
using a mask-filling language model. It then detects AI text by comparing the probabilities of
the text and its filled-in variants. Minor changes in human-written text, such as rephrasing or word
substitutions, have virtually no impact on the model's log-probability. Existing
zero-shot detectors rely mainly on statistical features and use pre-trained large language models to
gather them.
        </p>
        <p>DetectGPT takes a candidate text x and a set of k perturbed variants {x̃_i}, and computes the perturbation discrepancy

d(x) = log p_M(x) − (1/k) ∑_{i=1}^{k} log p_M(x̃_i)   (1)</p>
        <p>A large positive d(x) means that x is a sharp log-probability peak in model M, so
x was probably generated by model M.</p>
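As a minimal sketch of the discrepancy score above (function names are illustrative; in a real pipeline the log-probabilities would come from the scored LLM and the perturbed variants from a mask-filling model such as T5):

```python
from statistics import mean

def perturbation_discrepancy(logp_original, logp_perturbed):
    """DetectGPT-style score: log-probability of the candidate text
    minus the average log-probability of its perturbed variants."""
    return logp_original - mean(logp_perturbed)

def flag_as_generated(logp_original, logp_perturbed, threshold=0.0):
    # A large positive discrepancy means the text sits on a sharp
    # log-probability peak of the scoring model, suggesting the
    # model itself generated it.
    return perturbation_discrepancy(logp_original, logp_perturbed) > threshold
```

Human-written text, by contrast, tends to have a discrepancy near zero, since perturbations change its log-probability in either direction.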
        <p>
          DetectGPT and FastDetectGPT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] are earlier examples of perplexity-based methods which look
at the local curvature in probability space around a given example. Binoculars [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is an even more
effective recent approach which uses the cross-perplexity between two different LLMs as a signal
that text is LLM-generated.
        </p>
        <p>
          In contrast, GPTZero [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] uses a trained classifier that relies on perplexity and burstiness:
        </p>
        <p>• Perplexity analysis. GPTZero computes sentence-level perplexities to gauge how
predictable each sentence is to a language model.</p>
        <p>• Burstiness analysis. It measures how much perplexities fluctuate from sentence to sentence.</p>
        <p>GPTZero flags AI text by combining low average perplexity under a reference model and low
burstiness (i.e. consistently uniform sentence perplexities). Thresholds on these statistics are tuned
to maximize separation between human-written and generated samples.</p>
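A toy sketch of such a decision rule follows; the threshold values here are arbitrary placeholders, not GPTZero's actual tuned parameters:

```python
from statistics import mean, pvariance

def gptzero_style_flag(sentence_perplexities,
                       ppl_threshold=20.0, burst_threshold=30.0):
    """Toy GPTZero-style rule: flag a text as AI-generated when the
    average sentence perplexity AND the burstiness (variance of the
    sentence-level perplexities) are both below their thresholds."""
    avg_ppl = mean(sentence_perplexities)
    burstiness = pvariance(sentence_perplexities)
    return avg_ppl < ppl_threshold and burstiness < burst_threshold
```

Uniformly low sentence perplexities are flagged; high or strongly fluctuating perplexities, typical of human writing, are not.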
        <p>
          Ghostbuster [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] feeds LLM-generated texts into a series of weaker LLMs (from unigram models
to unadjusted GPT-3 davinci) to obtain token probabilities, and then conducts a structured search
on the combinations of these model outputs and trains a linear classifier to distinguish text
generated by an LLM. This detector achieves an average F1 score of 99.0, an improvement of 41.6
F1 points over GPTZero and DetectGPT.
        </p>
        <p>
          Unlike traditional binary classification tasks, stylometry-based approaches focus on
distinguishing between the writing styles of different authors. Each AI model has its own
stylometric signature, and identifying these different styles proves to be more effective than simple
binary classification tasks. DeTeCtive [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a multi-task and multi-level platform for contrastive
learning that achieves excellent results in detecting AI-generated texts both within and outside of
distribution scenarios. It also introduces a novel feature called «training-free incremental
adaptation», which allows adaptation to new data without retraining. Shah et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] propose a
novel approach that combines features such as lexical diversity, readability metrics, and semantic
distribution with machine learning models for classification.
        </p>
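As an illustrative sketch (not the authors' actual pipeline), combining stylometric features with a classifier can be reduced to a toy nearest-centroid rule over hypothetical feature vectors:

```python
def centroid(vectors):
    """Mean feature vector of a class (e.g. of known-human samples)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(features, human_centroid, ai_centroid):
    """Assign the label of the nearer centroid (squared Euclidean distance).
    Features might be e.g. [type-token ratio, mean sentence length]."""
    d_h = sum((a - b) ** 2 for a, b in zip(features, human_centroid))
    d_a = sum((a - b) ** 2 for a, b in zip(features, ai_centroid))
    return "human" if d_h <= d_a else "ai"
```

A real system would use a trained model (e.g. logistic regression or gradient boosting) over many such features, but the geometry of the decision is the same: each author class occupies its own region of stylometric feature space.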
        <p>As AI models continue to evolve, the detectors themselves must also adapt to maintain high
levels of performance and accuracy. Adversarial methods have been developed to intentionally
alter the output of LLMs to evade detection. These methods can include changes in phrasing,
structure, or the introduction of artificial noise that confounds detection tools.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Evading Detection</title>
        <p>
          Much of the literature has also focused on whether or not AI-generated text can be detected at all
[10]. Different techniques to attack or evade detectors have been developed and are an active area
of research. Evasion techniques such as word or sentence substitution, recursive paraphrasing, and
prompting have been developed to point out the failures in detectors [
          <xref ref-type="bibr" rid="ref10">11</xref>
          ].
        </p>
        <p>
          A group of researchers [
          <xref ref-type="bibr" rid="ref11">12</xref>
          ] devised a framework to rank LLMs based on their detectability,
claiming that more recent models like GPT-4 are less detectable because perplexity and burstiness
are less useful evidence markers. The authors of [
          <xref ref-type="bibr" rid="ref12">13</xref>
          ] discuss the critical limitations of existing
detectors, including issues with real-world data, potential attacks, and the lack of an
effective evaluation framework.
        </p>
        <p>
          In addition, other studies have examined methods of attacking AI detectors, as well as other
ways to circumvent or avoid AI detection. Sadasivan et al. [
          <xref ref-type="bibr" rid="ref13">14</xref>
          ] showed that AI text detectors can
be fooled by paraphrasing attacks. The basic principle is to apply a lightweight paraphrase model
on LLMs' outputs and change the distribution of lexical and syntactic features of the text to confuse
the detector. Simple rephrasing techniques are sufficient to evade early zero-shot detectors and
trained detectors, but recursive rephrasing is necessary to effectively evade more reliable detectors.
To this end, Krishna et al. [
          <xref ref-type="bibr" rid="ref14">15</xref>
          ] proposed DIPPER, a powerful T5-based paraphrasing model that
significantly enhances the effectiveness of such attacks.
        </p>
        <p>
          RADAR [
          <xref ref-type="bibr" rid="ref15">16</xref>
          ] is a detector based on RoBERTa-large and trained using an adversarial learning
model. In this model, a paraphraser is designed to rephrase machine-generated text and mimic
paraphrase attacks. The RADAR framework incrementally refines the paraphrase model, drawing
on feedback garnered from the detector and employing the proximal policy optimization algorithm,
outperforming zero-shot detection methods including DetectGPT and OpenAI detector.
        </p>
        <p>
          An AI humanizer, also known as a paraphrasing or «text humanization» tool, rewrites
the AI's output multiple times in order to imitate the characteristics of human writing style.
The authors of [
          <xref ref-type="bibr" rid="ref16">17</xref>
          ] evaluated 19 popular humanizer tools (e.g., Undetectable AI, WriteHuman,
StealthWriter) and found that many state-of-the-art detectors fail to flag humanized AI text in over
80% of cases (e.g. only 15-20% detection rates). Simple paraphrasing loops restore perplexity and
burstiness to a human-like level and effectively neutralize autoregressive detectors.
        </p>
        <p>
          Detection tools are also shown to be unable to cope with texts translated from other languages.
According to a report released by OpenAI, their AI-text detector is not fully reliable on that front
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In the reported evaluation of some challenging cases for English texts, their classifier only
correctly identifies 26% of generated texts while incorrectly classifying 9% of human-written texts.
The authors of [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ] study the effect on AI detectors of translating AI-generated text through
multiple languages before translating it back into English and find some methods significantly
more robust than others.
        </p>
        <p>
          The accuracy and reliability of AI-generated text detection tools can vary depending on several
factors, such as the specific tool used, the type of AI model generating the text, and the content
being analyzed. Most of the detection tools achieve a 70-80% accuracy rate in detecting text
generated by models like GPT-3. Detectors also struggle with short text paragraphs and with more
advanced outputs from later-generation models like GPT-4 [
          <xref ref-type="bibr" rid="ref10">11</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Multimodal LLMs</title>
      <p>
        The trend toward integrating multiple modalities into architectures is becoming increasingly
widespread and is leading to the emergence of multimodal large language models (MLLMs).
Multimodal generation represents the pinnacle of achievements in individual modalities and
integrates text, images, video, and audio into context-aware outputs. For example, tasks such as
text-to-image, text-to-video, and text-to-speech represent multimodal systems that go beyond pure
text generation and use text prompts to control the generation of visual content [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ].
      </p>
      <p>MLLM architectures are typically either modular or monolithic. Most existing MLLMs use a
modular architecture in which visual encoding and language decoding are processed separately.
This approach is typically realized by combining a pre-trained visual encoder (e.g., a CLIP-based
ViT) with an LLM [20].</p>
      <p>These models differ from traditional text-only LLMs not only in architecture but also in the
diversity of their training data, which includes image-text pairs, visual reasoning tasks, and
cross-modal alignments. This broader training scope introduces new challenges for detectors: while
traditional detectors focus on linguistic features, MLLM-generated text may exhibit distinct
stylistic patterns influenced by multimodal conditioning, making detection strategies based solely
on text stylometry less reliable.</p>
      <p>Both commercial models, such as GPT-4o and the Gemini series, and open-source ones, such as
BLIP [21] and LLaVA [22], actively combine the image and language
modalities. They often link LLMs with large vision models (LVMs) through intermediate layers.
Recent open-source frameworks demonstrate the efficacy of modular designs. Through large-scale
multimodal pre-training and advanced visual-language alignment techniques, they achieve
outcomes on par with leading commercial models.</p>
      <p>Despite their multimodal capabilities, MLLMs can perform pure text generation tasks,
functioning similarly to autoregressive LLMs. During inference without visual inputs, the language
component processes text prompts and generates continuations based on learned distributions.
However, their stylistic tendencies often differ from text-only LLMs because of exposure to
image-caption datasets and conversational multimodal instructions during training. This bias can
manifest as shorter, more descriptive sentences, a preference for concrete nouns, and a more
directive or explanatory tone. Additionally, MLLMs often rely on special tokens or structured
prompts to manage dialogue or multimodal context, which influences their default response
format.</p>
      <p>The text generation process in MLLMs, as in LLMs, depends on sampling methods that control
diversity and determinism. In multimodal contexts, sampling occurs after modality fusion or token
alignment, so the language model conditions its predictions on both text and any embedded visual
features. For deterministic tasks such as text generation, non-sampling settings are typically
preferred to ensure consistency and minimize stylistic variance. The choice of strategy affects not only output fluency
but also the detectability of generated text, as different decoding strategies produce different
stylometric fingerprints.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Metrics</title>
      <p>Stylistic features primarily focus on the frequency of words that specifically highlight the stylistic
elements of the text, including the frequency of capitalized words, proper nouns, verbs, past tense
words, stopwords, technical words, quotes, and punctuation. Complexity features are extracted to
represent the complexity of the text, such as the type-token ratio and textual lexical diversity.
Psychological features are generally related to sentiment analysis and can be derived based on
existing tools to calculate sentiment scores, or extracted using sentiment classifiers.</p>
      <p>To quantify stylistic and statistical differences among original, ARM-generated, and
MLLM-generated texts, we compute the following metrics.
      <p>Perplexity, to measure how predictable a text is to a strong ARM. For a model M, the perplexity of text x_{1:N} is

PPL_M(x_{1:N}) = exp( −(1/N) ∑_{i=1}^{N} log p_M(x_i | x_{1:i−1}) )   (2)</p>
      <p>Burstiness, which refers to how unevenly or clustered certain words or other features appear
in a text. We measure it as the variance of sentence-level perplexities over the sentences S(x):

B(x) = Var{ PPL_M(s) | s ∈ S(x) }   (3)</p>
      <p>Human writing often shows higher burstiness than machine-generated texts; generated outputs
tend to be more uniform. Autoregressive models show less burstiness, especially if they are trained
to avoid repetition. Burstiness can be measured by the following:</p>
      <p>• variance-to-mean ratio (index of dispersion) for word frequency across segments of a text;</p>
      <p>• statistical indicators of deviation from a uniform distribution;</p>
      <p>• temporal autocorrelation in sequential token occurrence.</p>
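As a minimal illustration, both perplexity and burstiness can be computed from token-level log-probabilities; in our experiments these come from the gpt2 model, while here they are plain numbers:

```python
import math
from statistics import pvariance

def perplexity(token_logprobs):
    """PPL = exp(-(1/N) * sum_i log p(x_i | x_{1:i-1}))."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

def burstiness(sentences_logprobs):
    """Variance of sentence-level perplexities across a text;
    takes one list of token log-probabilities per sentence."""
    return pvariance([perplexity(lp) for lp in sentences_logprobs])
```

A text whose sentences all score similar perplexities has burstiness near zero, the pattern typical of machine-generated output.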
      <p>Lexical diversity, a type-token ratio: the number of distinct word types divided by the total number of tokens.</p>
      <p>Semantic consistency, the average cosine similarity between embeddings of adjacent sentences.</p>
      <p>BLEU, which measures n-gram precision of a candidate text against a reference; in the rephrasing task it indicates how much source wording is preserved.</p>
      <p>ROUGE-1, which refers to the overlap of unigrams between the model and reference sequences.</p>
      <p>ROUGE-L, which is based on the longest common subsequence. It considers sentence-level structure
similarity and identifies the longest co-occurring in-sequence n-grams.</p>
      <p>These stylometric and linguistic metrics reveal both surface-level and deeper linguistic patterns.</p>
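Minimal pure-Python sketches of these surface metrics follow; in the experiments sentence embeddings come from all-MiniLM-L6-v2, whereas here vectors and tokens are plain lists, and the ROUGE variant shown is a simplified recall-only version:

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Lexical diversity: distinct word types / total tokens."""
    return len(set(tokens)) / len(tokens)

def cosine_similarity(u, v):
    """Cosine between two embedding vectors (e.g. adjacent sentences)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap divided by
    the number of reference tokens."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())
```

Semantic consistency of a whole text is then the mean of cosine_similarity over each pair of adjacent sentence embeddings.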
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Part</title>
      <p>For our experiments we used the ArXiv Paper Abstracts dataset [23], which comprises article
titles and abstracts. We randomly sample 500 title-abstract pairs from this corpus. We define two
tasks for each selected pair:</p>
      <p>Rephrasing task. We feed the original abstract to each model with the prompt:
Rephrase the following academic abstract: {original abstract}. Provide only the rephrased abstract.</p>
      <p>Text generation task. We prompt each model with:
Write an academic abstract for a paper titled: {title}. Provide only the abstract.</p>
      <p>We compare text variants per each pair (original, ARM output, MLLM output) for both tasks, for
a total of 2 × 3 × 500 = 3000 samples.</p>
      <p>ARM baseline: We use the Microsoft model Phi-3-Mini-4K [24] via HuggingFace Transformers.
At inference we apply deterministic (greedy) decoding with the following settings:</p>
      <p>temperature = 0.0, top_p = 1.0, max_new_tokens = 128, do_sample = False.</p>
      <p>MLLM baseline: We use the IDEFICS-9B model [25] with the same settings and a dummy image,
since the model requires an image input.</p>
      <p>Perplexity was calculated using the pretrained gpt2 model, the smallest version of GPT-2 with 124M
parameters [26]. Semantic consistency was calculated using all-MiniLM-L6-v2, a pretrained
Sentence Transformers model with over 22M parameters [27].</p>
      <p>All runs are performed on a Google Colab A100 GPU.</p>
      <p>To give a better overview, the distributions are illustrated in Figures 1-3.</p>
      <p>In the rephrase task ARM paraphrases at T = 0 achieve perplexities virtually indistinguishable
from human originals, whereas MLLM outputs yield burstiness only slightly lower than human
originals, which makes their outputs stealthier. Therefore, detectors that report low perplexity miss
both models whose deterministic samples fall within the human range.</p>
      <p>In the text generation task, both models used T = 0 again, yielding much lower perplexity and
burstiness than human-written abstracts, which makes both models quite predictable. Even so,
the models' outputs can still trick many detectors that are tuned only to perplexity and
sentence-length variability.</p>
      <p>Setting the temperature to zero emphasizes the characteristic features of each model's style and
at the same time ensures that the output results are completely deterministic and predictable
compared to texts whose samples were selected at higher temperatures. Since our analysis is
limited to abstract-length sentences, we may miss stylometric cues that are only found in longer
documents.</p>
      <p>Main strengths and weaknesses of the two models are listed below.</p>
      <p>• High perplexity in the rephrasing task.</p>
      <p>• High lexical diversity and semantic coherence in both tasks support novel wording and
stylistic variety.</p>
      <p>• In text generation, perplexity is low, which makes the results obvious to autoregressive
detectors.</p>
      <p>• In rephrasing, perplexity and burstiness remain within the human range.</p>
      <p>• High BLEU/ROUGE scores preserve source wording.</p>
      <p>• High lexical diversity and semantic coherence are demonstrated in both tasks.</p>
      <p>• Low perplexity and burstiness in the text generation task.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this work, a comprehensive stylometric analysis was conducted to assess the detection
performance on AI-generated abstracts, comparing the results from an autoregressive model, a
multimodal model, and original human-written texts. All model outputs were generated with a
deterministic decoding approach, which reduces perplexity but makes autoregressive detectors
over-confident. Future studies should examine higher-temperature samples to assess their effect on detectability.</p>
      <p>The obtained results show that reliance on a single metric, such as a fixed perplexity threshold,
is insufficient for robust AI-text detection. Detection pipelines should combine multiple stylometric
signals (perplexity, burstiness, lexical diversity, etc.) to improve sensitivity to both ARM and
MLLM outputs. These results indicate the need for next-generation detection methods. Future research will
expand the dataset to other model families and explore other linguistic and semantic metrics.</p>
      <p>Future research will extend the analysis from article abstracts to full texts in order to
investigate how stylometric metrics evolve with document length and topic. In addition,
attack-and-defense cycles will be studied to evaluate the resistance of detectors to adversarial attacks.
Adaptation to novel humanization tools will enhance detectors' robustness.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used the Phi-3-Mini-4K and IDEFICS-9B models to
generate datasets for subsequent linguistic analysis. The authors used the pretrained gpt2 model to
calculate perplexity in text, and the all-MiniLM-L6-v2 model to calculate semantic
consistency in text.</p>
      <p>[20] Z. Chen, W. Wang, H. Tian, S. Ye, Z. Gao, How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites, in: Science China Information Sciences, vol. 67(12), 2024. doi:10.48550/arXiv.2404.16821.</p>
      <p>[21] J. Li, D. Li, S. Savarese, S. Hoi, BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models, in: International Conference on Machine Learning, 2023, pp. 19730-19742. doi:10.48550/arXiv.2301.12597.</p>
      <p>[22] H. Liu, C. Li, Y. Li, B. Li, Y. Zhang, LLaVA-NeXT: Improved reasoning, OCR, and world knowledge, 2024. URL: https://llava-vl.github.io/blog/2024-01-30-llava-next</p>
      <p>[23] Kaggle, arXiv paper abstract dataset for building multi-label text classifiers. URL: https://www.kaggle.com/datasets/spsayakpaul/arxiv-paper-abstracts</p>
      <p>[24] M. Abdin, S.A. Jacobs, A.A. Awan, J. Aneja, A. Awadallah, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, 2024. doi:10.48550/arXiv.2404.14219</p>
      <p>[25] HuggingFace, IDEFICS, 2023. URL: https://huggingface.co/HuggingFaceM4/idefics-9b-instruct</p>
      <p>[26] HuggingFace, GPT-2, 2019. URL: https://huggingface.co/openai-community/gpt2</p>
      <p>[27] Sbert.net, Sentence Transformers Documentation, 2025. URL: https://www.sbert.net/index.html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] OpenAI,
          <article-title>New AI classifier for indicating AI-written text</article-title>
          ,
          <year>2023</year>
          . URL: https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>The Science of Detecting LLM-Generated Text</article-title>
          ,
          <source>in: Communications of the ACM</source>
          , vol.
          <volume>67</volume>
          ,
          <year>2023</year>
          , pp
          <fpage>50</fpage>
          -
          <lpage>59</lpage>
          . doi:10.48550/arXiv.2303.07205.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khazatsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , C. Finn,
          <article-title>DetectGPT: Zero-Shot MachineGenerated Text Detection using Probability Curvature</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          ,
          <year>2023</year>
          . doi:10.48550/arXiv.2301.11305.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Fast-DetectGPT:
          <article-title>Efficient zero-shot detection of machine-generated text via conditional probability curvature</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2023</year>
          . doi:10.48550/arXiv.2310.05130.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwarzschild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cherepanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <article-title>Spotting LLMs with Binoculars: Zero-shot detection of machine-generated text</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2401.12070.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>AI Detector, the Original AI Checker for ChatGPT &amp; More</article-title>
          ,
          <year>2023</year>
          . URL: https://gptzero.me/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fleisig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <article-title>Ghostbuster: Detecting Text Ghostwritten by Large Language Models</article-title>
          ,
          <source>in: North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2023</year>
          . doi: 10.48550/arXiv.2305.15047.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sh.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ch.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Detective: Detecting AI-generated text via multi-level contrastive learning</article-title>
          ,
          <source>in: Neural Information Processing Systems</source>
          ,
          <year>2024</year>
          . doi: 10.48550/arXiv.2410.20964.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ranka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Dedhia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <article-title>Detecting and unmasking AI-generated texts through explainable artificial intelligence using stylistic features</article-title>
          ,
          <source>in: International Journal of Advanced Computer Science and Applications</source>
          , vol.
          <volume>14</volume>
          (
          <issue>10</issue>
          ),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Antipova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Horban</surname>
          </string-name>
          ,
          <article-title>Improving detection of AI-generated text in education</article-title>
          ,
          <source>in: Directions for the development of science in the context of global transformations</source>
          , Baltija Publishing, Riga, Latvia,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . doi: 10.30525/978-9934-26-562-4-1.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M. T.</given-names>
            <surname>Tonmoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Zaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gautam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Counter Turing Test (CT2): AI-generated text detection is not as easy as you may think - introducing AI detectability index (ADI)</article-title>
          ,
          <source>in: Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2206</fpage>
          -
          <lpage>2239</lpage>
          . doi: 10.18653/v1/2023.emnlp-main.136.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <article-title>A survey on LLM-generated text detection: Necessity, methods, and future directions</article-title>
          ,
          <source>in: Computational Linguistics</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>65</lpage>
          . doi: 10.48550/arXiv.2310.14724.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Sadasivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feizi</surname>
          </string-name>
          ,
          <article-title>Can AI-generated text be reliably detected?</article-title>
          ,
          <year>2023</year>
          . doi: 10.48550/arXiv.2303.11156.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karpinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wieting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <article-title>Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , vol.
          <volume>36</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>27469</fpage>
          -
          <lpage>27500</lpage>
          . doi: 10.48550/arXiv.2303.13408.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>RADAR: Robust AI-Text Detection via Adversarial Learning</article-title>
          ,
          <year>2023</year>
          . doi: 10.48550/arXiv.2307.03838.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Masrour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Emi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spero</surname>
          </string-name>
          ,
          <article-title>DAMAGE: Detecting adversarially modified AI generated text</article-title>
          ,
          <year>2025</year>
          . doi: 10.48550/arXiv.2501.03437.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ayoobi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Knab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pantoja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alikhani</surname>
          </string-name>
          ,
          <article-title>Esperanto: Evaluating synthesized phrases to enhance robustness in AI detection for text origination</article-title>
          ,
          <year>2024</year>
          . doi: 10.48550/arXiv.2409.14285.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Survey on AI-Generated Media Detection: From Non-MLLM to MLLM</article-title>
          ,
          <year>2025</year>
          . doi: 10.48550/arXiv.2502.05240.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>