<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bi-Directional Cross-Entropy Loss and Stylometric Feature Combined Classifier</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yitao Sun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svetlana Afanaseva</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Stowe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kailash Patil</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pindrop</institution>
          ,
          <addr-line>Atlanta</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pindrop</institution>
          ,
          <addr-line>Chicago</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pindrop</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Pindrop</institution>
          ,
          <addr-line>Seattle</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the context of the PAN 2025 Voight-Kampff Generative AI Detection Task, Subtask 1 [1], we present a hybrid method that leverages BiScope's bi-directional cross-entropy loss [2] alongside a suite of stylometric features to enhance detection performance. BiScope captures perplexity asymmetries between forward and backward language modeling, revealing latent inconsistencies characteristic of generated content. To complement this, we extract stylometric features covering lexical diversity, syntactic complexity, and structural idiosyncrasies. Empirical results on the PAN 2025 benchmark datasets demonstrate that this integrated framework is a strong contender for effective generative AI detection.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>Voight-Kampff Generative AI Detection Task</kwd>
        <kwd>AI-generated text detection</kwd>
        <kwd>Bidirectional cross-entropy loss</kwd>
        <kwd>Stylometric analysis</kwd>
        <kwd>Feature fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rise of large language models (LLMs) has made machine-generated text nearly indistinguishable
from human writing, creating a pressing need for reliable detection methods. This challenge is central to
the PAN 2025 Voight-Kampff Generative AI Detection Task, Subtask 1 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which focuses on identifying
AI-generated content from a single text segment.
      </p>
      <p>
        In response, we propose a hybrid detection framework that combines BiScope’s bi-directional
cross-entropy loss [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with a rich set of stylometric features[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. BiScope captures asymmetries in token
predictability from both forward and backward language models, revealing distributional irregularities
often present in generated text. While effective, this approach alone may miss deeper stylistic cues that
characterize human authorship.
      </p>
      <p>To enhance detection accuracy, we integrate stylometric features—including lexical richness, syntactic
patterns, and punctuation usage—that reflect consistent writing habits. This combination of
low-level probabilistic signals and high-level stylistic markers provides a more holistic representation of
authorship.</p>
      <p>Our method is model-agnostic and domain-flexible. Experiments on the PAN 2025 dataset
demonstrate that this dual-modality approach outperforms single-feature baselines, highlighting the value of
combining linguistic signals for robust generative AI detection.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Our approach is motivated by the NIST 2024 Generative AI (GenAI) Text-to-Text (T2T) Discriminator
Task[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which evaluated systems for distinguishing human-written from AI-generated summaries.
      </p>
      <p>[Figure 1: System overview. Input text yields Bi-CE loss and stylometric features, which are fused and passed to a classifier for the final prediction.]</p>
      <sec id="sec-2-1">
        <title>2.1. Building on Prior Systems</title>
        <p>
          We build on insights from the top-performing teams in the challenge: the first-place system employed
BiScope’s bi-directional cross-entropy loss to uncover token-level distributional anomalies[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], while
the third-place system leveraged stylometric analysis to capture higher-level linguistic patterns such as
lexical diversity and syntactic style. By combining these complementary strategies, we aim to enhance
detection robustness and interpretability.
        </p>
        <p>By integrating BiScope’s probabilistic analysis with stylometric feature extraction, our method aims
to leverage the strengths of both approaches. This hybrid framework is designed to enhance detection
accuracy by capturing both low-level distributional irregularities and high-level stylistic nuances,
providing a more robust solution for identifying AI-generated text.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <p>
        We tested a variety of linguistic and stylometric features. The features are largely based on previous
work in AI-generated text detection [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Additionally, we used a large language model (LLM) Claude
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to suggest relevant features, which we then implemented. We broadly categorize these features
into five different categories:
• Character-level: proportions of special characters, punctuation
• Lexical: unique words, abstract nouns
• Syntactic: part-of-speech-based features, multi-clause sentences
• Structural: total words, total sentences, sentence and paragraph length
• Stylistic: repetition, discourse markers, readability
      </p>
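      <p>To make these feature families concrete, the following standard-library sketch (not the paper’s actual implementation, which computes 101 features with NLTK-based tooling) derives one illustrative feature from several of the categories above:</p>
      <preformat>
```python
import re
import string

def stylometric_features(text):
    # Toy extractor for a few of the feature families above;
    # the full system computes 101 features.
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    return {
        # Character-level: proportion of punctuation characters
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        # Lexical: unique-word (type-token) ratio
        "unique_word_ratio": len({w.lower() for w in words}) / n_words,
        # Structural: totals and mean sentence length in words
        "total_words": len(words),
        "total_sentences": len(sentences),
        "avg_sentence_len": n_words / max(len(sentences), 1),
    }

feats = stylometric_features("The cat sat. The cat ran!")
```
      </preformat>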
      <p>A total of 101 features were initially generated and subsequently refined through univariate
feature selection. We determined that selecting the top 25 most significant features produces optimal
performance. The final set of these 25 features is listed in the Appendix.</p>
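      <p>As a sketch of the selection step: the paper does not specify its scoring function, so the example below uses a simple class-separation statistic (difference of class means over pooled standard deviation) as a stand-in for univariate scoring:</p>
      <preformat>
```python
from statistics import mean, stdev

def univariate_scores(X, y):
    # Score each feature by |mean(class 1) - mean(class 0)| / pooled std.
    # A t-style stand-in; the paper's exact scoring function is unspecified.
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        pooled = (stdev(col0) + stdev(col1)) / 2 or 1e-12
        scores.append(abs(mean(col1) - mean(col0)) / pooled)
    return scores

def select_top_k(X, y, k):
    # Indices of the k highest-scoring features (k = 25 in the paper).
    scores = univariate_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.1, 5.0], [0.2, 4.9], [0.9, 5.1], [1.0, 5.0]]
y = [0, 0, 1, 1]
top = select_top_k(X, y, 1)  # -> [0]
```
      </preformat>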
      <sec id="sec-3-1">
        <title>3.2. Bi-directional Cross-entropy Loss Features</title>
        <p>
          Bi-directional Cross-entropy (Bi-CE) loss is a method used to improve the detection of AI-generated
text by measuring the consistency of token predictions in both forward and backward directions[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
Traditional cross-entropy loss evaluates the likelihood of the next token given the previous context
(left-to-right). Bi-CE extends this by also considering the reverse context (right-to-left), thus providing
a more robust estimation of token likelihood.
        </p>
        <p>Formally, the Bi-CE loss is computed as the sum of the forward and backward cross-entropy losses:</p>
        <p>ℒ<sub>Bi-CE</sub> = ℒ<sub>forward</sub> + ℒ<sub>backward</sub>, (1)</p>
        <p>where</p>
        <p>ℒ<sub>forward</sub> = −∑<sup>T</sup><sub>t=1</sub> log p(x<sub>t</sub> | x<sub>&lt;t</sub>), ℒ<sub>backward</sub> = −∑<sup>T</sup><sub>t=1</sub> log p(x<sub>t</sub> | x<sub>&gt;t</sub>).</p>
        <p>By capturing information from both directions, Bi-CE loss features provide a stronger signal for
distinguishing human-written text from AI-generated content, as the latter tends to exhibit patterns
that are less coherent when evaluated bidirectionally.</p>
        <p>In our method, these features are extracted from a pre-trained language model and fed into
downstream classifiers to enhance detection performance.</p>
        <p>We transform a single text sample into a numerical feature vector by:
• Summarizing the text to create a prompt.
• Feeding the prompt and text into a model (Llama2-7b).
• Computing token-level forward and backward losses.
• Extracting statistical features over segments of the token losses (mean, max, min, and standard
deviation of both FCE and BCE losses).</p>
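        <p>The steps above can be sketched with toy per-token probabilities standing in for the forward and backward passes of Llama2-7b (the probability values and single-segment granularity here are illustrative only):</p>
        <preformat>
```python
import math
from statistics import mean, pstdev

def ce_losses(token_probs):
    # Per-token cross-entropy: -log p(token | context).
    return [-math.log(p) for p in token_probs]

def loss_stats(losses):
    # Summary statistics extracted as features over a segment of losses.
    return {"mean": mean(losses), "max": max(losses),
            "min": min(losses), "std": pstdev(losses)}

# Toy probabilities a forward / backward LM might assign to each token.
fwd_probs = [0.9, 0.6, 0.2, 0.8]
bwd_probs = [0.7, 0.5, 0.4, 0.9]

fce, bce = ce_losses(fwd_probs), ce_losses(bwd_probs)
features = {**{f"fce_{k}": v for k, v in loss_stats(fce).items()},
            **{f"bce_{k}": v for k, v in loss_stats(bce).items()}}
bi_ce = sum(fce) + sum(bce)  # L_Bi-CE = L_forward + L_backward
```
        </preformat>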
        <p>We created 72 different statistical features from both the FCE and BCE losses. As with the stylometric
features, we then filtered these based on univariate feature selection; retaining the 25 most important
features yields the best results.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Classifier</title>
        <p>The proposed classifier is an ensemble model that combines five different machine learning algorithms.
This architecture integrates probabilistic, boosting, and tree-based techniques using a soft voting scheme
with tuned weights. The main components of the ensemble include:
• Gaussian Naive Bayes: A probabilistic classifier based on the assumption of Gaussian-distributed
features, serving as a baseline model.
• AdaBoost Classifier: An adaptive boosting algorithm implemented with a fixed random seed
for reproducibility.
• LightGBM Classifier: A gradient boosting model optimized for efficient parallel computation.
• CatBoost Classifier: A gradient boosting algorithm optimized for production environments.
• Random Forest Classifier: A bagging ensemble of 256 decision trees that provides diverse and
robust predictions.</p>
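        <p>A minimal sketch of the weighted soft-voting step (the member probabilities and weights below are hypothetical; the paper tunes the weights on its data):</p>
        <preformat>
```python
def soft_vote(prob_lists, weights):
    # Weighted soft voting: average each member's class-probability
    # vector with its weight, then pick the argmax class.
    n_classes = len(prob_lists[0])
    total_w = sum(weights)
    avg = [sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total_w
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Hypothetical (human, AI) probabilities from the five base models.
member_probs = [
    [0.40, 0.60],  # Gaussian Naive Bayes
    [0.30, 0.70],  # AdaBoost
    [0.55, 0.45],  # LightGBM
    [0.20, 0.80],  # CatBoost
    [0.45, 0.55],  # Random Forest
]
weights = [1.0, 1.5, 1.5, 2.0, 1.0]  # illustrative, not the tuned values
label, avg = soft_vote(member_probs, weights)  # label -> 1 (AI)
```
        </preformat>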
        <p>
          The classifier is trained on the 50 retained Bi-CE loss and stylometric features extracted from the text
dataset provided by the PAN competition for training [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we proposed a hybrid method for detecting AI-generated text that leverages both
bidirectional cross-entropy (Bi-CE) loss and a comprehensive set of stylometric features. By combining
statistical patterns captured from pre-trained language models with linguistic cues traditionally used
in authorship analysis, our system offers a robust approach to distinguishing human-written from
machine-generated content. Through univariate feature selection, we refined 173 initial features down
to the most informative 50, balancing model complexity and performance. The final ensemble classifier,
composed of five complementary algorithms, demonstrated strong predictive capability on the PAN
2025 testing dataset. Our findings underscore the effectiveness of combining intrinsic language model
signals with surface-level stylistic features for advanced text forensics. Future work will explore model
generalization across domains and further integration of semantic features.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work was conducted as part of our research at Pindrop Security. We thank our colleagues across the
Pindrop team for their support and contributions to the experiments and development efforts described
in this paper.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, we used GPT-4 to conduct grammar and spelling checks. In
addition, we used GPT-4 to generate the format of Figure 1. After using these tools, we reviewed and
edited the content as needed and assume full responsibility for the content of the publication.</p>
      <p>[7, cont.] in: J. C. de Albornoz, J. Gonzalo, L. Plaza, A. G. S. de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina,
G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction.
Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025),
Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2025.</p>
      <p>[8] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: Advances in Information
Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer
Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.</p>
      <p>[9] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python: Analyzing Text with the
Natural Language Toolkit, O’Reilly Media, 2009. URL: https://www.nltk.org/.</p>
      <p>[10] Princeton University, About WordNet, 2010. URL: https://wordnet.princeton.edu/.</p>
      <p>[11] L. Shen, LexicalRichness: A small module to compute textual lexical richness, 2022. URL: https://github.com/LSYS/lexicalrichness.</p>
      <p>[12] A. Hahn, textstat: Text statistics for Python, 2018. URL: https://github.com/shivam5992/textstat.</p>
      <p>Additional Notes (feature computation details):
• Punctuation defined using Python’s string.punctuation
• Special characters defined by regex
• Percentage of words that occur only once in the text
• Number of verbs not in the most common 5000 words per WordNet [10]
• Stop words defined using NLTK’s stopwords
• Variance in term-frequency / document-frequency by sentence
• Number of unique words / number of total words
• Calculated with NLTK ngram
• Calculated with NLTK ngram
• Word count based on regular expression match
• Unique word count provided by the LexicalRichness package [11]
• Unique Word Count (regex) / Word Count
• Word count calculated by splitting text by spaces
• Word count provided by the LexicalRichness package [11]
• Flesch Reading Ease scores calculated using the textstat package [12]
• Gunning Fog Index scores calculated using the textstat package [12]
• Count of tags starting with ’RB’
• Count of words tagged with specific ’RB’ part of speech
• Count of sentences that contain more than one verb phrase
• Occurrences of most common pattern / number of sentences
• Calculated using NLTK parse
• Total count of dependency relations
• Sentences split with NLTK sent_tokenize
• Std / mean of sentence lengths
• Words matching NLTK pronoun tag
• Cosine similarity between BERT sentence embeddings</p>
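      <p>Two of the notes above, "Std / mean of sentence lengths" and "Number of unique words / number of total words", can be sketched as follows (simple regex splitting stands in for NLTK’s sent_tokenize and the paper’s regex word counts):</p>
      <preformat>
```python
import re
from statistics import mean, pstdev

def note_features(text):
    # Sentence-length variation (std / mean) and unique-word ratio;
    # regex tokenization stands in for NLTK's sent_tokenize.
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sents]
    words = re.findall(r"\w+", text.lower())
    return {
        "sent_len_cv": pstdev(lengths) / mean(lengths),
        "unique_ratio": len(set(words)) / len(words),
    }

f = note_features("Short one. This sentence is rather longer than the first.")
```
      </preformat>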
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN and ELOQUENT 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          , S. Cheng,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , K. Zhang, G. Tao,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>BiScope: AI-generated text detection by checking memorization of preceding tokens</article-title>
          ,
          <source>in: Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS)</source>
          , Vancouver, Canada,
          <year>2024</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2024/hash/bc808cf2d2444b0abcceca366b771389-Abstract-Conference.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumarage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Trapeznikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruston</surname>
          </string-name>
          , H. Liu,
          <article-title>Stylometric detection of AI-generated text in Twitter timelines</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2303.03697. arXiv:2303.03697
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Opara</surname>
          </string-name>
          , Styloai:
          <article-title>Distinguishing ai-generated content with stylometric analysis</article-title>
          ,
          <source>in: Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials</source>
          , Industry and
          <string-name>
            <given-names>Innovation</given-names>
            <surname>Tracks</surname>
          </string-name>
          , Practitioners,
          <source>Doctoral Consortium and Blue Sky</source>
          , Springer Nature Switzerland,
          <year>2024</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>114</lpage>
          . URL: https://arxiv.org/abs/2405.10129.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Diduch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seo</surname>
          </string-name>
          , I. Soboroff,
          <string-name>
            <given-names>H.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <article-title>2024 NIST Generative AI (GenAI): Evaluation Plan for Text-to-Text (T2T) Discriminators</article-title>
          ,
          <source>Technical Report, National Institute of Standards and Technology</source>
          ,
          <year>2024</year>
          . URL: https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=957332.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Anthropic</surname>
          </string-name>
          ,
          <source>Claude llm (version 1.0)</source>
          ,
          <source>Large language model</source>
          ,
          <year>2023</year>
          . URL: https://www.anthropic.com, accessed: Dec.
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampf Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>