<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Style Change Detection Using Graph and Structural Linguistic Features for Multi-Author Writing Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ioana-Roxana Boriceanu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andra-Elena Băltoiu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Science and Technology POLITEHNICA Bucharest</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents our approach to the Multi-Author Writing Style Analysis task at PAN 2025. The goal is to detect sentence level style changes that may indicate a shift in authorship. We propose a handcrafted, feature based pipeline that integrates graph based properties from Word Adjacency Networks (WANs), lexical and syntactic measures, sentence level context features, and similarity metrics computed over embeddings produced by Sentence-BERT (SBERT). The system is tuned to handle all three levels of difficulty by adapting feature processing and model calibration. Predictions are made using a Gradient Boosting classifier. Results on the validation and test sets show that our interpretable and lightweight method performs competitively across all difficulty levels.</p>
      </abstract>
      <kwd-group>
        <kwd>Style Change Detection</kwd>
        <kwd>Authorship Attribution</kwd>
        <kwd>Word Adjacency Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Stylometry is the computational analysis of writing style and has been widely applied in authorship
attribution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], plagiarism detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and other related tasks. It relies on the idea that authors leave
behind consistent stylistic traces, even when writing about similar topics. These traces can include
lexical choices, syntactic preferences, punctuation patterns, and sentence structure, among others.
      </p>
      <p>While traditional stylometric studies focus on attributing entire documents to a known set of authors,
recent efforts have explored more fine-grained tasks, such as detecting shifts in writing style within a
single document. These style shifts may occur in collaborative writing, edited content, or deceptive texts
authored by multiple individuals. Unlike full document attribution, this sentence level task requires high
sensitivity to local changes in style and the ability to distinguish them from content or topic variation.</p>
      <p>
        The PAN 2025 lab [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] addresses the problem of style change detection through the Multi-Author
Writing Style Analysis task. The goal is to detect changes in writing style at the sentence level within
a document written by multiple authors. This task is intrinsic, meaning that no reference texts or
author profiles are provided. The system must identify boundaries where the style shifts, using only
information from within the document itself. Each sentence is assumed to be written by a single author,
and style changes are assumed to occur only between sentences.
      </p>
      <p>The dataset includes three difficulty levels: easy, medium, and hard. These levels differ in how much
topical variation is present within each document. In the hard setting, all sentences are on the same
topic, so topical cues are not useful. In the easy and medium settings, there is some variation in topic,
but relying too heavily on topic shifts can be misleading.</p>
      <p>Our system does not try to separate style from content directly. Instead, it uses a diverse set of features
that are designed to capture stylistic information. We adjust the feature configuration depending on the
difficulty level. For each level, we train a separate Gradient Boosting classifier, apply appropriate feature
scaling, and adapt the graph based analysis accordingly. The system is lightweight, interpretable, and
performs well across all difficulty settings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Previous research on style change detection has largely been shaped by the PAN shared tasks, which
have progressively introduced more fine-grained and realistic scenarios for intrinsic authorship analysis
[
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. These tasks have encouraged systems to go beyond surface level features and explore deeper
representations of writing style. Recent advancements in style change detection have leveraged a
diverse array of methodologies, ranging from deep contextual embeddings to graph based and
structural linguistic features. In the PAN 2024 shared task, numerous teams adopted transformer based models
such as BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and RoBERTa [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to identify stylistic shifts at the paragraph level.
      </p>
      <p>Many top-performing systems in PAN 2024 relied heavily on deep contextual embeddings from
pretrained language models. For instance, one team combined RoBERTa, DeBERTa, and ERNIE models
within a majority voting framework, achieving strong results across all difficulty levels [9]. Another
system, which ranked first overall, utilized embeddings from the LLaMA-3-8B decoder, fine-tuned using
low-rank adaptation (LoRA) to perform label classification, demonstrating the effectiveness of large
language models (LLMs) in capturing subtle stylistic variation [10].</p>
      <p>In addition to these deep models, another approach integrated transformer based embeddings with
handcrafted stylometric features to enhance interpretability and robustness [11]. The team combined
RoBERTa representations with features that reflect text formality, grammatical structure, and readability,
including metrics such as the Flesch-Kincaid grade level and the SMOG index. The authors also used
the Mann–Whitney U test to assess whether differences in feature distributions were statistically
significant across authorial boundaries, reaffirming the continued relevance of classical stylistic features
in detecting writing style changes.</p>
      <p>
        Graph based representations of text have been explored as an alternative to purely sequential or
embedding based approaches in authorship analysis. One such method, introduced in [12], transforms
texts into syntactically informed graphs where words are represented as nodes labeled with their
part-of-speech (POS) tags. Edges are created based on sentence structure and syntactic grouping, and
the resulting graph is characterized using centrality measures such as degree, closeness, betweenness,
and eigenvector centrality. These features are then used to train classification models for author
identification. A different strategy proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] applies graph modeling to multi-author documents
by constructing Co-Authorship Graphs (CAGs), where text segments are connected based on stylistic
similarity computed via modified Hausdorff distance. These models have shown promising results on
both synthetic and real world datasets, offering a structurally motivated perspective on style change
detection.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The PAN 2025 Style Change Detection dataset comprises English language documents constructed from
Reddit comments [13]. Each document is a sequence of sentences authored by multiple individuals,
with the objective being to identify the positions at which the author changes. Specifically, for each
pair of consecutive sentences, the task is to determine whether a change in authorship has occurred.
The dataset is divided into three difficulty levels: easy, medium, and hard. In the easy set, sentences
cover a variety of topics, allowing models to utilize topic information as a cue for detecting authorship
changes. The medium set contains documents with limited topical variety, compelling models to focus
more on stylistic features. The hard set consists of documents where all sentences pertain to the same
topic, necessitating reliance solely on stylistic cues for detecting author changes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Processing</title>
        <p>Each document in the dataset is first segmented into individual sentences. These sentences undergo a
series of linguistic preprocessing steps including tokenization, part-of-speech (POS) tagging,
lemmatization, and removal of stopwords and punctuation. The specific combination of preprocessing steps
depends on the difficulty level and is controlled through a configuration schema.</p>
        <p>To model the local lexical and syntactic structure of sentences, we construct Word Adjacency Networks
(WANs). After preprocessing, each word in a sentence is represented as a node in a directed graph.
A directed edge is added between two nodes if the corresponding words appear consecutively in the
sentence. If an edge between two nodes already exists, its weight is incremented to reflect the frequency
of that word pair. Redundant edges, such as self-loops, are removed to maintain structural clarity. In
certain configurations, additional nodes and edges are introduced to represent POS tag transitions,
allowing the network to capture grammatical relationships beyond surface word order. These enriched
networks offer a stylometric representation of sentence structure and are used to extract graph features
such as centrality scores, clustering coefficients, and entropy measures.</p>
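To make the construction above concrete, the following minimal Python sketch (our own illustration, not the authors' released code; `build_wan` is a hypothetical helper) builds the basic directed, weighted graph from an already preprocessed token list, omitting the optional POS-transition nodes:

```python
from collections import defaultdict

def build_wan(tokens):
    """Build a directed, weighted word adjacency network from a token list.

    Nodes are the tokens themselves; an edge (u, v) means that v directly
    followed u in the sentence, and the weight counts how often that word
    pair occurred. Self-loops are dropped for structural clarity.
    """
    edges = defaultdict(int)  # (u, v) -> weight
    for u, v in zip(tokens, tokens[1:]):
        if u != v:  # skip self-loops
            edges[(u, v)] += 1
    return dict(edges)

# "To be, or not to be": the pair ("to", "be") appears twice, so its weight is 2.
wan = build_wan(["to", "be", "or", "not", "to", "be"])
```

Graph features such as degree centrality or entropy would then be computed over this edge-weight map (or an equivalent graph-library representation).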
        <p>A simplified illustration of a WAN is presented in Figure 1, based on the famous phrase "To be, or not
to be, that is the question" from Hamlet by William Shakespeare (Act III, Scene 1). Each node represents
a token (e.g., word or punctuation mark) in the sentence and may be labeled with its corresponding
part-of-speech (POS) tag, depending on the preprocessing configuration. The POS tag, shown below
the token, follows the Penn Treebank format and was generated automatically using the spaCy NLP
toolkit [14]. To maintain visual clarity, the figure omits separate POS transition nodes and edges. It
is intended as a minimal illustration of lexical adjacency rather than a full depiction of the enriched
network structure. The first word is highlighted in green and the final word in red.</p>
        <p>To study the impact of preprocessing choices on the strength of stylistic signals, we define sixteen
WAN configurations, representing all possible combinations (2<sup>4</sup> = 16) of four optional preprocessing
steps: punctuation removal, stopword removal, lemmatization, and POS transition modeling. Each
configuration can be represented using a four-bit binary mask, where each bit indicates whether a
preprocessing step is applied (1) or omitted (0), in the order: punctuation, stopwords, lemmatization, POS
tags. For example, the mask 1111 indicates that all steps are applied, while 0000 means no preprocessing
is performed. For each difficulty level in the task, we selected a configuration that empirically achieves
a balance between preserving stylistic signals and suppressing topical or content related noise.</p>
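The mask scheme described above can be sketched in a few lines of Python (an illustrative assumption on our part; `STEPS` and `enabled_steps` are hypothetical names, not identifiers from the paper's code):

```python
from itertools import product

# Bit order used in the paper: punctuation, stopwords, lemmatization, POS tags.
STEPS = ("punctuation", "stopwords", "lemmatization", "pos_tags")

# Enumerate all 2**4 = 16 configurations as 4-bit masks ("0000" .. "1111").
CONFIGS = ["".join(map(str, bits)) for bits in product((0, 1), repeat=4)]

def enabled_steps(mask):
    """Return the preprocessing steps switched on by a mask such as '1110'."""
    return [step for bit, step in zip(mask, STEPS) if bit == "1"]
```

Under this encoding, mask 1111 enables every step and 0000 leaves the text untouched, matching the examples in the text.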
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Feature Engineering</title>
        <p>To detect stylistic shifts at the sentence level, we extract a diverse set of features that reflect lexical,
syntactic, semantic, and discourse level patterns. These features fall into six broad categories and were
primarily selected based on their demonstrated utility in previous work on authorship attribution and
stylistic analysis, as discussed in the Related Work section:
• Embedding based similarity metrics: computed over SBERT embeddings, including cosine
similarity, Euclidean distance, Manhattan distance, and norm ratio between sentence
representations. We also include semantic drift across skip distances and directional embedding angles to
capture local variation in meaning.
• WAN features: graph metrics such as average degree, clustering coefficient, density, assortativity,
and degree entropy; pairwise similarity across centrality measures (degree, closeness, eigenvector);
POS transition entropy and drift across sliding windows.
• Lexical and syntactic features: type-token ratio, Yule’s K, average word length,
character-to-token ratios, punctuation burstiness, POS ratios (pronouns, conjunctions, prepositions),
dependency depth, passive constructions, and sentence length variance.
• Contextual features: named entity and lemma overlap between adjacent sentences, discourse
marker detection, subject continuity, and average similarity of embeddings between neighboring
sentences.
• Deep style indicators: formality score, clause complexity, modality ratio, discourse marker
ratio, rhetorical questions, sarcasm markers, person based pronoun usage, sentiment polarity,
subjectivity, and LIWC-inspired affective, authentic, certainty, and tentativeness features.
• Readability and rhythm: Flesch Reading Ease, Gunning Fog Index, Automated Readability Index, syllable count, and Dale-Chall readability score.</p>
        <p>In selecting these features, we aimed to capture diverse aspects of writing style using only the
information contained within each document. Since the task is intrinsic and sentence based, external
resources such as author profiles or reference texts are not available. All features are therefore designed
to detect stylistic change through local linguistic and structural signals.</p>
        <p>To guide our design choices, we also conducted manual inspection of the dataset. This close reading
helped us better understand the structure and surface properties of the texts across different difficulty
levels. In the medium subset, for example, we frequently encountered recurring messages such as:
• “In general, be courteous to others. Debate/discuss/argue the merits of ideas, don’t attack people.
Personal insults, shill or troll accusations, hate speech, any suggestion or support of harm, violence,
or death, and other rule violations can result in a permanent ban. For those who have questions
regarding any media outlets being posted on this subreddit, please click to review our details as to
our approved domains list and outlet criteria.”
• “r/politics is currently accepting new moderator applications. If you want to help make this community
a better place, consider !”
• “I am a bot, and this action was performed automatically. Please if you have any questions or
concerns.”</p>
        <p>These recurring fragments often appeared at stylistic boundaries and provided useful cues for feature
selection. They also reinforced the need for features that could capture repetition, discourse markers,
and formal structure, particularly in the medium setting where topic variation is limited but not fully
absent.</p>
        <p>To address these aspects, we constructed feature sets that describe lexical richness, punctuation usage,
and syntactic preferences. Word adjacency networks (WANs) were used to extract structural features
that capture not only the arrangement of words within sentences, but also patterns of grammatical
dependencies, local syntactic structure, and characteristic usage preferences that vary between authors,
as demonstrated in prior work on WAN-based authorship attribution [12]. Contextual features evaluate
coherence between neighboring sentences, including named entity overlap, subject continuity, and
transitions in discourse markers, drawing on strategies effective in stylometric analysis [15]. Deeper
stylistic indicators capture tone, formality, and rhetorical patterns, building on prior work that leverages
sentiment analysis and LIWC derived features for author profiling and stylistic characterization [16, 17].
Readability and rhythm metrics provide insight into variation in fluency and pacing across authors. To
complement these handcrafted features, we also integrated semantic similarity scores derived from
SBERT [18] embeddings. Although these embeddings encode both content and style, we found them to
be highly effective across all difficulty levels. The ability of SBERT to capture fine-grained variation at
the sentence level contributed significantly to the overall performance of the system.</p>
        <p>By combining quantitative modeling with manual insight, we constructed a feature set that is both
comprehensive and tailored to the stylistic patterns observed in the data. This helped the system
generalize more effectively across all difficulty levels.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Classification</title>
        <p>For each difficulty level, we train a separate Gradient Boosting classifier [19] using the handcrafted
feature set described above. A distinct model is trained for the easy, medium, and hard subsets, allowing
us to tailor preprocessing and calibration to the specific characteristics of each case.</p>
        <p>Instead of predicting class labels directly, we use the probability scores returned by the classifier. This
allows us to apply a custom threshold when converting probabilities into binary predictions. A fixed
threshold of 0.35 was selected based on validation performance. This value provided a better balance
between false positives and false negatives compared to the default threshold of 0.5, particularly in the
hard and medium settings where stylistic changes tend to be subtle and less frequent.</p>
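The thresholding step is trivial but worth making explicit; the sketch below is our own illustration (`to_labels` is a hypothetical helper, and the probabilities are invented), assuming the classifier exposes per-pair probability scores:

```python
def to_labels(change_probabilities, threshold=0.35):
    """Turn per-pair style-change probabilities into binary predictions.

    A pair of consecutive sentences is flagged as an authorship change (1)
    when its probability reaches the tuned threshold, otherwise 0.
    """
    return [1 if p >= threshold else 0 for p in change_probabilities]

# With the default 0.5 cut-off, the borderline middle pair would be missed.
labels = to_labels([0.90, 0.40, 0.10])
```

Lowering the threshold trades some false positives for the recall needed on subtle, infrequent changes.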
        <p>We also adjust preprocessing based on difficulty level. In the easy and medium subsets, we apply
min-max scaling to normalize the feature values and ensure consistency across features with different
numeric ranges. In the hard setting, we do not apply scaling. Preserving the original feature distributions
helped retain stylistic variation that may be weakened through normalization. This proved effective in
settings where topic cues are absent and subtle style differences are the only available signal.</p>
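For reference, per-column min-max scaling amounts to the following (a generic stdlib sketch of the standard technique, not the pipeline's exact code):

```python
def min_max_scale(column):
    """Rescale one feature column to [0, 1]; constant columns map to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]
```

Skipping this step in the hard setting leaves each feature on its native scale, so small absolute differences between authors are not compressed.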
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We conducted extensive experiments to compare multiple classification models, including Support
Vector Machines, Random Forest, Naive Bayes, K-Nearest Neighbors, and Gradient Boosting. Among
these, Gradient Boosting consistently achieved the highest F1 scores on the validation set across
all difficulty levels. It also demonstrated more stable performance in handling subtle stylistic shifts,
especially in the hard setting. Based on these observations, we selected Gradient Boosting as the final
classifier for our pipeline.</p>
      <p>Table 1 presents the F1 scores obtained by the Gradient Boosting classifier on the validation set across
all sixteen WAN configurations. Each row corresponds to a specific WAN configuration, expressed as
a 4-bit binary mask in the order: punctuation, stopwords, lemmatization, and POS tags. A bit value
of 1 indicates that the corresponding preprocessing step is applied. This means that punctuation is
removed, stopwords are removed, words are reduced to their base forms through lemmatization, and
part-of-speech transitions are included in the Word Adjacency Network. For example, the configuration
1110 applies all steps except POS tag modeling. This representation highlights how performance varies
across diferent preprocessing combinations. The columns represent the three sub-tasks from the PAN
2025 Style Change Detection challenge: Task 1 (easy), Task 2 (medium), and Task 3 (hard), which differ
in the amount of topical variation present in the documents.</p>
      <p>The best performing configuration for each task is highlighted in bold. In the easy setting (Task 1),
multiple WAN configurations achieve nearly identical F1 scores, all within a narrow range around 0.963.
This suggests that in the presence of strong topical variation, the specific choice of preprocessing has a
limited effect on overall performance. In contrast, performance in the medium and hard settings is more
sensitive to preprocessing. Configuration 0100 (only stopwords removed, no lemmatization, no POS tag
modeling) yields the best result in Task 2, while configuration 1110 (all steps except POS modeling)
achieves the highest F1 score in Task 3.</p>
      <p>A clear pattern in the results is the benefit of stopword removal, which is included in almost all of
the best performing configurations. Lemmatization has mixed effects and is often missing from the top
setups. Punctuation removal helps in the medium task but seems less important in the hard setting.
POS tag modeling is often left out of the best configurations, especially for the hard task. This may be
because adding grammatical information makes the networks less sensitive to the specific ways authors
use language. Overall, using simpler and more selective preprocessing seems to work better when topic
cues are missing and the system needs to rely more on subtle differences in writing style.</p>
      <p>Based on these validation scores, the best performing models using configuration 0110 for the easy
task, 0100 for the medium task, and 1110 for the hard task were selected and submitted to the TIRA
platform [20] for final evaluation in the PAN 2025 shared task.</p>
      <p>Table 2 presents the final F1 scores of our system on the official PAN 2025 test set. Our submission
was made under the TIRA team name stylospies. Our approach consistently outperforms the baseline
across all three tasks.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This work explored a primarily classical approach to the challenging task of sentence level style
change detection. Rather than relying fully on large pretrained language models, we focused on
handcrafted features grounded in linguistic structure, lexical patterns, and graph based representations.
We constructed Word Adjacency Networks under sixteen preprocessing configurations and extracted
a broad range of linguistic, structural, contextual, and embedding based features. Separate Gradient
Boosting classifiers were trained for each difficulty level, using tailored preprocessing and thresholding
to account for topical variability across the dataset. The results confirm that even in a high-variability and
fine-grained task such as this one, traditional machine learning methods, when carefully engineered and
calibrated, remain competitive. This supports the continued relevance of lightweight and transparent
models, especially in scenarios where resource constraints or interpretability are important.</p>
      <p>As natural language processing advances, it is increasingly important to consider not only accuracy
but also the environmental and practical costs of AI systems. By emphasizing simplicity, transparency,
and efficiency, our work contributes to the broader effort toward sustainable AI. Future directions could
explore hybrid methods that combine the clarity of handcrafted features with the adaptability of neural
representations to further improve performance in stylistically complex tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT to check grammar and spelling. The
authors reviewed and edited the content afterward and take full responsibility for the final publication.</p>
      <p>[9] T. Lin, Y. Wu, L. Lee, Team nycu-nlp at PAN 2024: Integrating transformers with similarity adjustments for multi-author writing style analysis, Working Notes of CLEF (2024).
[10] J. Lv, Y. Yi, H. Qi, Team fosu-stu at PAN: Supervised fine-tuning of large language models for multi-author writing style analysis, Working Notes of CLEF (2024).
[11] E. Księżniak, K. Węcel, M. Sawiński, Team openfact at PAN 2024: Fine-tuning BERT models with stylometric enhancements, in: CEUR Workshop Proceedings, volume 3740, 2024.
[12] E. Castillo, O. Cervantes, D. Vilarino, Authorship verification using a graph knowledge discovery approach, Journal of Intelligent &amp; Fuzzy Systems 36 (2019) 6075–6087.
[13] E. Zangerle, M. Mayerl, M. Potthast, B. Stein, PAN25 multi-author writing style analysis, 2025. URL: https://doi.org/10.5281/zenodo.15053260. doi:10.5281/zenodo.15053260.
[14] M. Honnibal, spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, 2017.
[15] V. W. Feng, G. Hirst, Patterns of local discourse coherence as a feature for authorship attribution, Literary and Linguistic Computing 29 (2014) 191–198.
[16] J. Gaston, M. Narayanan, G. Dozier, D. L. Cothran, C. Arms-Chavez, M. Rossi, M. C. King, J. Xu, Authorship attribution via evolutionary hybridization of sentiment analysis, LIWC, and topic modeling features, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2018, pp. 933–940.
[17] G. A. Katsios, N. Sa, T. Strzalkowski, Figuratively speaking: Authorship attribution via multi-task figurative language modeling, arXiv preprint arXiv:2406.08218 (2024).
[18] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).
[19] J. H. Friedman, Greedy function approximation: A gradient boosting machine, Annals of Statistics (2001) 1189–1232.
[20] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Urailertprasert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vannaboot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rakthanmanon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chuangsuwanich</surname>
          </string-name>
          , S. Nutanong,
          <article-title>Stylometric authorship attribution of multi-author documents using a co-authorship graph</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>18374</fpage>
          -
          <lpage>18393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Saini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Sri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <article-title>Intrinsic plagiarism detection system using stylometric features and dbscan</article-title>
          , in: 2021
          <source>International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampff Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>C. de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Multi-Author Writing Style Analysis Task at PAN 2025</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          , et al.,
          <article-title>Overview of PAN 2024: Multi-author writing style analysis, multilingual text detoxification, oppositional thinking analysis, and generative AI authorship verification: Condensed lab overview</article-title>
          , in:
          <source>International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Borrego-Obrador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Heini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kredens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pęzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          , et al.,
          <article-title>Overview of PAN 2023: Authorship verification, multi-author writing style analysis, profiling cryptocurrency influencers, and trigger detection: Condensed lab overview</article-title>
          , in:
          <source>International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>459</fpage>
          -
          <lpage>481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>