<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AI - Enhancing the Realism of Machine-Generated Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soumodeep Saha</string-name>
          <email>soumodeepsahaa@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronit Das</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dipankar Das</string-name>
          <email>dipankar.dipnil2005@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI-Generated Text, Human-Authored Text</institution>
          ,
          <addr-line>Text Generation, GPT-2, Text Classification, Authorship Verification</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept of Computer Science and Engineering, Jadavpur University</institution>
          ,
          <addr-line>Kolkata, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Natural Language Processing, Adversarial Text Manipulation</institution>
          ,
          <addr-line>Machine Learning, Content Moderation</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The increasing sophistication of generative language models has blurred the lines between machine-generated and human-authored text, raising concerns about authenticity, misinformation, and consumer awareness. In response to these challenges, the Voight-Kampff task at the ELOQUENT Lab 2025 investigates whether machine-generated texts can be reliably distinguished from those written by humans. Leveraging prompts derived from various genres, including encyclopedia entries, news articles, and biographies, participants are asked to generate 500-word texts using language models. These outputs are then compared with genuine human-written samples to evaluate distinguishability. The task is conducted by the ELOQUENT Lab @ CLEF 2025; we secured 4th rank in this shared task. A unique aspect of the task is its focus on genre and stylistic consistency: it not only assesses the ability to detect machine-authored content but also evaluates whether system-specific writing traits remain consistent across topics and genres. We used the pretrained GPT-2 model for text generation and applied additional post-processing techniques to make the generated text more human-like.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://github.com/SoumodeepSaha (S. Saha); https://cse.jadavpuruniversity.in/faculty/dipankar-das (D. Das)</p>
      <p>Our goal is to generate text so human-like in tone, coherence, and expression that it can bypass
even sophisticated AI detectors. Our approach involves not only the use of powerful generative models
like GPT-2 but also the incorporation of post-processing techniques that inject spontaneity, emotional
nuance, and informal speech patterns into the generated outputs.</p>
      <p>By focusing on genre- and style-aware generation, we aim to simulate realistic human communication
and evaluate how convincingly these texts mimic genuine writing. This investigation not only sheds light
on the capabilities of generative models but also raises critical questions about authorship verification,
AI regulation, and the future of trustworthy communication in the age of synthetic media.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The increasing sophistication of large language models (LLMs) like ChatGPT has stimulated a growing
body of research that aims to understand, detect, and evaluate AI-generated content. Several studies
have examined the linguistic and stylistic distinctions between human-authored and machine-generated
texts. Sardinha (2024) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] conducted a multidimensional comparison of AI-generated versus
human-authored texts, identifying distinct patterns in coherence, cohesion, and lexical richness. Similarly,
Hakam et al. (2024) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] analyzed academic literature in orthopedics to uncover qualitative differences
between human and AI-written manuscripts. Matsubara and Matsubara (2025) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] focused on the
human touch in scientific writing, highlighting subtle stylistic traits often missing from LLM-generated
outputs. Alvero et al. (2024) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] explored the societal implications of synthetic authorship through the
lens of social demography and hegemonic patterns in publishing.
      </p>
      <p>
        To tackle the challenge of distinguishing between human and AI-generated content, numerous
detection techniques have been proposed. Kumar and Mindzak (2024) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] stressed the importance
of academic integrity and detection systems. Alhijawi et al. (2025) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Maktabdar Oghaz et al.
(2025) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presented deep learning and transformer-based approaches for identifying synthetic scientific
content. Other efforts, such as those by Prajapati et al. (2024) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], leveraged LLMs themselves for
detection, while Soto et al. (2024) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] introduced stylistic few-shot learning methods. Cheng et al.
(2025) [10] pushed the boundary by proposing fine-grained classification models that factor in role
recognition and author involvement levels.
      </p>
      <p>The interpretability and transparency of LLM-aided authorship have also received considerable
attention. Hwang et al. (2025) [11] studied human-AI co-writing scenarios, emphasizing user preferences
for authenticity. Hoque et al. (2024) [12] offered visual tools to trace AI contributions in co-authored
texts. Pividori and Greene (2024) [13] examined infrastructural and ethical considerations for AI-assisted
writing in academic publishing workflows.</p>
      <p>Within shared task environments, the PAN lab has long pioneered research in authorship verification
and stylometry. Their yearly evaluations have incrementally addressed tasks such as fake news profiling,
hate speech authorship, and multi-author style analysis [14, 15, 16, 17]. As LLM-generated content
became more prevalent, PAN introduced the “Voight-Kampff” task [18], a builder-breaker challenge
designed to test the limits of generative text detection. This task, named after the fictional replicant
detector in <italic>Blade Runner</italic>, evaluates whether human-authored text can be reliably distinguished
from machine-generated content. The methodological framing of the task reflects two paradigms of
detection—authorship attribution and authorship verification—as outlined by Bevendorff et al. [19].</p>
      <p>Building on its 2024 version, the PAN 2025 edition expands the Voight-Kampff task into two subtasks:
(1) binary classification of machine vs. human authorship [20]. Organized jointly with the ELOQUENT
Lab, Subtask 1 adopts a more challenging real-world setting in which only a single text is provided
for verification. The task continues to probe the indistinguishability and traceability of AI-generated
writing in increasingly obfuscated or human-like formats.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>
        At PAN 2024 [
        <xref ref-type="bibr" rid="ref10">21</xref>
        ], the “Voight-Kampff” Generative AI Authorship Verification task [
        <xref ref-type="bibr" rid="ref11">22</xref>
        ] was introduced
for the first time, attracting significant participation. As a starting point, various task variants were
formalized and organized in a hierarchy from easiest to hardest, as illustrated in Figure 1. For establishing
the baseline, the easiest variant was selected, where participants were provided with a pair of texts—one
authored by a human and the other generated by a machine.
      </p>
      <p>
        For PAN 2025 [
        <xref ref-type="bibr" rid="ref12">23</xref>
        ], the task progresses to a more challenging variant in which only a single text is
provided. This reflects a more realistic and open-ended scenario of authorship verification “in the wild,”
aligning with settings commonly explored in other LLM-generated content detection shared tasks.
      </p>
      <p>The task corresponds to a classic binary detection task: determining whether the given text is
human-authored or machine-generated. However, this year’s task raises the difficulty by introducing
deliberately “obfuscated” texts designed to evade detection. Despite attempts in PAN 2024—both
algorithmic and manual—the obfuscation strategies were largely ineffective. Therefore, in this edition,
particular emphasis is placed on testing whether human authors can intentionally alter their writing
style to resemble machine output, and whether modern detectors can still correctly classify such texts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>The generative model, GPT-2, was used to generate the AI-authored texts from the provided prompts,
while the human-authored texts were selected from a diverse range of genres and writing styles. The
test dataset consisted of 22 human-written texts, each ranging between 100 to 600 words in length.
In cases where original texts were longer, an appropriately representative section was selected by the
organisers. For each selected text, a summary was generated using OpenAI’s ChatGPT with the
following prompt: “Summarise the main points of the following text and give an overall description of
the genre and tone of the text.” These summaries were then shared with all the participants, including
us, as the basis for generating new short texts. A sample summary test item is shown in Table 1,
and the complete list of item titles is provided in Table 2.</p>
      <p>We were provided with a suggested prompt: “Write a text of about 500 words which covers the
following items.”</p>
      <p>However, we were free to formulate our own prompts as we deemed appropriate. We submitted our
generated texts through the official submission form provided by the organisers, after which the entries
were forwarded to the PAN lab for classification and further evaluation.</p>
      <sec id="sec-4-1">
        <title>Sample Test Item (dataset: https://huggingface.co/datasets/Eloquent/Voight-Kampff)</title>
        <p>Content: The letter is from someone claiming to be Prince Joe Eboh, Chairman of the Contract Award
Committee of the Niger Delta Development Commission (NDDC). \n The sender explains that a surplus of
$25 million USD from petroleum contracts needs to be discreetly transferred out of Nigeria. \n Due to local
laws prohibiting civil servants from holding foreign accounts, they seek a foreign partner to temporarily
receive the funds. \n The recipient is promised 20% of the amount for their cooperation, while 75% will go
to committee members and 5% for expenses. \n The sender requests personal and banking details from
the recipient to initiate the transfer. \n The letter emphasizes secrecy and urgency, aiming to complete the
transaction in 21 working days.,
Genre and Style: Genre: Advance-Fee Fraud / Scam Letter (commonly known as a 419 scam) \n Tone:
Formal and persuasive, but suspiciously flattering and manipulative. It mimics official language to appear
legitimate, yet it contains telltale signs of deception and illegitimacy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>[Table 2: the human-text sources include archive.org, gutenberg.org, lingvi.st, readbibleonline.net,
cvce.eu, fanfic.net, gnu.org, acm.org/cacm, nobelprize.org, and wikipedia.org.]</p>
      <p>The methodology for this task is centered around the use of a generative language model to create
AI-generated texts and then post-process those texts to make them sound more human-like. This
section outlines the steps followed to generate and process AI texts, from loading the dataset to saving
them, as shown in Figure 2.</p>
      <sec id="sec-5-1">
        <title>5.1. Dataset Preparation and Loading</title>
        <p>The first step in the methodology is loading the dataset from a JSON file. The dataset contains a
collection of human-authored texts, each associated with a specific genre and style. These texts are
then paired with AI-generated counterparts, which are created based on the same genres and styles.
The dataset is structured with prompts that provide the foundation for the generated text.</p>
        <p>The data is loaded using the Python json module, where each text prompt is extracted for use in
the text generation process. The dataset contains various texts, each with a unique id, a content
description, and its associated genre_and_style. This dataset forms the core of the task, as it allows for
the generation of AI texts in different styles, challenging the detection system to identify key differences
between human and machine writing.</p>
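        <p>The loading step described above can be sketched as follows. The exact layout of the provided
JSON file is not reproduced in this paper, so the list-of-objects structure below is an assumption; the
field names (id, content, genre_and_style) are the ones named in the text.</p>

```python
import json

def load_prompts(path):
    """Load the prompt dataset from a JSON file.

    Each item is assumed to be an object carrying the fields named in the
    text: a unique `id`, a `content` description, and `genre_and_style`.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    # Keep only the fields the generation step needs.
    return [
        {"id": it["id"],
         "content": it["content"],
         "genre_and_style": it["genre_and_style"]}
        for it in items
    ]
```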
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Text Generation Using GPT-2</title>
        <p>Once the dataset is loaded, the next step is to generate AI text using the GPT-2 language model. GPT-2
is a generative pre-trained transformer that can produce coherent and contextually appropriate text
based on a given prompt. In this task, GPT-2 is used to generate texts that are stylistically similar to
human-authored content but with the subtle characteristics of machine-generated text.</p>
        <p>The text generation process follows these steps:
• Input Prompt: Each text is generated from a given prompt (e.g., a scam letter or a scientific
exposition). The prompt serves as the foundation for the AI to generate contextually relevant
content.
• Model Parameters: The GPT-2 model is configured with specific parameters as shown in Table 3
to control the text’s randomness and quality.</p>
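        <p>The generation step can be sketched as below. The concrete values from Table 3 are not
reproduced in this text, so the sampling settings shown (temperature, top_k, top_p, and so on) are
common GPT-2 defaults used purely as an illustrative assumption; the prompt template follows the
suggested prompt from Section 4.</p>

```python
# Sketch of the GPT-2 generation step. NOTE: the sampling values below are
# illustrative assumptions, not the actual Table 3 configuration.
GEN_KWARGS = {
    "max_new_tokens": 700,      # room for a ~500-word text
    "do_sample": True,          # sample rather than decode greedily
    "temperature": 0.9,         # randomness of the token distribution
    "top_k": 50,                # keep only the 50 most likely tokens
    "top_p": 0.95,              # nucleus sampling threshold
    "no_repeat_ngram_size": 3,  # discourage verbatim repetition
}

def build_prompt(item):
    """Combine one item's summary and genre into the suggested prompt."""
    return ("Write a text of about 500 words which covers the following items.\n"
            + item["content"].strip()
            + "\nGenre and style: " + item["genre_and_style"].strip())

def generate_text(item):
    """Generate one AI-authored text (requires `transformers` and `torch`)."""
    from transformers import pipeline  # deferred import: heavy dependency
    generator = pipeline("text-generation", model="gpt2")
    return generator(build_prompt(item), **GEN_KWARGS)[0]["generated_text"]
```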
        <sec id="sec-5-2-1">
          <title>Model: https://huggingface.co/openai-community/gpt2</title>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Post-Processing to Improve Human-Likeness</title>
        <p>While GPT-2 generates text that is grammatically correct and contextually appropriate, additional
post-processing is required to make the text sound more like a human wrote it. This involves introducing
conversational elements and imperfections that are often found in human speech or writing. The
post-processing steps include:
• Fillers and Interruptions: Common conversational fillers such as “you know,” “like,” “um,”
and “well” are inserted randomly into the text. These add spontaneity to the generated text and
simulate human speech patterns.
• Self-Doubt and Inconsistent Flow: To mimic the natural hesitations and changes in thought
that often occur in human writing, phrases like “I’m not sure, but...” and “Could be wrong, but...”
are randomly added.
• Emotions and Exclamations: To enhance the emotional tone of the text, phrases like “Wow,
that’s crazy!” and “Can you believe it?” are introduced. These expressions help make the text feel
more relatable and engaging.
• Irregular Punctuation: Random punctuation marks, such as exclamation points and ellipses,
are added at the end of sentences to simulate natural human pauses and excitement.</p>
        <p>These techniques result in a more human-like text, making it harder to differentiate from
human-written content.</p>
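        <p>The four post-processing steps listed above can be sketched as a single function. The phrase
lists come directly from the text, while the insertion probability and the naive sentence-splitting
heuristic are illustrative assumptions.</p>

```python
import random

FILLERS = ["you know", "like", "um", "well"]
HEDGES = ["I'm not sure, but...", "Could be wrong, but..."]
EXCLAMATIONS = ["Wow, that's crazy!", "Can you believe it?"]

def humanize(text, seed=None, p=0.25):
    """Randomly inject fillers, hedges/exclamations, and irregular
    punctuation between sentences. The probability `p` and the simple
    '. '-based sentence split are assumptions, not the exact procedure."""
    rng = random.Random(seed)
    out = []
    for sentence in (s.strip() for s in text.split(". ") if s.strip()):
        if not sentence.endswith((".", "!", "?")):
            sentence += "."
        if rng.random() < p:  # conversational filler at the sentence start
            sentence = (rng.choice(FILLERS).capitalize() + ", "
                        + sentence[0].lower() + sentence[1:])
        if rng.random() < p:  # self-doubt or emotional interjection
            out.append(rng.choice(HEDGES + EXCLAMATIONS))
        if rng.random() < p:  # irregular punctuation: '!' or '...'
            sentence = sentence.rstrip(".") + rng.choice(["!", "..."])
        out.append(sentence)
    return " ".join(out)
```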
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Saving and Organizing Generated Texts</title>
        <p>Once the text is generated and post-processed, it is saved as a text file for further analysis. Each
generated text is saved using its unique prompt ID as the filename, ensuring that each text can be easily
traced back to its corresponding prompt in the dataset.</p>
        <p>• Directory Structure: The generated texts are organized in folders, with each folder named
after the team working on the task. This structure ensures that each team’s outputs are easily
identifiable and separated.
• File Naming: Text files are named using the prompt ID, ensuring consistency and easy reference.</p>
        <p>For example, a file generated for prompt ID ”030” will be saved as 030.txt.</p>
        <p>Finally, the main execution function brings all the steps together, from loading the dataset, generating
and post-processing the texts, to saving and zipping the files. Once the process is complete, the user
receives a file containing all the generated texts, ready for evaluation.</p>
        <p>The process concludes with the printed message:</p>
        <p>”Task completed! Your generated texts have been saved and zipped.”</p>
        <p>This completes the methodology for generating and processing the AI-generated texts used in this
task.</p>
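        <p>The saving and zipping steps can be sketched as follows. The folder layout matches the
description (one folder per team, files named by prompt ID), while the archive name and output
directory are assumptions.</p>

```python
import zipfile
from pathlib import Path

def save_and_zip(texts, team_name, out_dir="submission"):
    """Write each generated text to <out_dir>/<team_name>/<prompt_id>.txt,
    then bundle the team folder into a single zip archive for submission."""
    folder = Path(out_dir) / team_name
    folder.mkdir(parents=True, exist_ok=True)
    for prompt_id, text in texts.items():
        (folder / f"{prompt_id}.txt").write_text(text, encoding="utf-8")
    zip_path = Path(out_dir) / f"{team_name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt_file in sorted(folder.glob("*.txt")):
            zf.write(txt_file, arcname=f"{team_name}/{txt_file.name}")
    print("Task completed! Your generated texts have been saved and zipped.")
    return zip_path
```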
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Observations</title>
      <p>The following example illustrates the performance of the human-like text generation system. The input
consists of an artificially crafted scam letter offering a financial deal involving a foreign
account—characteristic of advance-fee frauds. It features formal, persuasive, and manipulative language designed
to deceive. In Table 4, we present the input alongside the corresponding output generated
using GPT-2 with human-like post-processing. The model preserves key factual information but
introduces conversational fillers and emotionally expressive phrases to simulate human writing. While this
enhances naturalness, it occasionally disrupts coherence and alters the original formal tone.
Input:
Content: The letter is from someone claiming to be Prince Joe Eboh, Chairman of the Contract Award
Committee of the Niger Delta Development Commission (NDDC). \n The sender explains that a surplus of
$25 million USD from petroleum contracts needs to be discreetly transferred out of Nigeria. \n Due to local
laws prohibiting civil servants from holding foreign accounts, they seek a foreign partner to temporarily
receive the funds. \n The recipient is promised 20% of the amount for their cooperation, while 75% will go
to committee members and 5% for expenses. \n The sender requests personal and banking details from
the recipient to initiate the transfer. \n The letter emphasizes secrecy and urgency, aiming to complete the
transaction in 21 working days.,
Genre and Style: Genre: Advance-Fee Fraud / Scam Letter (commonly known as a 419 scam) \n Tone:
Formal and persuasive, but suspiciously flattering and manipulative. It mimics official language to appear
legitimate, yet it contains telltale signs of deception and illegitimacy.</p>
      <p>Output: The letter is from someone claiming to be Prince Joe Eboh, Chairman of the Contract Award
Committee of the Niger Delta Development Commission (NDDC). The sender explains that a surplus of
$25 million USD from petroleum contracts needs to be discreetly transferred out of Nigeria. Due to local
laws prohibiting civil servants from holding foreign accounts, they seek a foreign partner to temporarily
receive the funds. The recipient is promised 20% of the amount for their cooperation, while 75% will go
to committee members and 5% for expenses. The sender requests personal and banking details from
the recipient to initiate the transfer. The letter emphasizes secrecy and urgency, aiming to complete the
transaction in 21 working days. The recipient’s name is redacted, but the sender states that they are “not
interested in any further details.” The message states, “We are not interested.” A few days later, the letter
was sent to the Nigerian government. The Nigerian Embassy in Washington, DC, sent a letter to Nigeria’s
Foreign Ministry, stating that the “Nigerian government is aware of this letter and is working with the
NDFDC to resolve this matter.” The Embassy also stated that “the Nigerian Government is committed to
ensuring that all Nigerian citizens are protected from any potential threats to their security.” In the end, it
was decided that this was a “very serious matter” and that it would be “necessary to take action to ensure
that no further information is leaked to any other country.” This is the first time that an official from
Nigeria has been sent this kind of letter. The Nigerians have been very vocal about their concerns, and
have even been quoted as saying that their government has “no intention of sending any more information
to anyone.”</p>
    </sec>
    <sec id="sec-7">
      <title>7. Results and Discussion</title>
      <p>In this section, we present the results of the experiments and provide a detailed analysis of the
performance of the detection system. The evaluation was conducted using several metrics, including the
Brier score, C@1, F1 score, F0.5u, and the Mean score. These metrics provide an overall understanding
of how well the system can distinguish between human-authored and AI-generated texts.</p>
      <sec id="sec-7-1">
        <title>7.1. Evaluation Results</title>
        <sec id="sec-7-1-1">
          <title>The evaluation results are summarized in the table below:</title>
          <p>The system’s performance was evaluated using multiple metrics. The Brier score, which measures
the accuracy of probabilistic predictions by calculating the mean squared difference between predicted
probabilities and actual outcomes (with 1 for correct and 0 for incorrect), yielded a value of 0.51444. This
suggests moderately accurate probability estimates, though improvement is needed to refine prediction
confidence. The C@1 score evaluates how often the model’s top prediction is correct, and with a
score of 0.43666, the system correctly classified approximately 44% of the texts. While this shows basic
classification capability, it underscores the need for further accuracy improvements, particularly in
identifying AI-generated content.</p>
          <p>The F1 score, which balances precision and recall as the harmonic mean, was 0.53274. This indicates
the system performed moderately well at correctly identifying both human and AI texts, reflecting a fair
trade-off between false positives and false negatives. In contrast, the F0.5u score assigns greater weight
to precision than recall. With a score of 0.668, the model demonstrated a conservative classification
strategy, effectively minimizing false positives, particularly important when misclassifying AI-generated
content as human is costly. Finally, the Mean score, which aggregates the overall performance across
metrics, stood at 0.43034. This result highlights moderate performance, suggesting the model can
differentiate between human and AI texts to some extent, but with clear potential for enhancement
across precision, recall, and confidence calibration.</p>
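          <p>The two headline metrics can be reproduced from their definitions. The sketch below follows
the standard formulas for the Brier score and for C@1 (accuracy with credit for unanswered cases, as
commonly used in PAN evaluations); it is not the organisers’ official evaluation code.</p>

```python
def brier_score(probs, labels):
    """Mean squared difference between the predicted probability of the
    positive class and the true label (1 or 0); lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def c_at_1(preds, labels):
    """C@1: plain accuracy plus credit for leaving hard cases unanswered
    (pred is None), proportional to accuracy on the answered ones."""
    n = len(labels)
    answered = [(p, y) for p, y in zip(preds, labels) if p is not None]
    n_correct = sum(1 for p, y in answered if p == y)
    n_unanswered = n - len(answered)
    return (n_correct + n_unanswered * n_correct / n) / n
```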
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Error Analysis</title>
      <p>While the model demonstrated moderate success in distinguishing between human-authored and
AI-generated texts, several factors contributed to its misclassifications:
• Overfitting to Training Data: The model’s performance may have been affected by overfitting,
particularly when the training data lacked diversity or was imbalanced toward a specific class.
This limits the model’s ability to generalize to novel cases, especially when AI-generated texts
contain stylistic elements underrepresented in the training set.
• Adversarial Text Manipulation: The adversarial setup involved introducing grammatical
errors, uncommon vocabulary, and irregular punctuation to disguise AI-generated text. These
manipulations significantly reduced model accuracy, highlighting its vulnerability to deliberate
attempts to evade detection.
• Genre Sensitivity: Performance varied across different text genres. The model performed
relatively well on structured genres like scientific exposition but struggled with more fluid styles
such as narrative fiction or personal correspondence. This indicates limited capacity to adapt to
stylistic variations.
• Precision–Recall Trade-Off: A stronger focus on achieving high precision (as reflected by the
F0.5u score) came at the expense of recall. As a result, the model often failed to correctly identify
some human-authored texts, reflecting a conservative bias in classification.
• Subtle Linguistic Markers: The model had difficulty recognizing nuanced linguistic cues
that distinguish AI from human writing. While it could capture general patterns—such as
sentence uniformity and mechanical structure in AI texts—it often missed finer stylistic deviations,
especially when AI-generated content was designed to mimic human imperfection.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>The findings of this study offer valuable insights into the evolving landscape of AI-generated content
and the challenges it presents in distinguishing such content from human-authored text. By employing
GPT-2 as the generative model and augmenting its output with targeted post-processing techniques, we
aimed to simulate realistic, genre-aligned, and stylistically human-like texts. Our approach succeeded
in introducing imperfections and variability that made the generated texts harder to classify using
conventional detection models.</p>
      <p>The system achieved moderate performance across various metrics, notably an F1 score of 0.53274,
and an F0.5u score of 0.668, indicating a conservative model that prioritizes precision over recall.
These results highlight that while the model could reliably identify a portion of AI-generated texts, it
struggled in cases where human-like traits were intentionally infused into machine-generated outputs.
Furthermore, genre sensitivity and stylistic inconsistency were identified as key areas where detection
performance deteriorated.</p>
      <p>Overall, our participation in the ELOQUENT Lab 2025 task reaffirms the growing sophistication
of generative language models and the urgent need for more advanced, adaptable, and explainable
detection mechanisms. It also underscores the critical ethical implications for domains such as academic
writing, journalism, and social media, where the credibility of content is paramount.</p>
    </sec>
    <sec id="sec-9a">
      <title>10. Future Work</title>
      <p>The study opens several promising directions for future research and improvement. As generative
models continue to evolve rapidly, maintaining the effectiveness of detection frameworks requires a
shift towards more adaptive and nuanced strategies. Below are the key areas for future exploration:
• Adoption of Advanced Generative Models: Leveraging more powerful models such as GPT-3,
GPT-4, or LLaMA-3 could result in more contextually rich and stylistically diverse outputs. This
would help simulate even more challenging scenarios for detection systems and test their limits
more rigorously.
• Adversarial Training Techniques: Incorporating adversarial examples during model training
can enhance robustness. These techniques involve training classifiers to withstand stylistic
obfuscation, such as unnatural punctuation, inconsistent tone, or deliberate insertion of
human-like errors, which were found to significantly reduce model accuracy in our current setup.
• Genre-Aware Classification: Since performance varied substantially across genres, future
systems should incorporate genre-specific features and classifiers. For instance, models could use
distinct detection pipelines for narrative fiction, formal reports, or informal social media text,
each accounting for the unique stylistic markers of that genre.
• Ensemble and Hybrid Models: Utilizing an ensemble of classifiers—such as combining
rule-based, statistical, and neural approaches—may offer improved generalization and resilience
across diverse input types. Hybrid systems that integrate authorship verification with semantic
coherence models may further boost accuracy.
• Larger and More Balanced Datasets: Expanding the training dataset to include more balanced
examples of both AI and human-authored text from various domains and demographics can
improve generalization and reduce bias toward any single writing style or source.
• Explainability and Transparency: Future detection tools should incorporate Explainable AI
(XAI) methodologies to provide insights into the reasoning behind classification decisions. This
would not only improve trust in the system but also help developers identify weaknesses and
optimize detection strategies.
• Real-Time Deployment Readiness: Finally, a focus on building real-time detection systems
capable of operating in high-throughput environments such as news platforms or academic
publishers would ensure practical applicability and scalability of the research.</p>
      <p>In conclusion, addressing these directions will not only enhance the scientific and technical quality
of future systems but also contribute to a more transparent and secure digital ecosystem where the
origin and authenticity of content can be more reliably verified.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>We would like to express our sincere gratitude to the organizers of the ELOQUENT Lab @ CLEF 2025
for designing such an insightful and timely shared task. Participating in this challenge provided us with
a valuable opportunity to explore the evolving boundaries of AI-generated text and its detectability in
real-world contexts.</p>
      <p>We are especially thankful to the CLEF community for providing robust infrastructure, clearly defined
evaluation protocols, and constructive feedback throughout the process. Their dedication to fostering
innovation in authorship verification and stylometry continues to inspire meaningful research.</p>
      <p>We would also like to acknowledge the support and encouragement from the Department of Computer
Science and Engineering, Jadavpur University. Special thanks to our mentors and peers for their valuable
discussions, which greatly contributed to the development and refinement of our system.</p>
      <p>Finally, we are grateful for the open-source tools and platforms, including Hugging Face Transformers
and Python libraries, that made this research accessible and reproducible.</p>
    </sec>
    <sec id="sec-11">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used OpenAI-GPT-4 in order to: Grammar and
spelling check. Further, the author(s) used MermaidChart for Figure 3 in order to: Generate images.
After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s)
full responsibility for the publication’s content.
[10] Z. Cheng, L. Zhou, F. Jiang, B. Wang, H. Li, Beyond binary: Towards fine-grained llm-generated
text detection via role recognition and involvement measurement, in: Proceedings of the ACM on
Web Conference 2025, ACM, 2025, p. 2677–2688.
[11] A. H.-C. Hwang, Q. V. Liao, S. L. Blodgett, A. Olteanu, A. Trischler, ”it was 80% me, 20% ai”:
Seeking authenticity in co-writing with large language models, Proceedings of the ACM on
Human-Computer Interaction 9 (2025) 1–41. doi:https://doi.org/10.1145/3628684.
[12] M. N. Hoque, T. Mashiat, B. Ghai, C. D. Shelton, F. Chevalier, K. Kraus, N. Elmqvist, The hallmark
efect: Supporting provenance and transparent use of large language models in writing with
interactive visualization, in: Proceedings of the 2024 CHI Conference on Human Factors in
Computing Systems, 2024, p. 1–15.
[13] M. Pividori, C. S. Greene, A publishing infrastructure for artificial intelligence (ai)-assisted
academic authoring, Journal of the American Medical Informatics Association 31 (2024) 2103–2113.
doi:https://doi.org/10.1093/jamia/ocae105.
[14] J. Bevendorf, B. Ghanem, A. Giachanou, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl,
M. Potthast, F. Rangel, P. Rosso, G. Specht, E. Stamatatos, B. Stein, M. Wiegmann, E. Zangerle,
Overview of pan 2020: Authorship verification, celebrity profiling, profiling fake news spreaders
on twitter, and style change detection, in: Experimental IR Meets Multilinguality, Multimodality,
and Interaction. 11th International Conference of the CLEF Initiative (CLEF 2020), volume 12260 of
Lecture Notes in Computer Science, Springer, 2020, pp. 372–383. doi:10.1007/978- 3- 030- 58219- 7_
25.
[15] J. Bevendorf, B. Chulvi, G. L. D. L. P. Sarracén, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl,
M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska, E. Zangerle,
Overview of pan 2021: Authorship verification, profiling hate speech spreaders on twitter, and
style change detection, in: 12th International Conference of the CLEF Association (CLEF 2021),
Springer, 2021. doi:10.1007/978- 3- 030- 85251- 1_26.
[16] J. Bevendorf, B. Chulvi, E. Fersini, A. Heini, M. Kestemont, K. Kredens, M. Mayerl, R.
OrtegaBueno, P. Pezik, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska,
E. Zangerle, Overview of pan 2022: Authorship verification, profiling irony and stereotype
spreaders, and style change detection, in: Experimental IR Meets Multilinguality, Multimodality,
and Interaction. 13th International Conference of the CLEF Association (CLEF 2022), volume 13186
of Lecture Notes in Computer Science, Springer, 2022. doi:10.1007/978- 3- 031- 13643- 6.
[17] J. Bevendorf, I. Borrego-Obrador, M. Chinea-Ríos, M. Franco-Salvador, M. Fröbe, A. Heini, K.
Kredens, M. Mayerl, P. Pęzik, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann,
M. Wolska, E. Zangerle, Overview of pan 2023: Authorship verification, multi-author writing
style analysis, profiling cryptocurrency influencers, and trigger detection, in: Experimental IR
Meets Multilinguality, Multimodality, and Interaction. 14th International Conference of the CLEF
Association (CLEF 2023), volume 14163 of Lecture Notes in Computer Science, Springer, 2023, pp.
459–481. doi:10.1007/978- 3- 031- 42448- 9_29.
[18] J. Bevendorf, M. Wiegmann, J. Karlgren, L. Dürlich, E. Gogoulou, A. Talman, E. Stamatatos,
M. Potthast, B. Stein, Overview of the “voight-kampf” generative ai authorship verification
task at pan and eloquent 2024, in: Working Notes of CLEF 2024 – Conference and Labs of
the Evaluation Forum, volume 3740 of CEUR Workshop Proceedings, 2024, pp. 2486–2506. URL:
http://ceur-ws.org/Vol-3740/paper-225.pdf.
[19] J. Bevendorf, M. Wiegmann, E. Richter, M. Potthast, B. Stein, The two paradigms of llm detection:
Authorship attribution vs. authorship verification, in: Findings of the 63rd Annual Meeting of the
Association for Computational Linguistics (ACL 2025), Association for Computational Linguistics,
2025.
[20] J. Bevendorf, Y. Wang, J. Karlgren, M. Wiegmann, M. Fröbe, A. Tsivgun, J. Su, Z. Xie, M. Abassy,
J. Mansurov, R. Xing, M. Ta, K. Elozeiri, T. Gu, R. Tomar, J. Geng, E. Artemova, A. Shelmanov,
N. Habash, E. Stamatatos, I. Gurevych, P. Nakov, M. Potthast, B. Stein, Overview of the
“voightkampf” generative ai authorship verification task at pan and eloquent 2025, in: Working Notes of</p>
    </sec>
    <sec id="sec-12">
      <title>Online Resources</title>
      <sec id="sec-12-1">
        <title>The source code for our system is available via:</title>
        <p>• Hugging Face.
• GitHub.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Sardinha</surname>
          </string-name>
          ,
          <article-title>AI-generated vs human-authored texts: A multidimensional comparison</article-title>
          ,
          <source>Applied Corpus Linguistics</source>
          <volume>4</volume>
          (
          <year>2024</year>
          )
          <fpage>100083</fpage>
          . doi:https://doi.org/10.1016/j.acorp.2023.100083.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Hakam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Korte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lovreković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostojić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ramadanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Muehlensiepen</surname>
          </string-name>
          ,
          <article-title>Human-written vs ai-generated texts in orthopedic academic literature: Comparative qualitative analysis</article-title>
          ,
          <source>JMIR Formative Research</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <fpage>e52164</fpage>
          . doi:https://doi.org/10.2196/52164.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Matsubara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Matsubara</surname>
          </string-name>
          ,
          <article-title>What's the difference between human-written manuscripts versus chatgpt-generated manuscripts involving “human touch”?</article-title>
          ,
          <source>Journal of Obstetrics and Gynaecology Research</source>
          <volume>51</volume>
          (
          <year>2025</year>
          )
          <fpage>e16226</fpage>
          . doi:https://doi.org/10.1111/jog.16226.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Alvero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Regla-Vargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Kizilcec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Antonio</surname>
          </string-name>
          ,
          <article-title>Large language models, social demography, and hegemony: Comparing authorship in human and synthetic text</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          <fpage>138</fpage>
          . doi:https://doi.org/10.1186/s40537-024-00943-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mindzak</surname>
          </string-name>
          ,
          <article-title>Who wrote this? detecting artificial intelligence-generated text from human-written text</article-title>
          ,
          <source>Canadian Perspectives on Academic Integrity</source>
          <volume>7</volume>
          (
          <year>2024</year>
          ). doi:https://doi.org/10.11575/cpai.v7i1.77894.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Alhijawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jarrar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>AbuAlRub</surname>
          </string-name>
          , A. Bader,
          <article-title>Deep learning detection method for large language models-generated scientific content</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>37</volume>
          (
          <year>2025</year>
          )
          <fpage>91</fpage>
          -
          <lpage>104</lpage>
          . doi:https://doi.org/10.1007/s00521-023-09047-4.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Maktabdar Oghaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Babu</given-names>
            <surname>Saheer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dhame</surname>
          </string-name>
          , G. Singaram,
          <article-title>Detection and classification of chatgpt-generated content using deep transformer models</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          <volume>8</volume>
          (
          <year>2025</year>
          )
          <fpage>1458707</fpage>
          . doi:https://doi.org/10.3389/frai.2024.1458707.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Prajapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Baliarsingh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhoi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <article-title>Detection of ai-generated text using large language model</article-title>
          ,
          <source>in: 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>735</fpage>
          -
          <lpage>740</lpage>
          . doi:https://doi.org/10.1109/ESIC60471.2024.10412345.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Andrews</surname>
          </string-name>
          ,
          <article-title>Few-shot detection of machine-generated text using style representations</article-title>
          ,
          <source>arXiv preprint arXiv:2401.06712</source>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2401.06712.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-author writing style analysis, multilingual text detoxification, oppositional thinking analysis, and generative AI authorship verification</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 15th International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), volume
          <volume>14959</volume>
          of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg,
          <year>2024</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>259</lpage>
          . doi:10.1007/978-3-031-71908-0_11.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dürlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gogoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Talman</surname>
          </string-name>
          , E. Stamatatos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” generative AI authorship verification task at PAN and ELOQUENT 2024</article-title>
          , in:
          <source>Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings</source>
          , Grenoble, France,
          <year>2024</year>
          . URL: https://pan.webis.de, CEUR-WS.org, ISSN 1613-0073.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsivgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mansurov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Elozeiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the “Voight-Kampff” generative AI authorship verification task at PAN and ELOQUENT 2025</article-title>
          , in:
          <source>Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings</source>
          , Madrid, Spain,
          <year>2025</year>
          . URL: https://pan.webis.de, CEUR-WS.org, ISSN 1613-0073.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [24] MermaidChart,
          <source>MermaidChart – Online Editor for Mermaid Diagrams</source>
          ,
          <year>2025</year>
          . URL: https://mermaidchart.com/, accessed: 2025-07-06.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>