<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hierarchical Generative Plagiarism Detection Method</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zongbao Su</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Han</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yihao Jia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Generative Plagiarism Detection is a challenging task that requires systems not only to identify literal reuse but also to detect semantically similar text segments with different surface expressions. For the Generative Plagiarism Detection task at PAN@CLEF 2025, we propose a hierarchical detection approach. Our method integrates multiple embedding models and semantic similarity evaluation mechanisms to effectively identify complex paraphrased content. Specifically, we employ Sentence-BERT, MPNet, and TF-IDF to perform sentence-level vectorization of both suspicious and source documents, independently generating candidate pairs based on similarity scores. These candidate sets are then merged through a multi-strategy fusion mechanism. Furthermore, a fine-tuned BERT model is used to verify semantic similarity, enhancing the system's ability to detect generative paraphrasing. The final system outputs aligned text segments with high confidence. Experimental results demonstrate that our hierarchical matching strategy exhibits robustness and generalization across multiple evaluation metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative Plagiarism Detection</kwd>
        <kwd>large language model</kwd>
        <kwd>Hierarchical</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Plagiarism detection, or more broadly, text reuse detection, has long been a critical research area in
natural language processing and information retrieval. With the rapid progress of large language
models (LLMs) and the widespread adoption of generative AI, a new and more challenging form of
plagiarism—generated plagiarism—has emerged. This type of plagiarism often preserves semantic
meaning while rephrasing the original content, making it difficult for traditional surface-level similarity
methods to identify such rewritten segments [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Moreover, the proliferation of social media, Q&amp;A
forums, and academic writing assistance tools has further increased the risk of large-scale text reuse
generated by machines [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The PAN (Plagiarism, Authorship, and Near-Duplicate Detection) shared task series has played a
leading role in advancing research in this domain. In recent years, PAN has significantly increased
task difficulty, from machine-translated plagiarism and synonym substitution to now confronting
systems with generated plagiarism detection challenges (PAN 2025). The current task requires systems
to identify aligned text fragments rewritten and reused from source documents using large language
models [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>To this end, we propose a hierarchical detection strategy that integrates multiple semantic
representations and verification mechanisms for the PAN@CLEF 2025 Generated Plagiarism Detection task.
The motivation behind this design is to improve candidate fragment coverage by combining various
similarity evaluation methods, while using a semantic verification model to filter out false matches,
thereby achieving a better balance between precision and recall.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Plagiarism detection research has evolved through multi-stage frameworks and adaptive strategies to
address diverse obfuscation techniques. We reviewed several methods from the CLEF 2014 competition
and compared them; Table 1 summarizes five prominent methods from that competition [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9</xref>
        ]. Building on this analysis, we propose a hierarchical generative plagiarism detection method.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Representative text-alignment methods from the CLEF 2014 competition.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Key Features</th>
              <th>Adaptive Strategy</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Text alignment (winning approach) [<xref ref-type="bibr" rid="ref5">5</xref>]</td>
              <td>Sentence-level text alignment for text reuse detection</td>
              <td>Switches to variant B (optimized for summary obfuscation) when the suspicious document is significantly shorter than the source document</td>
            </tr>
            <tr>
              <td>Multi-Type N-Gram + Elliptical Clustering [<xref ref-type="bibr" rid="ref6">6</xref>]</td>
              <td>Fusion of regular n-grams, stopword n-grams, named-entity n-grams, and context-aware features; noise-sensitive elliptical clustering for feature aggregation; VSM cosine similarity for result verification</td>
              <td>Dynamically selects from 4 preset strategies based on global noise level and cluster characteristics</td>
            </tr>
            <tr>
              <td>Hybrid Architecture (Alignment + Clustering) [<xref ref-type="bibr" rid="ref7">7</xref>]</td>
              <td>Text alignment via the Smith-Waterman algorithm for ordered plagiarism; clustering with the Jaccard coefficient for non-ordered cases (e.g., summaries); tiered content-word thresholds for precision-recall balance</td>
              <td>Clustering activated when alignment fails, with adaptive parameters for different obfuscation types</td>
            </tr>
            <tr>
              <td>TER-p + Bigram N-Gram Dual Strategy [<xref ref-type="bibr" rid="ref8">8</xref>]</td>
              <td>TER-p (a machine translation metric) for strict sentence-level matching; bigram n-grams for fragmented plagiarism detection; result merging with an 80-character gap threshold</td>
              <td>Balances precision (TER-p ≥ 0.9) and recall (bigram n-grams) for different obfuscation levels</td>
            </tr>
            <tr>
              <td>CoReMo 2.3 Self-Tuning [<xref ref-type="bibr" rid="ref9">9</xref>]</td>
              <td>Extended Contextual N-grams (XCTnG) allowing skip words for word-order adjustment; dynamic parameter adjustment based on the suspicious/source document length ratio</td>
              <td>3-stage rules (e.g., 8% filtering distance for susp/src &lt; 1.6) for cross-corpus adaptability</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Drawing on an analysis of the methods above, the proposed hierarchical generative plagiarism
detection method combines triple similarity filtering with BERT verification to perform more rigorous
and accurate plagiarism detection.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>Our system adopts a multi-stage processing pipeline that integrates various state-of-the-art natural
language processing techniques and similarity computation methods. It consists of four main stages: text
preprocessing and sentence segmentation, multi-model vector representation, hierarchical similarity
matching with block merging, and result output. The following sections provide a detailed description
of the entire processing workflow.</p>
      <sec id="sec-3-1">
        <title>3.1. Text Preprocessing and Sentence Segmentation</title>
        <p>
          Given a pair of input documents (a suspicious document and a source document), we first perform text
preprocessing and sentence segmentation:
1. Use the English language model en_core_web_sm from the spaCy library (version 3.8.5) for
accurate sentence segmentation [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
2. Record character-level offsets (start position and length) for each sentence.
3. Normalize sentence text (e.g., strip leading/trailing whitespace).
        </p>
        <p>The output of the preprocessing stage includes:
1. A list of sentences: textual content of all sentences in the document.
2. A list of offsets: positional information (start offset, length) of each sentence in the original text.
3. Original raw text: retained for precise offset calculation in later stages.</p>
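        <p>The preprocessing steps above can be sketched as follows. This is a minimal illustration: a naive regular-expression splitter stands in for the spaCy en_core_web_sm sentencizer used in the actual system, but the recorded (start, length) offsets behave the same way.</p>

```python
import re

def segment_with_offsets(text):
    """Split text into sentences and record character-level offsets.
    NOTE: a naive punctuation-based splitter stands in here for the
    spaCy en_core_web_sm model used in the paper's pipeline."""
    sentences, offsets = [], []
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = match.group()
        stripped = raw.strip()                      # normalize whitespace
        if not stripped:
            continue
        # Offset of the normalized sentence within the original raw text.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        sentences.append(stripped)
        offsets.append((start, len(stripped)))
    return sentences, offsets, text                 # raw text kept for later stages

sents, offs, raw = segment_with_offsets("Plagiarism is reuse. Detection is hard!")
```

        <p>Because the raw text is returned alongside the offsets, any later stage can recover a sentence exactly via raw[start:start + length], which is what makes precise span reporting possible.</p>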
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Multi-Model Vector Representation</title>
        <p>To comprehensively capture the semantic features of text, we employ three different vectorization
methods:</p>
        <sec id="sec-3-2-1">
          <title>1. Sentence-BERT encoding</title>
          <p>
            (a) Encode sentences into semantic vectors using a pre-trained Sentence-BERT model [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] from HuggingFace (all-MiniLM-L6-v2).
(b) Generate 384-dimensional dense vector representations.
(c) Build efficient vector indices using the FAISS library (version 1.9.0) [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>2. MPNet encoding</title>
          <p>
            (a) Use a pre-trained MPNet model [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] from HuggingFace (all-mpnet-base-v2) to obtain alternative semantic embeddings.
(b) Serve as a complement to Sentence-BERT, offering a different semantic perspective.
(c) Also indexed with FAISS for fast similarity search.
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3. TF-IDF representation</title>
          <p>(a) Generate sparse feature vectors using TF-IDF with 1–2 grams.
(b) Capture lexical-level frequency-based features.</p>
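          <p>The 1–2 gram TF-IDF representation and its cosine similarity can be sketched in a few lines. This is a toy, pure-Python version: a library vectorizer such as scikit-learn's TfidfVectorizer would play this role in practice, which is an assumption about the implementation rather than a detail stated here.</p>

```python
import math
from collections import Counter

def one_two_grams(tokens):
    """Word 1-grams plus 2-grams, matching the 1-2 gram setting above."""
    return list(tokens) + [" ".join(p) for p in zip(tokens, tokens[1:])]

def tfidf_vectors(docs):
    """Toy TF-IDF over 1-2 grams with a smoothed idf term."""
    counts = [Counter(one_two_grams(d.lower().split())) for d in docs]
    df = Counter()
    for c in counts:
        df.update(set(c))                    # document frequency per gram
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors([
    "the quick brown fox",
    "the quick brown dog",
    "lorem ipsum dolor",
])
```

          <p>On this tiny corpus, the first two documents share several 1- and 2-grams and so score well above the unrelated third, which is exactly the lexical-overlap signal this channel contributes alongside the dense embeddings.</p>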
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Hierarchical Similarity Matching with Block Merging</title>
        <p>To achieve efficient and accurate sentence-level plagiarism detection, we propose a hierarchical similarity
matching algorithm that integrates multiple similarity metrics in a tiered decision structure.</p>
        <p>Given two sentence sets from a suspicious and a source document, we compute similarity scores
using three distinct methods:
• Sentence-BERT similarity (simSBERT): We encode sentences using a pre-trained Sentence-BERT
model and compute cosine similarity between their 384-dimensional embeddings. A
sentence pair (s, r) is considered matched if simSBERT(s, r) &gt; τ1 (where τ1 = 0.35).
• MPNet similarity (simMPNet): We encode the same sentences using the MPNet model and
compute cosine similarity between embeddings. A match is accepted if simMPNet(s, r) &gt; τ2
(where τ2 = 0.50).
• TF-IDF similarity (simTFIDF): Using 1–2 gram TF-IDF vectors, we compute cosine similarity
between sparse representations. A match is accepted if simTFIDF(s, r) &gt; τ3 (where τ3 = 0.55).</p>
        <p>These three methods are applied in sequence, and the first method to surpass its threshold results in
early acceptance. If none of the methods exceed their thresholds, we invoke a fallback scoring procedure
using a fine-tuned BERT model:
• BERT re-evaluation (simBERT): A sentence pair (s, r) is passed to a pairwise classifier based
on BERT, and the predicted similarity score is used. If simBERT(s, r) &gt; τ4 (with τ4 = 0.45), the
match is accepted. To reduce computational overhead, the BERT re-evaluation is applied only as a
fallback mechanism. Specifically, the set of candidate sentences C is constructed by taking the
union of the top-k most similar sentences retrieved by Sentence-BERT, MPNet, and TF-IDF. This
ensures that BERT operates exclusively on a compact, high-quality candidate pool, balancing
precision with efficiency.</p>
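        <p>The tiered decision just described can be sketched as follows, using the thresholds reported in Section 4.1. The scoring callables are placeholders for the real similarity models, so names and signatures here are illustrative, not the released implementation.</p>

```python
def match_sentence_pair(pair, scorers, bert_scorer):
    """Tiered matching: try the three similarity methods in sequence and
    accept on the first one that clears its threshold; otherwise fall
    back to the fine-tuned BERT classifier.
    `scorers` maps method name to a scoring callable (stand-ins here)."""
    tiers = [("sbert", 0.35), ("mpnet", 0.50), ("tfidf", 0.55)]
    for name, tau in tiers:
        if scorers[name](pair) > tau:
            return True, name               # early acceptance
    if bert_scorer(pair) > 0.45:            # fallback re-evaluation
        return True, "bert"
    return False, None
```

        <p>For example, a pair scoring 0.20 under Sentence-BERT but 0.60 under MPNet is accepted at the MPNet tier, and BERT is never invoked for it, which is the efficiency point made above.</p>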
        <p>Once sentence-level matches are identified, we apply a block merging algorithm to group adjacent
matches into longer plagiarized spans. Two consecutive sentence pairs (x1, y1) and (x2, y2), where x
denotes the suspicious-sentence index and y the source-sentence index, are considered part
of the same block if:
1. x2 &gt; x1 and y2 &gt; y1 (strictly increasing order),
2. x2 − x1 ≤ g and y2 − y1 ≤ g (gap constraints, with maximum gap g),
3. The resulting block contains at least m matched pairs (length constraint, with minimum length m).</p>
        <p>This merging logic is encapsulated in a function isABlock(), which determines whether two pairs
can be joined based on the above criteria. The final result is a set of merged blocks indicating contiguous
regions of potential plagiarism.</p>
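        <p>A minimal sketch of this merging logic, using the gap and length values from Section 4.1, might look like the following. The greedy left-to-right chaining is our reading of the description above; the released code may organize it differently.</p>

```python
def is_a_block(prev, cur, max_gap=5):
    """isABlock() criteria: strictly increasing indices on both the
    suspicious (x) and source (y) side, with bounded gaps."""
    (x1, y1), (x2, y2) = prev, cur
    increasing = x2 > x1 and y2 > y1
    gaps_ok = max_gap >= x2 - x1 and max_gap >= y2 - y1
    return increasing and gaps_ok

def merge_blocks(matches, max_gap=5, min_len=2):
    """Greedily chain adjacent sentence-level matches into blocks and
    keep only blocks with at least `min_len` matched pairs."""
    blocks, current = [], []
    for pair in sorted(matches):
        if current and is_a_block(current[-1], pair, max_gap):
            current.append(pair)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [pair]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks
```

        <p>Isolated single matches are dropped by the length constraint, while runs of nearby matches survive as contiguous regions of potential plagiarism.</p>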
        <p>Our approach combines deep learning-based models (Sentence-BERT, MPNet) with traditional TF-IDF
features to perform complementary semantic and surface-level analysis. The fine-tuned BERT classifier
is used to resolve borderline cases, while the multi-stage merging strategy significantly improves the
detection of continuous plagiarized segments.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Settings</title>
        <p>To assess the performance of our hybrid similarity computation and block merging algorithm, we used
the following hyperparameters:
• Threshold τ1 for Sentence-BERT similarity: 0.35
• Threshold τ2 for MPNet similarity: 0.50
• Threshold τ3 for TF-IDF similarity: 0.55
• Fallback threshold τ4 for BERT similarity: 0.45
• Maximum allowed position gap g for block merging: 5
• Minimum block length m: 2
• Top-k retrieved candidates per method: 5</p>
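        <p>For reference, these settings can be collected in a single configuration mapping; the key names below are illustrative stand-ins, not the identifiers used in the released code.</p>

```python
# Hyperparameters from Section 4.1; key names are illustrative.
CONFIG = {
    "tau_sbert": 0.35,     # Sentence-BERT similarity threshold
    "tau_mpnet": 0.50,     # MPNet similarity threshold
    "tau_tfidf": 0.55,     # TF-IDF similarity threshold
    "tau_bert": 0.45,      # fallback BERT similarity threshold
    "max_gap": 5,          # maximum position gap for block merging
    "min_block_len": 2,    # minimum matched pairs per block
    "top_k": 5,            # candidates retrieved per method
}
```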
        <p>These parameters were selected empirically based on performance on the training set. Sentence
pairs passing any of the first three thresholds are accepted. If none qualify, BERT re-evaluation with
threshold τ4 is applied. Finally, adjacent sentence matches are merged into blocks using a context-aware
policy governed by the gap parameter g and the minimum block length m.</p>
        <p>All experimental code and configurations used in this study have been released as open-source and
are available at: https://github.com/CCheZi/Generative-Plagiarism-Detection</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>Our system achieved a Plagdet score of 0.496 on the
llm-plagiarism-detection-spot-check-20250521 training dataset. Table 2 shows the detailed evaluation performance, including precision, recall, and
granularity. These results demonstrate the effectiveness of the hybrid similarity scoring and block
merging strategy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>Our plagiarism detection system achieves initial detection capabilities by integrating multi-model
semantic representations (Sentence-BERT, MPNet) with traditional TF-IDF features, combined with a
hybrid similarity computation strategy. While the current method demonstrates basic effectiveness,
there is still considerable room for improvement. Experimental results show that the system performs
well in detecting overt plagiarism (e.g., verbatim copying), but remains limited in handling texts that
have undergone complex rewrites.</p>
      <p>The main contributions of this work include: (1) Multi-model fusion architecture: This is the first
approach to compute Sentence-BERT, MPNet, and TF-IDF in parallel, enabling early-stage matching
decisions through a threshold-based mechanism; (2) Hybrid similarity computation: A three-stage
matching strategy is designed (direct matching → candidate merging → BERT-based verification) to
balance efficiency and accuracy; (3) Dynamic block merging algorithm: Non-contiguous plagiarized
segments are handled via adjustable gap parameters (max_susp_gap/max_src_gap), allowing more
flexible detection.</p>
      <p>For future work, we plan to optimize the current method from the following aspects: (1) Parameter
tuning: Currently, most parameters are heuristically set. We plan to adopt more advanced parameter
optimization algorithms, such as genetic algorithms, to achieve global optimization. This will enable
more precise adaptation to the characteristics of different corpora, thereby improving detection
performance; (2) Incorporation of linguistic features: To better handle paraphrasing and semantic shifts, we
aim to incorporate linguistically informed techniques, such as semantic role labeling and dependency
parsing. These methods can help capture deeper semantic similarities between texts and reduce false
positives; (3) Utilization of contextual information: The current detection approach primarily focuses
on sentence-level similarity and overlooks document-level context. We will explore incorporating
contextual information, taking into account sentence positioning and discourse context to further
enhance detection accuracy.</p>
      <p>In conclusion, although this study has made progress in plagiarism detection, there is still room for
performance improvement. With the proposed enhancements, we believe our approach can be further
refined to become more competitive for future real-world applications.</p>
      <p>This work is supported by the National Social Science Foundation of China (Grant No. 22BTQ101).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-4 for the following activities: content
drafting, grammar and spelling check, paraphrasing, and rewriting. After using this tool/service, the
authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Martí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Plagiarism meets paraphrasing: Insights for the next generation of automatic plagiarism checkers</article-title>
          ,
          <source>Computational Linguistics</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Alzahrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Salim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <article-title>Understanding plagiarism: Linguistic patterns, textual features, and detection methods</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the pan 2023 shared tasks on digital text forensics</article-title>
          , in: Working Notes of CLEF,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>PAN@CLEF Lab</collab>
          ,
          <article-title>Generated plagiarism detection - PAN at CLEF 2025</article-title>
          , https://pan.webis.de/clef25/pan25-web/generated-plagiarism-detection,
          <year>2025</year>
          . Accessed: May 28, 2025.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanchez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>A winning approach to text alignment for text reuse detection at pan 2014</article-title>
          , in: Working Notes for CLEF,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Palkovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Belov</surname>
          </string-name>
          ,
          <article-title>Developing high-resolution universal multi-type n-gram plagiarism detector</article-title>
          , in: Working Notes for CLEF,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Glinos</surname>
          </string-name>
          ,
          <article-title>A hybrid architecture for plagiarism detection</article-title>
          , in: Working Notes for CLEF,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maharjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          ,
          <article-title>Machine translation evaluation metric for text alignment</article-title>
          , in: Working Notes for CLEF,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rodríguez Torrejón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martín Ramos</surname>
          </string-name>
          ,
          <article-title>CoReMo 2.3 plagiarism detector text alignment module</article-title>
          , in: Working Notes for CLEF,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <collab>Explosion AI</collab>
          ,
          <source>spaCy: Industrial-strength natural language processing in Python (version 3.8.5)</source>
          , https://spacy.io,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Billion-scale similarity search with GPUs</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          (
          <year>2019</year>
          ). FAISS library: https://github.com/facebookresearch/faiss.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          , T.-Y. Liu,
          <article-title>Mpnet: Masked and permuted pre-training for language understanding</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <article-title>Overview of PAN 2025: Voight-Kampff generative AI detection, multilingual text detoxification, multi-author writing style analysis, and generative plagiarism detection</article-title>
          ,
          <source>in: CLEF 2025, Lecture Notes in Computer Science</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous integration for reproducible shared tasks with TIRA.io</article-title>
          ,
          <source>in: Advances in Information Retrieval, 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Springer,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Wahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ruas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aizawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Overview of the generative plagiarism detection task at PAN 2025</article-title>
          , in:
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>