<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TextDetox CLEF 2025/Multilingual Text Detoxification 2025 Jiaozipi: A Multilingual Text Detoxification Method Based on Large Language Model-Based Ensemble Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaohui Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yusheng Yi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhaotian Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simin Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zijun Ke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xin Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yubo Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenxuan Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayi Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Han</string-name>
          <email>hanyong2005@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper proposes a solution for the multilingual text detoxification task at CLEF 2025. The task requires detoxifying explicitly toxic texts across 15 languages while preserving the main content as much as possible. To address the task, we propose a solution based on prompt engineering and an ensemble of LLMs. As a first step, we extend the official dataset to construct a parallel text detoxification dataset and a toxic keywords list. We then employ the RISE prompting framework to generate initial system instructions. These instructions, combined with few-shot examples and user input, form structured prompts that guide multiple commercial large language models (DeepSeek, Qwen, Kimi) to produce detoxified outputs. Finally, the best result is selected via a multi-dimensional evaluation considering semantic preservation, toxicity reduction, style consistency, and fluency. Our method ranked 9th on the automatic evaluation metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2025</kwd>
        <kwd>multilingual text detoxification</kwd>
        <kwd>large language model</kwd>
        <kwd>RISE</kwd>
        <kwd>Few-shot Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the rapid development of social media, toxic texts on online platforms have increased sharply,
including racially discriminatory remarks, personal attacks, hate speech, and other inappropriate content.
To address this issue, text detoxification has been proposed as an intervention approach grounded
in natural language generation. Advanced approaches to text detoxification primarily employ
deep learning models to automatically detect toxic elements in text, such as insulting or discriminatory
expressions, and then transform them into neutral formulations that preserve the
original semantics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The multilingual text detoxification task at CLEF 2025 [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] aims at producing a neutral version
of a user message that preserves the original meaning. The task covers 15 languages, including
high-resource languages such as English, Chinese, and Spanish, as well as low-resource or morphologically
complex languages such as Amharic and Tatar.
      </p>
      <p>The main challenge of this task is implicit toxicity, such as sarcasm, passive aggressiveness, or direct
hate toward a group, where no neutral content can be found. Such implicit toxicity is difficult
to detoxify in a way that renders the underlying intent genuinely non-toxic.</p>
      <p>CLEF 2025 Working Notes, 9 – 12 September 2025, Madrid, Spain
* Corresponding authors: Yi is the first corresponding author and Han is the second corresponding author.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Text detoxification tasks aim to convert toxic text into neutral expressions while preserving the original
meaning. In 2024, Peng et al. proposed a method based on few-shot learning and the CO-STAR
framework, combined with chat models such as Kimi, for multilingual text detoxification. By generating
few-shot learning contexts and structured prompts, this approach significantly improved detoxification
performance in high-resource languages such as English and Chinese [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the same year, Řehulka and
Šuppa explored retrieval-augmented generation (RAG) and dynamic prompt construction to enrich large
language models (LLMs) with external knowledge, achieving competitive results in multilingual
detoxification tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, for some low-resource languages such as Amharic, the lack of sufficient
training data substantially limited performance. They therefore adopted a deletion approach, directly
removing toxic keywords to ensure detoxification effectiveness. These approaches demonstrate notable
progress in detoxification for high-resource languages, but their effectiveness remains constrained
by limited multilingual training data. Effectively leveraging existing data to improve detoxification
performance on low-resource languages remains a challenge.
      </p>
      <p>
        LLMs, pretrained on massive corpora via self-supervised learning, acquire broad and emergent
linguistic capabilities. However, achieving strong performance on specific downstream tasks often
necessitates fine-tuning, which requires substantial annotated data and computational resources. In
contrast, prompt engineering activates latent model capabilities through the design of effective
prompt instructions, which can improve the relevance, coherence, and accuracy of
model outputs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Prompt engineering is a systematic approach to designing, writing, and optimizing input prompts for
LLMs to guide them toward the expected output. To enhance the effectiveness of prompt engineering,
various prompt frameworks have been proposed. For example, chain-of-thought (CoT) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and few-shot
prompting [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] improve the interpretability and adaptability of LLMs in logical reasoning tasks and
low-data scenarios by guiding models to break down complex problems or by providing example references.
      </p>
      <p>
        Although prompt engineering has enhanced the ability of LLMs to perform text detoxification tasks,
single models still face challenges such as output instability and residual toxicity. Ensemble learning is
a method that integrates the predictions of multiple base models to improve the robustness, accuracy,
and generalization ability of a system [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, existing ensemble methods are often static,
relying on simple strategies such as majority voting or average scoring, which limits their flexibility
and effectiveness in complex generation tasks like text detoxification.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>
        In this task, we need to detoxify texts in 15 languages. However, the provided parallel text detoxification
dataset1 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ] covers only 9 languages. We therefore used Yuanbao AI2 to translate the English
portion of the parallel text detoxification dataset into Italian, French, Hebrew, Hinglish, Japanese, and
Tatar, with 100 translations for each language. The process is shown in Table 1.
      </p>
      <sec id="sec-3-1">
        <p>1: https://huggingface.co/datasets/textdetox/multilingual paradetox; 2: https://yuanbao.tencent.com/</p>
        <p>Although current mainstream commercial LLMs have significantly improved their coverage of parallel text
detoxification data, they still fail to recognize toxic keywords involving cultural dependence,
semantic ambiguity, or distorted expressions.</p>
        <p>To enhance the recognition of fine-grained toxic texts, we first attempted to extract toxic words using
the Toxic Keywords3 [13, 14] list provided in the task introduction. However, it is insufficient to support the
replacement of toxic text because it contains few entries; for example, the Amharic toxic keywords
list has only 245 records. We therefore extracted the negative words in Toxic Spans4 [15] and merged them with
Toxic Keywords. The extraction process is shown in Table 2; for example:
"all you trump c∗owns are seriously m∗ssed up." → c∗owns, m∗ssed;
"allowing whole colonies of such r∗bbish to arise should be p∗nishable by f∗ring the officials." → r∗bbish, p∗nishable, f∗ring;
"almost as f∗cked up as the cia funding and arming bin laden." → f∗cked up;
"amy, your ignorance is showing again." → i∗norance;
"and start sending c∗nts home." → c∗nts.</p>
        <p>Note: the negative connotations are what we extract and merge with Toxic Keywords.</p>
        <p>As a first step, we generate a parallel text detoxification dataset and a toxic keywords list from the
official dataset.</p>
        <p>Ultimately, we obtained datasets and toxic keywords lists for 15 languages, as follows:
• The extended datasets: There are 100 samples each for Italian (it), French (fr), Hebrew (he),
Hinglish (hin), Japanese (ja), and Tatar (tt), and 400 samples each for English (en), Spanish (es),
German (de), Chinese (zh), Arabic (ar), Hindi (hi), Ukrainian (uk), Russian (ru), and Amharic (am).
These samples are provided as examples to the large language models to optimize their outputs.
• Toxic Keywords List: The number of entries for each language is summarized in Table 3. These toxic keywords
are replaced with * in the toxic sentence. The replaced sentence is called the toxic voc replaced
result below.</p>
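The keyword replacement described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the masking style (keeping the first and last characters and starring the interior, as in the Table 2 examples) and the sample keyword list are assumptions.

```python
import re

def mask_toxic_keywords(sentence, toxic_keywords):
    """Produce the 'toxic voc replaced' result: each toxic keyword in the
    sentence is replaced by a partially starred form."""
    masked = sentence
    for word in toxic_keywords:
        if len(word) > 2:
            # Keep the first and last characters, star out the interior.
            replacement = word[0] + "*" * (len(word) - 2) + word[-1]
        else:
            replacement = "*" * len(word)
        # Whole-word, case-insensitive replacement.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        masked = pattern.sub(replacement, masked)
    return masked

print(mask_toxic_keywords("you ignorant clowns", ["ignorant", "clowns"]))
# → you i******t c****s
```

This masked sentence later serves as one of the candidates passed to the ensemble evaluator.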
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>Our method consists of three main steps: 1) constructing prompts using the RISE framework; 2) inputting
toxic sentences into three LLMs (Kimi5, DeepSeek6, and Qwen7) to generate detoxification results; 3)
feeding the detoxification results of the large models and the toxic voc replaced results into Qwen for
quality evaluation, which finally returns the best result as the output.</p>
      <sec id="sec-4-1">
        <p>3: https://huggingface.co/datasets/textdetox/multilingual toxic lexicon; 4: https://huggingface.co/datasets/textdetox/multilingual toxic spans; 5: https://www.kimi.com; 6: https://chat.deepseek.com; 7: https://www.tongyi.com/</p>
        <sec id="sec-4-1-1">
          <title>4.1. Constructing input texts</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>1. Input prompt guided by the RISE framework</title>
        <p>Practical prompt construction is essential for eliciting optimal responses from LLMs. The RISE
framework serves as our structured template for prompt design, as illustrated in Figure 2. Its
application to this task is as follows. Role (R): the model is required to
act as a domain expert in linguistic processing, specifically tasked with text detoxification.
Input (I): the source material consists of the toxic text along with supplementary contextual data,
used to guide and refine the model's output. Steps (S): a systematic approach, comprising keyword
elimination and syntactic optimization, is employed to ensure precision and operational feasibility.
Expectation (E): the output must preserve the original meaning while achieving semantic
equivalence, linguistic fluency, and formal coherence. The response must follow a JSON output
format like {"toxic sentence": "", "neutral sentence": "", "lang": ""}.
2. Generate few-shot learning context</p>
        <p>This section describes how we generate the contents of the few-shot learning context.
Task demonstration: to help LLMs accurately understand the task requirements, we
provide a brief description of the task.</p>
        <p>Few-shot learning content: to help the model understand the neutral version of toxic text, we
provide few-shot learning content. This content contains toxic sentences and their corresponding
neutral sentences from the parallel text detoxification dataset of the target language. Figure 4 shows
an example for English (en); the processing for other languages is the same. The
parallel text detoxification dataset of each language is stored in dictionary form, making it easy
to retrieve the few-shot learning content for the corresponding language later.
3. Input toxic sentences
As Figure 5 shows, we insert a toxic sentence into the template &lt; |toxic sentence| &gt;&lt; |toxic sentence| &gt;
and send it to the large language model. With the help of few-shot learning and prompts based
on the RISE framework, the large language model returns formatted neutral sentences; Figure
5 demonstrates the real detoxification process.</p>
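The assembly of the three input components can be sketched as follows. The exact RISE wording, the few-shot dictionary contents, and the plain-text layout are illustrative assumptions, not the authors' verbatim templates (those appear in Figures 2-5).

```python
# Few-shot examples per language, stored in dictionary form as described above.
# The example pairs here are invented placeholders.
FEW_SHOT = {
    "en": [
        ("you are a complete idiot", "you are not being reasonable"),
        ("shut your stupid mouth", "please stop talking"),
    ],
}

RISE_TEMPLATE = """Role: You are a linguistic-processing expert specializing in text detoxification.
Input: A toxic sentence plus contextual examples.
Steps: Eliminate toxic keywords, then optimize the syntax for fluency.
Expectation: Preserve the original meaning; return JSON like
{{"toxic sentence": "", "neutral sentence": "", "lang": ""}}.

Examples:
{examples}

Toxic sentence: <|toxic sentence|>{sentence}<|toxic sentence|>
Language: {lang}"""

def build_prompt(sentence, lang):
    """Combine the RISE instructions, the language-specific few-shot
    pairs, and the toxic sentence into one structured prompt."""
    examples = "\n".join(
        f'toxic: "{t}" -> neutral: "{n}"' for t, n in FEW_SHOT.get(lang, [])
    )
    return RISE_TEMPLATE.format(examples=examples, sentence=sentence, lang=lang)
```

The resulting string is what gets sent to each of the three chat models.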
        <sec id="sec-4-2-1">
          <title>4.2. Evaluation</title>
          <p>In this section, we introduce how to use the Qwen model as an evaluator to evaluate and select optimal
detoxification results.</p>
          <p>1. Input prompt for evaluation
This prompt asks the large language model to select the optimal detoxified output from a list of
candidate texts. Our selection criteria and weights are as follows: lowest toxicity
score (weight: 0.3); highest semantic similarity to the original text (weight: 0.4); fluency and
naturalness of the generated sentences (weight: 0.2); consistency in style (weight: 0.1). We
require a JSON output format like {"toxic sentence": "", "neutral sentence": "", "lang": ""}.
2. The evaluation process of the large model
We insert the toxic sentence, the list of neutral sentences, and the corresponding language into
a template like that in Figure 7 and send it to the Qwen model. Guided by the prompt, the large language
model returns a formatted neutral sentence. Figure 7 demonstrates the real evaluation process.</p>
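Constructing the evaluator input can be sketched as follows. The prompt wording is a paraphrase of the criteria above, not the template in Figure 7, and no real API client is shown.

```python
EVAL_TEMPLATE = """Select the best detoxified version of the toxic sentence.
Weigh the candidates by: toxicity (weight 0.3, lower is better),
semantic similarity to the original (0.4), fluency and naturalness (0.2),
and style consistency (0.1).
Return JSON like {{"toxic sentence": "", "neutral sentence": "", "lang": ""}}.

Toxic sentence: {toxic}
Language: {lang}
Candidates:
{candidates}"""

def build_eval_prompt(toxic, candidates, lang):
    """Build the Qwen evaluation prompt. The candidate list contains the
    outputs of DeepSeek, Qwen, and Kimi plus the toxic voc replaced result."""
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return EVAL_TEMPLATE.format(toxic=toxic, lang=lang, candidates=listing)
```

The weighted selection itself is performed by the Qwen model in response to this prompt, not by local scoring code.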
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment</title>
      <sec id="sec-5-1">
        <title>5.1. Settings</title>
        <p>For all 15 languages, we repeat the following steps:
1. Input of the few-shot learning context: construct the few-shot learning context using the datasets
and input it into the model. For different languages, replace the context sample information and
language identifiers accordingly.
2. Input prompts guided by the RISE framework: input the prompts into the large model
to guide it toward the correct output.
3. Input of the toxic sentence: embed the toxic text between &lt;toxic sentence&gt; and &lt;toxic sentence&gt; in
the framework (as shown in Figure 5), and then input it into the large language model.
4. Evaluate: the results of DeepSeek, Qwen, and Kimi, together with the toxic voc replaced results, are input
into the Qwen model for evaluation (as shown in Figure 7). Finally, the best result is
returned as the output.</p>
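The per-sentence loop over these steps might look like the following sketch. `detoxify_with_model` and `evaluate_with_qwen` are hypothetical stand-ins for the actual chat-model API calls; only the control flow of the ensemble reflects the method described above.

```python
MODELS = ["deepseek", "qwen", "kimi"]

def detoxify_with_model(model, prompt):
    # Placeholder for a real chat-completion call to DeepSeek/Qwen/Kimi
    # returning the JSON described in Section 4.
    return {"toxic sentence": "", "neutral sentence": f"[{model} output]", "lang": "en"}

def evaluate_with_qwen(toxic, candidates, lang):
    # Placeholder for the Qwen-based multi-dimensional evaluation;
    # here it trivially returns the first candidate.
    return candidates[0]

def run_pipeline(toxic_sentence, lang, prompt_builder, mask_fn, keywords):
    """Steps 1-3: build the few-shot/RISE prompt and query each model.
    Step 4: add the toxic voc replaced result and let Qwen pick the best."""
    prompt = prompt_builder(toxic_sentence, lang)
    candidates = [detoxify_with_model(m, prompt)["neutral sentence"] for m in MODELS]
    candidates.append(mask_fn(toxic_sentence, keywords))
    return evaluate_with_qwen(toxic_sentence, candidates, lang)
```

Injecting the prompt builder and masking function keeps the loop identical across all 15 languages; only the few-shot context and keyword list change per language.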
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Result</title>
        <p>We applied our method in systematic comparative experiments on the official datasets. In the
comparative experiment on the prompt framework, in order to control the variables, we first
fixed a single-model baseline, using DeepSeek as the large language model; the prompt
framework combined with the word replacement strategy was uniformly applied. We focus on the performance
differences between the CO-STAR framework and the RISE framework. The experimental data show that the
RISE framework has significant advantages on the core indicators, with an AvgP value of 0.636 and
an AvgNP value of 0.565, compared with 0.623 and 0.553, respectively, for the CO-STAR framework
(see Table 4 for details). Based on this empirical result, we decided to
use the RISE framework as the prompt framework for the large models in the follow-up experiments to
ensure the best detoxification effect. We compared four strategies:
1. Single large model + Prompt + Word replacement: using a single large model combined
with the prompt framework, targeted vocabulary replacement is performed on the parts with poor
results to maintain the basic semantic structure after context processing;
2. Single large model + Prompt + Back-translation: using a single large model combined
with the prompt framework, cross-language conversion and secondary detoxification of the
preliminary results are applied to improve the effect of multilingual detoxification;
3. Single large model + Prompt + Translation detoxification: using a single large model
combined with the prompt framework, we first translate texts in the model's weak languages,
then uniformly use the large model for detoxification, and finally translate the results
back into the original language;
4. Multiple large models + Prompt + Word replacement: we integrate the detoxification results
output by multiple large models and select the optimal detoxified text in combination
with word replacement.</p>
        <p>As Table 5 shows, among the single-model detoxification schemes, Strategy 1, which combines the
prompt framework and the word replacement strategy, exhibits the best detoxification effect. Compared
to the other two single-model methods, it demonstrates significant advantages in six languages: German
(de), Arabic (ar), Ukrainian (uk), Russian (ru), Tatar (tt), and Hinglish (hin), with both its AvgP (0.636)
and AvgNP (0.572) metrics outperforming those of the other single-model methods. We then
compared it with the multi-model ensemble detoxification method (Strategy 4) in a comparative
experiment. The further comparison shows that the multi-model ensemble detoxification method
achieves a breakthrough in detoxification effect. Not only did the detoxification score for French (fr)
jump to 0.801, but the method also surpassed the single-model detoxification performance in all test languages
except Amharic (am). This multilingual text detoxification method achieved the best experimental
results so far, increasing the AvgP to 0.656 and the AvgNP to 0.607.</p>
        <p>As Table 6 shows, our model outperforms most of the baseline methods in terms of AvgP score,
including baseline gpt4, baseline o3mini, baseline gpt4o, baseline delete, baseline backtranslation, and
baseline duplicate. Among the languages evaluated, Ukrainian (uk; 5th), Spanish (es; 3rd), and Hindi
(hi; 2nd) achieved top-5 rankings. Furthermore, our AvgNP score outperforms
all baseline models and achieves 4th place overall in the test-phase evaluation. For this ranking, the
top-performing languages are Japanese (ja; 4th), French (fr; 3rd), Hinglish (hin; 4th), and Hebrew (he; 5th).</p>
        <p>However, this method cannot completely solve the problem of homophones across languages and
cultures. For example, the English word "house" overlaps phonetically with the Chinese word "haosi", which means
"a good end". When this homophone appears in a toxic sentence such as "You'll die a miserable
death", the method cannot find the corresponding Chinese meaning, and the sentence may be understood
as "You won't have a good house".</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Summary</title>
      <p>This paper briefly describes our work on the multilingual text detoxification task at PAN 2025. We
propose using an ensemble of LLMs combined with prompts from the RISE framework to detoxify
text across multiple languages. Initially, we constructed a parallel toxic-neutral text dataset
and a toxic keywords list using the official dataset. Model inputs were created by integrating the
RISE framework with few-shot methods. These inputs were used to drive multiple commercial LLMs
(DeepSeek, Qwen, Kimi) to generate detoxified candidate outputs. Finally, the optimal output was
selected through multi-dimensional evaluation considering toxicity score, semantic integrity, and
language fluency. For the code, please refer to our release on GitHub8.</p>
      <p>As shown in Table 6, the results demonstrate that our proposed method effectively handles
the task of multilingual text detoxification, showing good adaptability and stability across different
languages. However, the method does not adequately address homophones present in various languages
and cultures. Future work will require more data for contextualization and research into frameworks
for understanding homophones in LLMs. Additionally, we plan to enhance the tone restoration of
detoxified text and construct a corresponding knowledge base to guide the result generation of LLMs.</p>
      <p>8: https://github.com/lxh44126/Detoxification/tree/code</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No.62276064).</p>
    </sec>
    <sec id="sec-8">
      <title>8. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DouBao9 and YuanBao for grammar and
spelling checking and for paraphrasing and rewording. After using these tools, the authors reviewed and edited the
content as needed and take full responsibility for the publication's content.</p>
      <p>E. Stamatatos, B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024:
Multi-author writing style analysis, multilingual text detoxification, oppositional thinking analysis,
and generative AI authorship verification - extended abstract, volume 14613 of Lecture Notes in
Computer Science, Springer, 2024, pp. 3–10.</p>
      <p>[13] D. Dementieva, D. Moskovskiy, N. Babakov, A. A. Ayele, N. Rizwan, F. Schneider, X. Wang,
S. M. Yimam, D. Ustalov, E. Stakovskii, A. Smirnova, A. Elnagar, A. Mukherjee, A. Panchenko, Overview
of the multilingual text detoxification task at PAN 2024, CEUR-WS.org, 2024.
[14] J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D.
Korencic, M. Mayerl, A. Mukherjee, A. Panchenko, M. Potthast, F. Rangel, P. Rosso, A. Smirnova,
E. Stamatatos, B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024:
Multi-author writing style analysis, multilingual text detoxification, oppositional thinking analysis,
and generative AI authorship verification - extended abstract, volume 14613 of Lecture Notes in
Computer Science, Springer, 2024, pp. 3–10.
[15] D. Dementieva, N. Babakov, A. Ronen, A. A. Ayele, N. Rizwan, F. Schneider, X. Wang, S. M. Yimam,
D. A. Moskovskiy, E. Stakovskii, E. Kaufman, A. Elnagar, A. Mukherjee, A. Panchenko, Multilingual
and explainable text detoxification with parallel corpora, Association for Computational Linguistics,
Abu Dhabi, UAE, 2025, pp. 7998–8025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Yang,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Towards comprehensive detection of chinese harmful memes</article-title>
          , volume
          <volume>37</volume>
          ,
          Curran Associates, Inc.,
          <year>2024</year>
          , pp.
          <fpage>13302</fpage>
          -
          <lpage>13320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampf Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Protasov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          , I. Alimova,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Konovalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liebeskind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litvak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Shah</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Takeshita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vanetik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>Overview of the multilingual text detoxification task at PAN 2025</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>A Machine-Generated Text Detection Model Based on Text Multi-Feature Fusion</article-title>
          ,
          <source>CEUR-WS.org</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2593</fpage>
          -
          <lpage>2602</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Řehulka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šuppa</surname>
          </string-name>
          ,
          <article-title>RAG Meets Detox: Enhancing Text Detoxification Using Open-Source Large Language Models with Retrieval Augmented Generation</article-title>
          ,
          <source>CEUR-WS.org</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>3021</fpage>
          -
          <lpage>3031</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Prompt-based meta-learning for few-shot text classification</article-title>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>1342</fpage>
          -
          <lpage>1357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <article-title>Ensemble learning method using stacking with base learner, a comparison</article-title>
          , Springer, Singapore,
          <year>2023</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ronen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>Multilingual and explainable text detoxification with parallel corpora</article-title>
          , Association for Computational Linguistics, Abu Dhabi, UAE,
          <year>2025</year>
          , pp.
          <fpage>7998</fpage>
          -
          <lpage>8025</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>Overview of the multilingual text detoxification task at PAN 2024</article-title>
          ,
          <source>CEUR-WS.org</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korencic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>