<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HF_Detox at TextDetox CLEF 2025: Prompt-Driven Multilingual Detoxification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Humaira Farid</string-name>
          <email>humaira.farid@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zainab Ahmad</string-name>
          <email>zainabshaukat09@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmad Mahmood</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iqra Ameer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, The Pennsylvania State University</institution>
          ,
          <addr-line>University Park, PA 16802</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Division of Science and Engineering, The Pennsylvania State University</institution>
          ,
          <addr-line>Abington, PA, 19001</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación(CIC)</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The text detoxification task aims to automatically transform toxic or ofensive sentences into neutral, semantically equivalent paraphrases. In this study, we evaluated a lightweight, prompt-based chain-of-thought approach using OpenAI's GPT-4o-mini on the PAN 2025 multilingual ParaDetox benchmark, encompassing 15 typologically diverse languages. Without any fine-tuning, our method leverages a single system instruction with a few-shot setup. During the initial evaluation, our approach achieved top-3 Joint (J) scores in six languages, most notably French (J = 0.775, rank 2) and Hebrew (J = 0.613, rank 1), and ranked in the top 10 for 9 languages overall. In the post-evaluation phase, our system ranked fourth overall in the parallel and third non-parallel data tracks, achieving average J scores of 0.768 and 0.718, respectively. Strong results were observed in English (J = 0.775), Spanish (J = 0.814), Japanese (J = 0.819), and Hebrew (J = 0.671), achieving the second position, demonstrating our method's robustness in both high-resource and morphologically diverse languages. These findings underscore the potential of multilingual large language models, guided by carefully designed prompts, to serve as plug-and-play detoxifiers even in the absence of task-specific fine-tuning.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;PAN 2025</kwd>
        <kwd>Multilingual Text Detoxification (TextDetox) 2025</kwd>
        <kwd>Style Transfer</kwd>
        <kwd>Multilingual NLP</kwd>
        <kwd>GPT-4o-mini</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid and widespread growth of social media platforms, including YouTube, Facebook, Twitter, and
Instagram, in recent years has transformed the culture of communication and interaction around the
world. These platforms generate a huge amount of user-generated content on a daily basis, providing
rich data for natural language processing (NLP) research and applications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, they’re also
hosts of toxic and harmful speech that can lead to harassment, exclusion, and even radicalization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        To mitigate this, most online platforms rely on content moderation techniques such as blocking or
deleting harmful posts. Although these are good countermeasures to some extent, these measures are
reactive and often result in the loss of potentially meaningful content. A more proactive approach
involves text detoxification, which consists of rewriting toxic content in a non-ofensive manner while
retaining the original meaning of the content [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this paper, we tackle the Multilingual Text Detoxification task, which was proposed as part of the
PAN at CLEF 2025 shared task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The aim is to transform toxic sentences into neutral and non-ofensive
versions in a diverse range of languages (a total of 15), including high-resource languages (e.g., English,
Spanish) as well as low-resource (e.g., Amharic, Hinglish, and Tata) or code-switched languages (e.g.,
      </p>
      <p>Hinglish). The emphasis is on explicit toxicity, such as being rude or using swear words, rather than
more implicit or subtle forms (e.g., sarcasm).</p>
      <p>
        Several challenges make this task both complex and impactful. One of the main obstacles is deciding
how to identify toxicity in a consistent manner across multiple languages and in diverse cultural
contexts. Although some types of toxicity can be determined from explicit abusive language using
lexicon-based methods, the context and construction of language, particularly across diverse linguistic
families, can pose significant challenges. Another serious issue is ensuring that the original meaning
of the content is retained in detoxification. Rephrasing sentences to eliminate toxic content without
altering the original meaning requires deep language understanding. Additionally, the discrepancy
in resource availability across diferent languages makes this task further challenging. While
highresource languages (e.g., English, Spanish, etc.) benefit from large-scale corpora and pre-trained models,
low-resource or code-switched languages (e.g., Amharic, Hinglish) lack such support, making efective
detoxification more dificult. Moreover, ensuring that the model’s output is not only detoxified but also
lfuent, natural-sounding, and culturally appropriate adds further complexity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>In this study, we propose a truly multilingual, prompt-driven approach to text detoxification using
only the GPT-4o-mini model. Rather than fine-tuning separate encoder–decoder or decoder-only
architectures, we leverage in-context learning: each toxic input is prefaced with a single fixed instruction.
This simple setup scales seamlessly to all 15 languages in the PAN 2025 multilingual ParaDetox
benchmark. Despite its lightweight nature, without any additional fine-tuning or model assembly, our method
achieves competitive performance. The results demonstrate that a well-crafted chain-of-thought prompt
alone can rival more complex, fully fine-tuned systems across diverse linguistic settings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Text detoxification has attracted significant attention from researchers working across multiple
languages and styles. Early eforts focused on lexicon and rule-based filtering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to identify and replace
or remove abusive terms, but these methods proved less efective in context-sensitive scenarios. This
led to data-driven paraphrasing approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], most notably back-translation, that could rephrase
toxic inputs more accurately. Later, neural style-transfer models [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were introduced to disentangle
content and toxicity signals to generate cleaner text. State-of-the-art methods fine-tune pretrained
large language models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which achieved superior fluency, semantic fidelity, and toxicity reduction.
      </p>
      <p>
        In 2024, PAN also introduced a shared task on multilingual text detoxification [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The task was to
challenge participants to convert toxic or abusive text into non-toxic paraphrases. The dataset comprised
text instances in nine diferent languages, including English, Spanish, German, Chinese, Arabic, Hindi,
Ukrainian, Russian, and Amharic. In the development phase, only English and Russian parallel data
were available, while for testing, systems were evaluated on full parallel corpora consisting of nine
languages. Submissions were assessed both automatically and through human crowd-sourcing. The
evaluation was performed using style transfer accuracy, semantic preservation (LaBSE cosine similarity),
and fluency (ChrF1 against human references). Baselines included simple duplications, delete keyword
heuristics, back translation, and fine-tuned mT5. Analysis showed that multilingual LLMs
(few/zeroshot and fine-tuned variants mT0-XL and LLaMa3) generally outperformed unsupervised methods,
though performance varied significantly by language.
      </p>
      <p>
        In human evaluation, Team SomethingAwful [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] claimed the first place by using an “uncensored"
LLaMa3-70B1 model with a few-shot prompting (10 exemplars per language) and a fine-tuned mT0-XL
for Amharic, it achieved the highest average Joint (J) score of 0.774, with its best performance in German
(J = 0.889) and Spanish (J = 0.834). Team SmurfCat [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] followed in second place by fine-tuning mT0-XL
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] on the shared task data and applying Odds-Ratio Preference Optimization (ORPO) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] at inference,
obtained an average J = 0.741 and peaking on Ukrainian (J = 0.840) and Arabic (J = 0.819). In third
place, Team VitalyProtasov [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] trained mT0-large [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] with language-specific data filtering before
ifne-tuning, reaching an average J = 0.723 and demonstrating particular strength in Hindi (J = 0.788).
      </p>
      <sec id="sec-2-1">
        <title>1https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md Last visited: 27/05/20225</title>
        <p>
          On the automatic leaderboard, Team SmurfCat [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] again led with an average J = 0.523 by leveraging
mT0-XL plus ORPO. Team lmeribal secured second place with an average J = 0.515; its top result was
in Ukrainian (J = 0.686), suggesting a robust multilingual fine-tuning strategy despite limited method
details. In third place, Team nikita.sushko [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] fine-tuned mT0-XL in two stages (parallel then synthetic
data filtered by LaBSE similarity and toxicity), with an additional “delete" post-processing step, achieving
an average J = 0.465 and the highest scores in Ukrainian (J = 0.668) and English (J = 0.553).
        </p>
        <p>As the literature depicts, PAN 2024’s TextDetox task highlighted the efectiveness of multilingual
LLMs, especially when combined with prompt engineering or lightweight preference-optimization
techniques, in outperforming unsupervised baselines. While top systems achieved impressive fluency
and content preservation, their performance still varied by language and toxicity category, underscoring
ongoing challenges in contextual nuance and low-resource adaptation.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>
        We use the ParaDetox parallel corpus from the Multilingual Text Detoxification task at PAN 2025,
available on Hugging Face [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].2 The dataset contains sentence-level toxic–detoxified pairs in
nine languages: English, Spanish, German, Chinese, Arabic, Hindi, Ukrainian, Russian, and Amharic,
which serves as a training set. For all nine languages, there is a pair of 400 toxic-neutral sentences.
Similarly, the test set consists of 15 languages, including English, Spanish, German, Chinese, Arabic,
Hindi, Ukrainian, Russian, Amharic, Italian, French, Hebrew, Hinglish, Japanese, and Tatar. Example
for English, Spanish and Genrman are presentd in 1. For all 15 languages, there are 600 toxic sentences.
The table 2 provides a detailed overview of the dataset.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our approach relies exclusively on prompt engineering with the GPT-4o-mini model, avoiding any full
ifne-tuning due to policy restrictions on toxic content. We explore a range of prompting techniques,
from simple zero-shot instructions to detailed chain-of-thought prompts, and compare against a
backtranslation baseline. Figure 1 presents the overview of the methodology that how the toxic sentence is</p>
      <sec id="sec-4-1">
        <title>2https://huggingface.co/datasets/textdetox/multilingual_paradetox</title>
        <p>getting transformed into a detoxified one without changing its entire meaning, but by altering the toxic
words or phrases using our proposed methodology.</p>
        <sec id="sec-4-1-1">
          <title>4.1. Zero-Shot Prompting</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>In the zero-shot setting, we present only a high-level instruction to the model:</title>
        <p>[You are a wonderful text detoxification assistant, for each given
query substitute the toxic word with non-toxic word.]
Despite its minimal guidance, this setup leverages GPT-4o-mini’s extensive multilingual pre-training
and establishes a strong baseline for detoxification.</p>
        <sec id="sec-4-2-1">
          <title>4.2. Chain-of-Thought Prompting</title>
          <p>To improve the handling of subtle or context-dependent toxicity, we design a chain-of-thought prompt
that walks the model through the detoxification process in steps: Initially, identifying the toxic term,
then proposing a neutral substitute, and finally verifying that the meaning is preserved:
[You are a multilingual detoxification assistant. When the user
sends you a sentence, follow this process: 1. “Examine the sentence
for toxicity." 2. “Identify which word(s) are toxic or dirty." 3.
“For each toxic word, think of a neutral synonym that preserves the
original meaning." 4. “Replace the toxic word(s) with the chosen
synonym(s)." 5. “Review the new sentence to ensure it’s fluent
and nothing else has changed." 6. “Now provide the final non-toxic
paraphrase as a plain string." Do not add or remove anything besides
toxic words. If the sentence is already non-toxic, simply repeat it
unchanged.]
This explicit reasoning path helps the model produce more accurate and nuanced paraphrases.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.3. Fine-Tuning Considerations</title>
          <p>We evaluated OpenAI’s fine-tuning API on our parallel toxic–neutral dataset, but we were unable to
proceed due to policy constraints around harmful content. This reinforces the value of in-context
learning: with carefully crafted prompts alone, our method achieved first place in Hebrew and strong
performance in French without updating any model parameters.</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>4.4. Back-Translation Baseline</title>
          <p>As a comparative baseline, we implemented a back-translation pipeline: translating the input into
English, detoxifying in English, then translating back to the original language. Although this approach
had a sound concept, it proved computationally expensive and underperformed compared to our
prompt-driven techniques.</p>
          <p>Overall, our experiments demonstrate that strategic prompt design with GPT-4o-mini ofers a
lightweight yet powerful solution for multilingual text detoxification across 15 languages.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>This section presents the performance of our proposed technique on the PAN 2025 ParaDetox test set,
as well as the outcomes from the subsequent post-evaluation phase. We report and analyze the Joint (J)
scores and leaderboard rankings of GPT-4o-mini across all 15 target languages, highlighting trends,
strengths, and limitations observed during both evaluation stages.</p>
      <sec id="sec-5-1">
        <title>5.1. Initial Results</title>
        <p>Table 3 reports the Joint (J) scores and leaderboard ranks for GPT-4o-mini across all 15 languages in
the PAN 2025 ParaDetox initial test set. The results show that GPT-4o-mini achieves its strongest
performance in French (J = 0.775, rank 2) and Hebrew (J = 0.613, rank 1), indicating particularly efective
in-context detoxification in these languages. High-resource European languages such as English (J =
0.727, rank 3) and Spanish (J = 0.705, rank 3) also perform well, suggesting the model’s pretraining is
most robust for widely represented languages.</p>
        <p>Mid-tier scores appear on Japanese (J = 0.653, rank 4) and Hindi (J = 0.600, rank 6), indicating moderate
success in East Asian languages. Notably, German (J = 0.699, rank 8) and Russian (J = 0.708, rank 7)
achieve competitive scores but slightly lower ranks, perhaps due to stronger baselines from specialized
systems.</p>
        <p>Performance declines in lower-resource or morphologically complex languages: Arabic (J = 0.585,
rank 11),Italian (J = 0.673, rank 12) and Amharic (J = 0.394, rank 12) rank lower, reflecting challenges in
script variation and fewer pretraining exemplars. The poorest result is on Hinglish (J = 0.296, rank 17),
underscoring the dificulty of mixed-script, code-switched text. These patterns suggest that while
few-shot in-context learning with GPT-4o-mini excels for high- and mid-resource languages, further
adaptation or fine-tuning may be required to close gaps in low-resource settings.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Post Evaluation Results</title>
        <p>This section presents the results for the post-evaluation phase for the parallel and non-parallel dataset.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Post Evaluation Results: Parallel Data Track</title>
        <p>The post-evaluation Joint (J) scores and rankings for 9 languages with parallel data are presented in
Table 4. Our team ranked fourth overall, along with other teams, achieving an average score of 0.768.
Notably, we obtained competitive scores in four key languages: English (0.888), ranked second; Spanish
(0.814), ranked second; German (0.912), ranked third; and Hindi (0.731) ranked third. Despite lower scores
in traditionally challenging languages like Amharic (0.526), our system consistently performed well in
most of the languages. The results highlight our model’s robustness in handling both morphologically
diverse and high-resource languages.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Post Evaluation Results: Without Parallel Data Track</title>
        <p>The post-evaluation Joint (J) scores and ranks for the 6 languages without parallel data, are presented
in Table 5. Our team achieved an average score of 0.718 and secured the third position, which shows
the robustenss of our model for unseen languages. Our method demonstrated strong performance in
high-resource languages like Japanese (0.819) and Hebrew (0.883), placing second rank.</p>
        <p>Although performance in languages like Tatar (0.511) and Hinglish (0.586) was relatively lower, these
results reflect the overall dificulty associated with such low-resource or morphologically complex
languages. However, consistent mid- to top-tier performance in most categories validates the efectiveness
of our detoxification strategy.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We presented a lightweight yet efective approach to multilingual text detoxification, utilizing
GPT-4omini in a prompt-based setting on the PAN 2025 ParaDetox benchmark. Our system does not require
task-specific fine-tuning and relies solely on a single system instruction. It delivers strong performance
in both initial and post-evaluation phases, ranking in the top 3 for 6 out of the 15 languages. The
post-evaluation results further validated our model’s robustness. We secured fourth place in the parallel
and third in the non-parallel data tracks, achieving high J scores across English, German, Hebrew, and
Spanish. Although our method struggled in low-resource and code-switched languages such as Hinglish
and Tatar, it consistently performed well in morphologically diverse and high-resource languages. These
results demonstrate the eficacy of prompt engineering in guiding multilingual LLMs for detoxification
tasks and highlight the importance of future work on adaptation strategies for low-resource and complex
linguistic settings.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Alsubait</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alarifi</surname>
          </string-name>
          , W. Alosaimi,
          <article-title>The impact of online toxicity on social media platforms</article-title>
          ,
          <source>in: 2021 International Conference on Computer and Information Sciences (ICCIS)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z. J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , E. Sheng, E. Wallace,
          <article-title>Detoxify: A robust pretrained transformer for toxic comment classification</article-title>
          ,
          <source>arXiv preprint arXiv:2104.07367</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dos Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rezende</surname>
          </string-name>
          ,
          <article-title>A survey on text detoxification: Challenges, methods, and evaluation, in: Findings of the Association for Computational Linguistics: ACL</article-title>
          <year>2023</year>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2025:
          <article-title>Voight-Kampf Generative AI Detection, Multilingual Text Detoxification, Multi-Author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of theSixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Protasov</surname>
          </string-name>
          ,
          <article-title>PAN 2024 Multilingual TextDetox: Exploring Cross-lingual Transfer in Case of Large Language Models</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S.</surname>
          </string-name>
          Herrera (Eds.),
          <source>Working Notes Papers of the CLEF</source>
          <year>2024</year>
          <article-title>Evaluation Labs, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          , pp.
          <fpage>2852</fpage>
          -
          <lpage>2857</lpage>
          . URL: http: //ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3740</volume>
          /paper-274.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spertus</surname>
          </string-name>
          ,
          <article-title>Smokey: automatic recognition of hostile messages</article-title>
          ,
          <source>in: Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence</source>
          , AAAI'97/IAAI'97, AAAI Press,
          <year>1997</year>
          , p.
          <fpage>1058</fpage>
          -
          <lpage>1065</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>Exploring methods for cross-lingual text style transfer: The case of text detoxification</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2311.13937. arXiv:
          <volume>2311</volume>
          .
          <fpage>13937</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <article-title>Exploring toxic lexicon similarity methods with the DRG framework on the toxic style transfer task, Master's thesis</article-title>
          , KTH, School of Electrical Engineering and Computer Science (EECS),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pletenev</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Panchenko,
          <article-title>LLMs to replace crowdsourcing for parallel data creation? the case of text detoxification</article-title>
          , in: Y.
          <string-name>
            <surname>Al-Onaizan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>Y.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2024</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>14361</fpage>
          -
          <lpage>14373</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          . ifndings-emnlp.
          <volume>839</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          .findings-emnlp.
          <volume>839</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pletenev</surname>
          </string-name>
          ,
          <article-title>Memu_pro_kotow at PAN 2024 TextDetox: Uncensored Llama3 Helps to Censor Better</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G.-S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          .
          <article-title>PAN 2024 TextDetox workshop contribution</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zaytsev</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Anisimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronin</surname>
          </string-name>
          ,
          <article-title>Smurfcat at pan 2024 textdetox: Alignment of multilingual transformers for text detoxification</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.05449. arXiv:
          <volume>2407</volume>
          .
          <fpage>05449</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sutawika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. X.</given-names>
            <surname>Yong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Almubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Albanie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Alyafeai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Webson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Raf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rafel</surname>
          </string-name>
          ,
          <article-title>Crosslingual generalization through multitask finetuning</article-title>
          , in: A.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          <string-name>
            <surname>Boyd-Graber</surname>
          </string-name>
          , N. Okazaki (Eds.),
          <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>ACL</source>
          <year>2023</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>15991</fpage>
          -
          <lpage>16111</lpage>
          . URL: https://doi.org/10.18653/ v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>891</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>891</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          , Orpo:
          <article-title>Monolithic preference optimization without reference model</article-title>
          , CoRR, abs/2403.07691,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .48550/arXiv.2403.07691, arXiv:
          <fpage>2403</fpage>
          .07691, https://doi. org/10.48550/arXiv.2403.07691.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Protasov</surname>
          </string-name>
          ,
          <article-title>Pan 2024 multilingual textdetox: Exploring cross-lingual transfer in case of large language models</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sushko</surname>
          </string-name>
          ,
          <article-title>Pan 2024 multilingual textdetox: Exploring diferent regimes for synthetic data training for multilingual text detoxification</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Protasov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          , I. Alimova,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Konovalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liebeskind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litvak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Shah</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Takeshita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vanetik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>Overview of the multilingual text detoxification task at pan 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2025 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle, Overview of pan 2025:
          <article-title>Generative ai detection, multilingual text detoxification, multi-author writing style analysis, and generative plagiarism detection</article-title>
          , in: C.
          <string-name>
            <surname>Hauf</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Jannach</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <string-name>
            <surname>Nardini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Pinelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>434</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>