<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sky.Duan at TextDetox CLEF 2025/Multilingual Text Detoxification 2025: An Intelligent Approach Integrating Local Models and Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xianbing Duan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiangao Peng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kaiyin Sun</string-name>
          <email>sunkaiyin123@163.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <email>hanzhongyuan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>33 Guang-yun-lu</institution>
          ,
          <addr-line>Shi Shan, NanHai, Foshan, Guangdong</addr-line>
          ,
          <country country="CN">P.R.China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Foshan No.3 Middle School</institution>
          ,
          <addr-line>Foshan, Guangdong, 528000</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper introduces the system we submitted to the PAN 2025 multilingual text detoxification shared task. Text detoxification, a key task in natural language processing, aims to transform harmful text into neutral and harmless expressions. To this end, we propose a multilingual text detoxification system based on a heterogeneous model collaboration framework, which effectively combines the specialized processing capabilities of locally fine-tuned models with the contextual understanding capabilities of large language models to achieve efficient and accurate text purification. The core of the system adopts a dual-branch processing architecture that combines the locally deployed s-nlp/mt0-xl-detox-orpo model with the cloud-based Qwen3 model. Meanwhile, we design an intelligent output fusion mechanism that employs large language models with advanced reasoning capabilities to analyze, compare, and integrate multi-source detoxification results. Experimental results on the PAN 2025 multilingual detoxification dataset show that our system achieved a competitive score of 0.676 on the TEST dataset while maintaining the integrity of the original semantics, and demonstrated good adaptability and detoxification performance across 15 different languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Text detoxification</kwd>
        <kwd>Multilingual detoxification</kwd>
        <kwd>Heterogeneous model collaboration</kwd>
        <kwd>Large language models</kwd>
        <kwd>TypeChat</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the popularization of the Internet and social media, the spread of harmful content online has evolved
into a serious social issue. Text detoxification, as an important branch of natural language processing,
aims to rewrite harmful texts containing hate speech, discriminatory language, and cyberbullying into
neutral and harmless expressions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This paper describes our system submission to the PAN 2025 multilingual text detoxification shared
task, which challenges participants to develop effective detoxification systems across multiple languages
and cultural contexts.</p>
      <p>
        The development of text detoxification technology has evolved through multiple stages. Early
rule-based methods mainly relied on predefined rules and keyword filtering, which were simple to
implement but had obvious limitations: inability to handle language diversity and creativity, easy
circumvention by malicious users through text transformation and homophones, lack of understanding
of context and cultural background, and poor performance in multilingual environments. In recent
years, deep learning-based text detoxification methods have made significant progress[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], with
notable contributions including the RealToxicityPrompts dataset for evaluation benchmarking[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the
ParaDetox method using parallel data for multilingual detoxification[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and the DExperts method for
decoding-time control[10]. Particularly, the emergence of large language models has provided new
possibilities for text detoxification[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], with recent research exploring advanced techniques such as
knowledge editing, sparse autoencoders, and cross-language detoxification systems.
      </p>
      <p>
        However, existing methods still face the following challenges: (1) single models are limited by their
inherent architectural constraints, where specialized models excel in processing speed but may lack
contextual understanding, while general-purpose models provide comprehensive analysis but require
substantial computational resources; (2) insufficient cross-language and cross-cultural adaptability [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ];
(3) lack of effective multi-model fusion mechanisms; and (4) imperfect output quality control mechanisms.
      </p>
      <p>Motivation for Heterogeneous Model Collaboration: Our analysis reveals that specialized
detoxification models and general-purpose large language models possess complementary strengths and
limitations. Specialized local models (such as s-nlp/mt0-xl-detox-orpo) are developed specifically for text
detoxification tasks, offering high processing speed and targeted detoxification capabilities. However,
these models may suffer from semantic rigidity and an overly narrow focus, potentially losing important
contextual nuances or cultural subtleties. In contrast, large language models (whether cloud-based or
locally deployed) are designed to solve general problems and can leverage vast amounts of training data
to provide inspirational and contextually aware detoxification approaches. They excel at understanding
complex semantic relationships and cultural contexts but may lack the specialized precision required
for effective detoxification.</p>
      <p>The motivation for the heterogeneous collaboration framework stems from the hypothesis that combining
these two complementary approaches can simultaneously achieve both processing efficiency and
output quality. Through asynchronous parallel execution, the system leverages the rapid response
of specialized models (approximately 150 ms inference time) while concurrently obtaining rich
contextual analysis from large language models. This parallel processing paradigm eliminates the
traditional trade-off between speed and quality by allowing both models to contribute their respective
strengths simultaneously. Furthermore, by employing large language models with advanced reasoning
capabilities to intelligently fuse the outputs from both branches, we can achieve a synergistic effect
that surpasses the performance of either approach alone while maintaining overall system efficiency.
This fusion process allows the system to automatically identify the strengths of each model’s output
and combine them into a more comprehensive and effective detoxification result.</p>
      <p>To address these challenges in the context of the PAN 2025 shared task, this paper proposes a
multilingual text detoxification system based on a heterogeneous model collaboration framework, aiming
to achieve efficient, accurate, and culturally sensitive text detoxification by integrating the advantages
of multiple models.</p>
      <p>The main contributions of this paper encompass several key aspects. First, we propose a novel
heterogeneous model collaboration framework that effectively combines the specialized processing
capabilities of local fine-tuned models with the contextual understanding capabilities of large language
models, creating a synergistic approach to text detoxification. Second, we design an intelligent output
fusion mechanism that analyzes, compares, and integrates multiple detoxification results through large
language models with advanced reasoning capabilities, ensuring higher-quality outcomes than
single-model approaches. Third, we establish a language-adaptive model selection strategy that dynamically
selects optimal model combinations based on different language characteristics, providing personalized
solutions for diverse linguistic contexts. Fourth, we build a comprehensive localized prompt engineering
system that designs culturally sensitive detoxification prompts for 15 languages, addressing the nuanced
requirements of cross-cultural text processing. Finally, we implement a TypeChat-based structured
output constraint mechanism to ensure output format consistency and quality control, maintaining
system reliability across all supported languages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System Approach and methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Overall System Architecture</title>
        <p>The multilingual text detoxification system proposed in this paper adopts a layered modular design. The
system consists of five core modules: the Input Processing Module, responsible for text preprocessing,
language detection, and preliminary analysis; the Parallel Detoxification Module, the core processing
unit containing the local and remote model branches; the Intelligent Summarization Module, which uses
large language models with advanced reasoning capabilities to analyze, compare, and merge multiple
detoxification results; the Quality Control Module, performing output validation and format control
based on the TypeChat framework; and the Output Post-processing Module, for final formatting, quality
checking, and exception handling. The system’s overall processing flow adopts a pipeline design,
ensuring efficient and orderly data processing from input to final output.</p>
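        <p>A minimal sketch of this five-module flow is shown below; every function body is an illustrative stub (our assumption), not the system’s actual implementation:</p>

```python
# A minimal sketch of the five-module pipeline; every function body is an
# illustrative stub (an assumption), not the system's actual implementation.

def detect_language(text):
    # Input Processing: naive stand-in for real language detection.
    return "zh" if any(ord(ch) > 0x4E00 for ch in text) else "en"

def run_branches(text, lang):
    # Parallel Detoxification: local (fast, specialised) and remote
    # (context-aware) branches, both stubbed as simple substitutions.
    local = text.replace("stupid", "unwise")
    remote = text.replace("stupid", "ill-advised")
    return [local, remote]

def summarize(candidates, lang):
    # Intelligent Summarization: pick the candidate judged best
    # (stub criterion: shortest rewrite).
    return min(candidates, key=len)

def validate_schema(result):
    # Quality Control: enforce the expected output structure (TypeChat-style).
    assert isinstance(result, str) and result, "schema violation"
    return result

def postprocess(result):
    # Output Post-processing: final formatting and cleanup.
    return result.strip()

def detoxify_pipeline(text):
    lang = detect_language(text)
    candidates = run_branches(text, lang)
    merged = summarize(candidates, lang)
    return postprocess(validate_schema(merged))
```

        <p>Each stage maps one-to-one onto a module above, which is the pipeline property the design relies on: stages can be swapped or upgraded independently.</p>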
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Core Technical Module Design</title>
        <sec id="sec-2-2-1">
          <title>2.2.1. Heterogeneous Model Collaboration Architecture</title>
          <p>The heterogeneous model collaboration architecture is the core innovation of this system, adopting
a design philosophy that effectively combines the specialized processing capabilities of local models
with the contextual understanding capabilities of large language models. The architecture comprises
three key components. The local branch utilizes the s-nlp/mt0-xl-detox-orpo model deployed on
an NVIDIA 3090 GPU, with an average inference time of approximately 150 ms, providing rapid and
specialized detoxification. The remote branch integrates cloud APIs such as Qwen3, configured with
multi-level retry and fault tolerance mechanisms to ensure service stability. The parallel coordination
mechanism employs the ‘asyncio‘ library to implement asynchronous concurrent execution, thereby
maximizing overall processing efficiency.</p>
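          <p>The parallel coordination described above can be sketched with asyncio as follows; the model calls, timings, and retry policy are stand-ins (our assumptions), not the deployed code:</p>

```python
import asyncio

# Sketch of the parallel coordination mechanism: both branches run
# concurrently via asyncio.gather. The model calls, sleep timings, and the
# retry policy are stand-ins (assumptions), not the deployed code.

async def local_detox(text):
    await asyncio.sleep(0.01)  # stands in for ~150 ms local inference
    return text.replace("idiot", "person")

async def remote_detox(text, retries=3):
    # Multi-level retry sketch for the cloud API branch.
    for attempt in range(retries):
        try:
            await asyncio.sleep(0.01)  # stands in for the API round-trip
            return text.replace("idiot", "individual")
        except OSError:
            if attempt == retries - 1:
                return None  # caller falls back to the local result
    return None

async def run_both(text):
    # Asynchronous concurrent execution of the two branches.
    return await asyncio.gather(local_detox(text), remote_detox(text))

local_out, remote_out = asyncio.run(run_both("you idiot"))
```

        <p>Because the remote round-trip dominates, running the branches concurrently means total latency is roughly that of the slower branch rather than the sum of both.</p>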
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Deep Fusion Mechanism of Two Branches</title>
          <p>The fusion mechanism of the two branches is another core innovation, achieving intelligent integration
of multi-source information through large language models with advanced reasoning capabilities to
ensure the quality and consistency of the final output. The fusion process follows a three-stage strategy:
"result acquisition -&gt; intelligent analysis -&gt; structured output".</p>
          <p>Language-Adaptive System Prompts: A key design is the use of language-adaptive system prompts.
For each of the 15 supported languages, we have designed dedicated system-level role prompts for
the large language models with advanced reasoning capabilities. These prompts fully consider the
grammatical features, cultural background, and expression habits of each language.</p>
          <p>Structured Information Transfer: We utilize a structured information transfer mechanism to pass
the results from both branches to the large language models with advanced reasoning capabilities,
ensuring they can perform a comprehensive comparative analysis.</p>
          <p>TypeChat-Constrained Result Generation: Finally, the output of the fusion stage strictly adheres
to the interface specifications defined by the TypeChat framework, guaranteeing format consistency and
parsability. While this deep fusion mechanism significantly improves detoxification quality, it also incurs
higher computational costs and processing delays. However, we anticipate that with the advancement
of LLM technology, this high-quality fusion approach will achieve a better efficiency balance in the
future. The technical innovations of this system include the pioneering multi-model collaborative
fusion, language-adaptive fusion strategies, a structured information flow, and a quality-oriented design
philosophy.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>2.2.3. Intelligent Output Summarization Mechanism</title>
          <p>This module utilizes large language models with advanced reasoning capabilities, such as Qwen3,
to perform in-depth analysis and optimized fusion of multiple detoxification results. Its core is an
LLM-driven comprehensive evaluation system that assesses results along three dimensions. Semantic
preservation evaluation uses a 5-point scale to assess the semantic equivalence between the original
and detoxified texts. Fluency evaluation employs a continuous 0-1 score covering everything from
grammatical correctness to naturalness of expression. Detoxification effect evaluation uses a 5-point
relative score to accurately assess the reduction of implicit harmful content by considering cultural
context. Based on these evaluations, the system uses an adaptive weight allocation algorithm to
dynamically adjust the weight of each model’s output, achieving intelligent result fusion. Compared to
traditional methods, LLM-based evaluation has significant advantages in deep semantic understanding,
context awareness, and cultural sensitivity.</p>
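          <p>The adaptive weight allocation step can be illustrated as below; the normalization and the equal-weight combination are our assumptions, since the exact weighting formula is not specified here:</p>

```python
# Sketch of the adaptive weight allocation step. Each candidate carries three
# scores: content (1-5), fluency (0-1), and detoxification effect (1-5).
# The normalisation and equal-weight combination are assumptions; the text
# does not specify the exact formula.

def combined_score(content, fluency, effect):
    content_n = (content - 1) / 4.0  # map the 1-5 scale onto 0-1
    effect_n = (effect - 1) / 4.0
    return (content_n + fluency + effect_n) / 3.0

def fuse(candidates):
    # candidates: list of (text, content, fluency, effect) tuples;
    # return the text with the highest combined score.
    return max(candidates, key=lambda c: combined_score(c[1], c[2], c[3]))[0]
```

        <p>Putting all three dimensions on a common 0-1 scale before combining them is what makes the weights comparable across candidates.</p>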
        </sec>
        <sec id="sec-2-2-4">
          <title>2.2.4. Multilingual Prompt Engineering</title>
          <p>We have designed specialized prompt templates for 15 languages, fully considering their grammatical
features, cultural backgrounds, and expression habits. The system employs a sophisticated three-layer
prompt architecture. The local model prompts are concise language prefixes optimized for the
s-nlp/mt0-xl-detox-orpo model. The remote model prompts establish a "professional text purification
expert" role that follows six core detoxification principles. The advanced reasoning fusion prompts
guide the fusion model through a systematic five-step analysis process. In our design, we have
thoroughly considered writing system adaptation (e.g., right-to-left for Arabic), cultural sensitivity
management (e.g., religious neutrality), and language style preservation. The technical innovations
of this prompt engineering are particularly noteworthy, including our proposed minimal modification
principle ("prioritize lexical-level operations, strictly control sentence rewriting"), fine-grained control
over emotional intensity preservation, the establishment of cross-language consistency standards,
and a comprehensive multi-stage prompt system.</p>
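          <p>An illustrative fragment of such a language-adaptive prompt table is sketched below; the wording merely paraphrases the principles above and is hypothetical, not the actual templates:</p>

```python
# Illustrative fragment of a language-adaptive prompt table; the wording
# paraphrases the principles described in the text and is hypothetical,
# not the actual templates used by the system.

SYSTEM_PROMPTS = {
    "en": ("You are a professional text purification expert. Prioritise "
           "lexical-level edits and strictly limit sentence rewriting."),
    "zh": ("You are a professional text purification expert for Chinese. "
           "Preserve the original tone and emotional intensity."),
    "ar": ("You are a professional text purification expert for Arabic "
           "(right-to-left script). Maintain religious neutrality."),
}

def build_prompt(lang, text):
    # Fall back to the English template for unsupported languages.
    system = SYSTEM_PROMPTS.get(lang, SYSTEM_PROMPTS["en"])
    return f"{system}\nRewrite the following text non-toxically:\n{text}"
```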
        </sec>
        <sec id="sec-2-2-5">
          <title>2.2.5. TypeChat Structured Output Constraints</title>
          <p>The system uses the Microsoft TypeChat framework to ensure the consistency, reliability, and parsability
of LLM outputs. Its core mechanism relies on several key elements. We use TypeScript interface
definitions to set strict output formats for each functional module and utilize a JSON validator for
real-time validation of model outputs. Furthermore, a robust automatic retry mechanism ensures
that the system can self-correct in case of format mismatches, and intelligent fallback strategies are
provided. The technical advantages of this application are significant: it not only ensures
type safety for system outputs but also enhances robustness through a comprehensive error
handling mechanism. To our knowledge, this is the first systematic application of the TypeChat framework
in the text detoxification field, and its multilingual adaptability
ensures that this constraint mechanism works effectively across all 15 languages.</p>
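          <p>TypeChat itself is a TypeScript framework; the following Python sketch is only a rough analogue (our assumption) of its validate-and-retry loop, with a hypothetical expected schema:</p>

```python
import json

# Rough Python analogue (an assumption) of TypeChat's validate-and-retry
# loop; the expected schema below is hypothetical, not the system's actual
# interface definition.

EXPECTED_KEYS = {"detoxified_text", "language", "confidence"}

def validate(raw):
    # Parse the model output and check it against the expected schema.
    obj = json.loads(raw)
    missing = EXPECTED_KEYS - set(obj)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return obj

def constrained_call(model, prompt, max_retries=2, fallback=None):
    # Call the model, re-prompting on malformed output; once the retries
    # are exhausted, return the fallback instead of failing.
    for _ in range(max_retries + 1):
        try:
            return validate(model(prompt))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            prompt = prompt + "\nRespond with valid JSON only."
    return fallback
```

        <p>The repair step works by feeding the format requirement back into the prompt, which is the same self-correcting behavior the framework provides for format mismatches.</p>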
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Design and Results Analysis</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>Dataset: Our experiments are based on the PAN 2025 multilingual detoxification dataset, containing 9,005
samples covering 15 languages. Major languages (uk, hi, zh, ar, de, en, ru, am, es, it) have 600 samples each,
while the remaining languages account for 3,005 samples in total. The dataset covers the Indo-European,
Semitic, and Sino-Tibetan language families, among others, ensuring cultural diversity and evaluation fairness.</p>
        <p>Evaluation Metrics: The system establishes a comprehensive LLM-based three-dimensional evaluation
framework that provides robust assessment across multiple quality dimensions. The content score
(on a 1-5 scale) evaluates the semantic similarity between the original and detoxified texts, ensuring that
essential meaning is preserved throughout the detoxification process. The fluency score (a continuous
value in [0, 1]) assesses the linguistic quality of detoxified outputs, measuring grammatical correctness,
naturalness, and overall readability. The pairwise score (on a 1-5 scale) evaluates the effectiveness of
toxicity reduction by comparing the harmful content levels of the original and processed texts.</p>
        <p>LLM-based evaluation offers advantages in deep semantic understanding, context awareness, cultural
sensitivity, and consistency guarantees, with quality ensured through anomaly detection, multiple validation
passes, and manual sampling.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental Results and Analysis</title>
        <p>Based on comprehensive experiments with 9,005 multilingual samples, the system demonstrates
significant performance:</p>
        <p>Core Metrics: The system demonstrates solid performance across all evaluation dimensions, with
results that meet our design expectations. The average content score reaches 4.551 out of 5.0 (91.0%),
with 94.8% of samples achieving scores of 4 or higher, demonstrating good semantic preservation
capabilities that maintain the original meaning while successfully removing toxic elements. The average
fluency score of 0.490 out of 1.0 (49.0%) shows solid language quality performance, with 57.8% of
samples achieving scores of 0.5 or higher, indicating that the system produces linguistically sound and
natural-sounding detoxified text in the majority of cases. The average pairwise score of 3.730 out of 5.0
(74.6%) provides strong evidence of effective toxicity reduction, confirming that the system successfully
identifies and neutralizes harmful content while preserving communicative intent.</p>
        <p>Performance Analysis: Fluency shows a bimodal distribution, with 42.2% of samples in the low fluency
range (0.0-0.2) and 40.2% in the high fluency range (0.8-1.0), indicating that the system produces
high-quality output in most cases but still has room for improvement.</p>
        <p>Multilingual Adaptability: The system performs well across all 15 languages, supporting 6 writing
systems (Latin, Cyrillic, Arabic, Devanagari, Chinese, Ge’ez), successfully handling different grammatical
structures from analytic to synthetic languages, and maintaining good detoxification effects across different
cultural backgrounds.</p>
        <p>Correlation Analysis: The content score vs. fluency correlation coefficient is -0.002 (almost no correlation),
the content score vs. pairwise score coefficient is -0.195 (slight negative correlation), and the fluency vs.
pairwise score coefficient is 0.194 (positive correlation). The anomaly rate is extremely low (0.12%), and
data completeness is 100%.</p>
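        <p>For reference, the reported coefficients are standard Pearson correlations, which can be computed as follows (the sample data here is made up for illustration):</p>

```python
import math

# Plain Pearson correlation, the statistic behind the reported coefficients
# (-0.002, -0.195, 0.194); the sample data below is made up for illustration.

def pearson(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

fluency_scores = [0.1, 0.9, 0.8, 0.2, 0.95]
pairwise_scores = [2, 4, 5, 3, 4]
r = pearson(fluency_scores, pairwise_scores)  # positive, as reported
```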
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation Metrics Correlation Analysis</title>
        <p>To better understand the relationships between diferent evaluation dimensions, we examined the
correlations between our three core metrics across the 9,005 test samples.</p>
        <p>The correlation analysis reveals several important patterns in our evaluation metrics that provide
insights into system behavior. Content score versus fluency demonstrates a nearly zero correlation (r =
-0.002), suggesting that semantic preservation and fluency operate as independent dimensions within
our system, indicating that maintaining original meaning does not inherently conflict with producing
fluent output. Content score versus pairwise score shows a slight negative correlation (r = -0.195),
indicating a minor trade-off between semantic preservation and detoxification effectiveness, which is
expected in text detoxification tasks where stronger detoxification approaches may occasionally impact
the preservation of original meaning. Fluency versus pairwise score exhibits a positive correlation
(r = 0.194), suggesting that more fluent outputs tend to achieve better detoxification results, indicating
that natural language generation quality contributes significantly to effective detoxification outcomes.</p>
        <p>Implications for System Performance: These correlation patterns provide valuable insights into our
heterogeneous collaboration framework’s operational characteristics and validate key design decisions.
The independence between content preservation and fluency validates our dual-branch approach,
demonstrating that specialized and general models can optimize different aspects of text processing
without inherent conflicts, thereby supporting the effectiveness of our parallel processing strategy.
The weak negative correlation between content and pairwise scores reflects the inherent challenge
of balancing meaning preservation with toxicity removal, a trade-off that our intelligent fusion
mechanism is specifically designed to minimize through sophisticated analysis and optimization. The
positive relationship between fluency and detoxification effectiveness strongly supports our strategy of
employing sophisticated language models for high-quality output generation, confirming that linguistic
quality contributes to more effective detoxification outcomes.</p>
        <p>Key Improvements: The system demonstrates progress in several areas: supporting 15 languages
and 6 major writing systems, achieving 94.8% samples with good semantic preservation, maintaining
a low anomaly rate of 0.12%, implementing LLM-based multi-dimensional automatic evaluation, and
showing improvements over traditional detoxification methods.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <sec id="sec-4-1">
        <title>4.1. Summary of Main Contributions</title>
        <p>This paper presents a multilingual text detoxification system based on a heterogeneous model
collaboration framework with the following main contributions:</p>
        <p>System Design: We propose a heterogeneous model collaboration framework that combines local
specialized models (s-nlp/mt0-xl-detox-orpo) with cloud-based large language models (Qwen3) through
asynchronous concurrent processing.</p>
        <p>Technical Innovations: (1) An intelligent output fusion mechanism using LLMs with advanced
reasoning capabilities; (2) Language-adaptive prompt engineering for 15 languages; (3) TypeChat-based
structured output constraints; (4) LLM-driven multi-dimensional evaluation system.</p>
        <p>Experimental Results: Testing on 9,005 multilingual samples shows: content score 4.551/5.0 (91.0%),
fluency score 0.490/1.0 (49.0%), pairwise score 3.730/5.0 (74.6%), with 94.8% high-quality samples and a
0.12% anomaly rate across 15 languages and 6 writing systems.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Future Work</title>
        <p>Future work will focus on: (1) improving fluency performance to address the bimodal distribution issue;
(2) expanding language coverage to low-resource languages; (3) optimizing system architecture for
real-time processing; (4) developing domain-specific detoxification strategies; (5) refining the LLM-based
evaluation system for better semantic and cultural nuance capture.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Social Science Foundation of China (24BYY080).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>(Using the activity taxonomy in ceur-ws.org/genai-tax.html.)
During the preparation of this work, the author(s) used Claude 4 for grammar and spelling
checking, code review, and research assistance. After using these tools/services, the author(s) reviewed
and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>Code Availability: The complete source code and implementation of our multilingual
text detoxification system is publicly available at: https://github.com/skyDuanXianBing/
MultilingualTextDetoxificationSystem.git</p>
      <p>[10] Liu A, Sap M, Lu X, et al. DExperts: Decoding-Time Controlled Text Generation with Experts and
Anti-Experts. In: Proceedings of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume
1: Long Papers). Online: Association for Computational Linguistics, 2021: 6691-6706.
[11] Dementieva D, Babakov N, Panchenko A. MultiParaDetox: Extending Text Detoxification with
Parallel Data to New Languages. In: Proceedings of the 2024 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume
4: Student Research Workshop). Mexico City, Mexico: Association for Computational Linguistics,
2024: 12-18.
[12] Lees A, Tran V Q, Tay Y, et al. A new generation of Perspective API: Efficient multilingual
character-level transformers. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining. Washington DC, USA: ACM, 2022: 3197-3207.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Prompt Examples</title>
      <p>Figure 5: Example For English Prompt
Figure 6: Example For Chinese Prompt</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Dementieva</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Protasov</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babakov</surname>
            <given-names>N</given-names>
          </string-name>
          , et al.
          <article-title>Overview of the Multilingual Text Detoxification Task at PAN 2025</article-title>
          . In: Working Notes of CLEF 2025 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          .
          <source>CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bevendorff</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dementieva</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fröbe</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <source>Overview of PAN</source>
          <year>2025</year>
          :
          <article-title>Generative AI Detection, Multilingual Text Detoxification, Multi-author Writing Style Analysis, and Generative Plagiarism Detection</article-title>
          . In: Advances in Information Retrieval. Cham: Springer Nature Switzerland,
          <year>2025</year>
          :
          <fpage>434</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gehman</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gururangan</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sap</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>RealToxicityPrompts: Evaluating neural toxic degeneration in language models</article-title>
          . In:
          <article-title>Findings of the Association for Computational Linguistics: EMNLP 2020</article-title>
          .
          <article-title>Online: Association for Computational Linguistics</article-title>
          ,
          <year>2020</year>
          :
          <fpage>3356</fpage>
          -
          <lpage>3369</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Dale</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voronov</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dementieva</surname>
            <given-names>D</given-names>
          </string-name>
          , et al.
          <source>Text Detoxification using Large Pre-trained Neural Models. arXiv preprint arXiv:2109.08914</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Welbl</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glaese</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uesato</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <article-title>Challenges in detoxifying language models</article-title>
          .
          <source>In: 9th International Conference on Learning Representations. Virtual Event: OpenReview.net</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Rykov</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anisimov</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voronin</surname>
            <given-names>A</given-names>
          </string-name>
          , et al.
          <article-title>Alignment of Multilingual Transformers for Text Detoxification</article-title>
          . In: Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          . Grenoble, France: CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sushko</surname>
            <given-names>N.</given-names>
          </string-name>
          <article-title>PAN 2024 Multilingual TextDetox: Exploring Diferent Regimes For Synthetic Data Training</article-title>
          . In: Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          . Grenoble, France: CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dementieva</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moskovskiy</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            <given-names>A</given-names>
          </string-name>
          , et al.
          <article-title>Overview of the Multilingual Text Detoxification Task at PAN 2024</article-title>
          .
          In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . Grenoble, France: Springer,
          <year>2024</year>
          :
          <fpage>4</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Logacheva</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dementieva</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ustyantsev</surname>
            <given-names>S</given-names>
          </string-name>
          , et al.
          <article-title>ParaDetox: Detoxification with parallel data</article-title>
          .
          <source>In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>1</volume>
          : Long Papers). Dublin, Ireland: Association for Computational Linguistics,
          <year>2022</year>
          :
          <fpage>6804</fpage>
          -
          <lpage>6818</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>