<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisei Rykov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Zaytsev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Anisimov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandr Voronin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HSE University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Skolkovo Institute of Science and Technology</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>This paper presents the solution of the SmurfCat team for the Multilingual Text Detoxification task at the PAN-2024 competition. Using data augmentation through machine translation and a special filtering procedure, we collected an additional multilingual parallel dataset for text detoxification. Using the obtained data, we fine-tuned several multilingual sequence-to-sequence models, such as mT0 and Aya, on the text detoxification task. The ORPO alignment technique was then applied to the final model. Our final model has only 3.7 billion parameters and achieves state-of-the-art results for the Ukrainian language and near state-of-the-art results for other languages. In the competition, our team achieved first place in the automated evaluation with a score of 0.52 and second place in the final human evaluation with a score of 0.74.</p>
      </abstract>
      <kwd-group>
<kwd>PAN 2024</kwd>
        <kwd>Multilingual Text Detoxification</kwd>
        <kwd>mT0</kwd>
        <kwd>ORPO</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
<p>Initially, there were not many parallel datasets for the multilingual detoxification task. More precisely,
only the Russian (https://huggingface.co/datasets/s-nlp/ru_paradetox) and English (https://huggingface.co/datasets/s-nlp/paradetox) ParaDetox datasets were available, with 11 100 and 19 700
samples respectively. During the competition, the organizers published a small human-annotated
Multilingual ParaDetox (https://huggingface.co/datasets/textdetox/multilingual_paradetox) covering all languages and containing only 400 samples per language.</p>
<p>Nevertheless, we decided to augment the provided data by automatically translating the English data
into the other languages. To translate the original English data, we used the GoogleTranslator model from
the deep_translator Python package (https://pypi.org/project/deep-translator/). We chose this API over more advanced machine translation
models because of its speed and simplicity; in addition, few translation models are available for low-resource
languages such as Amharic. As a result, we obtained an additional 19 700 samples for each language.</p>
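      <p>A minimal sketch of this augmentation step is shown below. It assumes the English ParaDetox pairs are already loaded as (toxic, neutral) tuples; the target-language codes and the sentence-by-sentence translation loop are illustrative choices rather than a description of our exact pipeline.</p>
      <preformat>
# Hypothetical augmentation sketch: translate English ParaDetox pairs into the
# other competition languages with the deep_translator package.
from deep_translator import GoogleTranslator

# Google Translate language codes for the target languages (our assumption).
TARGET_LANGS = ["ru", "uk", "de", "es", "am", "zh-CN", "ar", "hi"]

def augment(pairs):
    """pairs: list of (toxic_en, neutral_en) tuples from English ParaDetox."""
    augmented = {}
    for lang in TARGET_LANGS:
        translator = GoogleTranslator(source="en", target=lang)
        augmented[lang] = [
            (translator.translate(toxic), translator.translate(neutral))
            for toxic, neutral in pairs
        ]
    return augmented
      </preformat>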
<p>Since translation is often imperfect, we applied a dedicated post-processing procedure. In
general, we checked the preservation of meaning after translation and the toxicity of the translated
data. First, we used the LaBSE [6] model to evaluate the similarity between translated pairs. Second, we
applied an XLM-R toxicity classifier (https://huggingface.co/textdetox/xlmr-large-toxicity-classifier) to check whether toxic sentences remained toxic after translation
and vice versa.</p>
<p>The distributions of both measures are shown in Figures 2 and 3. For most samples, the similarity
between the original and translated sentences was high enough, so most samples preserved their meaning.
Regarding toxicity, however, many neutral sentences became toxic after translation, and many toxic sentences
became neutral. For toxicity, we set the threshold to 0.9 for toxic sentences and 0.1 for neutral
sentences. The similarity threshold was set to 0.8 for all sentences.</p>
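      <p>The sketch below illustrates this filtering procedure under the thresholds listed above; a translated pair is kept only if both sides pass the similarity check and keep their toxicity labels. The label name returned by the toxicity checkpoint is an assumption.</p>
      <preformat>
# Filtering sketch: LaBSE similarity and XLM-R toxicity thresholds from the text
# (similarity &gt;= 0.8, toxic side &gt;= 0.9, neutral side &lt;= 0.1).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

labse = SentenceTransformer("sentence-transformers/LaBSE")
tox_clf = pipeline("text-classification",
                   model="textdetox/xlmr-large-toxicity-classifier",
                   top_k=None)  # return scores for all labels

def toxicity(text):
    # The "toxic" label name is an assumption about this checkpoint.
    scores = {d["label"]: d["score"] for d in tox_clf([text])[0]}
    return scores.get("toxic", 0.0)

def similarity(src, tgt):
    emb = labse.encode([src, tgt], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def keep(pair_en, pair_translated):
    (tox_en, neu_en), (tox_tr, neu_tr) = pair_en, pair_translated
    sim_ok = min(similarity(tox_en, tox_tr), similarity(neu_en, neu_tr)) &gt;= 0.8
    tox_ok = toxicity(tox_tr) &gt;= 0.9 and toxicity(neu_tr) &lt;= 0.1
    return sim_ok and tox_ok
      </preformat>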
<p>After all filtering steps, 40 500 pairs of neutral and toxic sentences were obtained. More detailed
statistics on how many samples remained after filtering are given in Table 1. According to these statistics,
Amharic lost the most samples during filtering.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
<p>In this section, we describe our approach, which is based on fine-tuning and preference optimization of multilingual
language models for the text detoxification task.</p>
      <sec id="sec-3-1">
        <title>3.1. Supervised fine-tuning</title>
<p>As our main approach, we chose to fine-tune various multilingual LMs. In our view, the most
promising models for further fine-tuning were the LMs of the mT0 family [3], a family of sequence-to-sequence
Transformer models initialized from mT5 [7]. We considered sequence-to-sequence
modeling to be preferable for the text detoxification task. The mT0 family was chosen
because of its strong multilingual capabilities, as these models cover every language of the
competition. We also experimented with the novel Aya-101 model [8]: an mT5-XXL model fine-tuned on
multilingual instructions.</p>
<p>All models were tuned in an almost identical way. The learning rate was set to 1e-5, the global batch
size to 8, and the weight decay to 0.01. A cosine scheduler was used for training. In total, all models
were trained for 4 epochs. All other training parameters were left at the defaults of the HuggingFace
Seq2SeqTrainer. The only difference is that for mT0-XL we updated the weights of the whole
model because our computing resources allowed it. For larger models such as Aya-101 or mT0-XXL,
only a LoRA adapter was trained. The LoRA adapter was set up as follows: r and lora_alpha were
set to 32, lora_dropout to 0.1, and the other parameters were left at their defaults. The best model was selected
according to the validation loss.</p>
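        <p>A sketch of this training setup is given below using the hyper-parameters listed above; the checkpoint name and the dataset objects (train_ds, val_ds) are placeholders standing in for our prefix-prompted detoxification pairs.</p>
        <preformat>
# Fine-tuning sketch with the HuggingFace Seq2SeqTrainer and a LoRA adapter
# (mT0-XL was instead fully fine-tuned without LoRA).
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "bigscience/mt0-xxl"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA configuration used for the larger checkpoints.
lora = LoraConfig(r=32, lora_alpha=32, lora_dropout=0.1,
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, lora)

args = Seq2SeqTrainingArguments(
    output_dir="detox-mt0",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # global batch size of 8
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
    evaluation_strategy="epoch",     # best model selected by validation loss
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# train_ds / val_ds: tokenized (prefix + toxic sentence, neutral sentence)
# pairs, assumed to be prepared beforehand.
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_ds, eval_dataset=val_ds,
                         tokenizer=tokenizer)
trainer.train()
        </preformat>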
<p>To reinforce the in-context abilities of the models, we added a language-specific prefix to each toxic sentence.
As a result, during training we passed toxic sentences with this special prefix prompt into the
model.</p>
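        <p>A minimal illustration of this prefixing step follows. The English prefix matches the prompts visible in Table 3; the prefixes for the other languages are not reproduced here, so the fallback below is purely illustrative.</p>
        <preformat>
# Language-specific prefixes prepended to each toxic sentence.
PREFIXES = {"en": "Detoxify: "}  # other languages defined analogously

def build_input(toxic_sentence, lang):
    # Fall back to the English prefix for illustration only.
    return PREFIXES.get(lang, PREFIXES["en"]) + toxic_sentence
        </preformat>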
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The Best Candidate Choice</title>
<p>During inference, we generated 10 hypotheses and kept the 5 most likely ones using diverse beam
search [4]. The number of beams was set to 10 with 5 beam groups, the diversity penalty to 2.5, and the
repetition penalty to 1.2. To select the best candidate, we calculated a relevance metric as the product
of similarity and toxicity scores. Similarity was calculated using LaBSE embeddings, and toxicity was
measured using the xlm-roberta-large toxicity classifier. Once the relevance scores were calculated, we
selected the best candidate according to the highest score.</p>
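        <p>The sketch below shows this generation and re-ranking step with the parameters listed above. The relevance score multiplies LaBSE similarity to the input by the candidate's non-toxicity, which is our reading of the product of similarity and toxicity scores; it reuses the similarity and toxicity helpers from the filtering sketch in Section 2.</p>
        <preformat>
# Diverse beam search generation followed by relevance-based candidate selection.
def generate_candidates(model, tokenizer, toxic_with_prefix, num_keep=5):
    inputs = tokenizer(toxic_with_prefix, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=10,
        num_beam_groups=5,
        num_return_sequences=num_keep,
        diversity_penalty=2.5,
        repetition_penalty=1.2,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

def relevance(source, candidate):
    sim = similarity(source, candidate)     # LaBSE cosine similarity
    neutrality = 1.0 - toxicity(candidate)  # XLM-R toxicity classifier
    return sim * neutrality

def detoxify(model, tokenizer, toxic_with_prefix):
    candidates = generate_candidates(model, tokenizer, toxic_with_prefix)
    return max(candidates, key=lambda c: relevance(toxic_with_prefix, c))
        </preformat>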
      </sec>
      <sec id="sec-3-3">
        <title>3.3. ORPO</title>
        <p>Once the models were fine-tuned, we decided to further tune the model for the best performance using
the Odds Ratio Preference Optimization (ORPO) approach [5]. Unlike DPO [9], this optimization does not need a reference
model. Alignment was performed on the unseen test dataset.</p>
<p>As a preference dataset, we generated hypotheses using diverse beam search on the samples from
the test set and annotated them using the relevance score described above. For each prompt, only the candidate with the
highest relevance score was selected as the chosen one, and all other candidates were used as rejected
samples.</p>
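        <p>A short sketch of the preference-pair construction follows; pairing the single best candidate against every remaining candidate is our assumed scheme, and the helpers come from the previous sketches.</p>
        <preformat>
# Build (prompt, chosen, rejected) rows for ORPO from diverse-beam candidates.
def build_preference_rows(model, tokenizer, toxic_prompts):
    rows = []
    for prompt in toxic_prompts:
        candidates = generate_candidates(model, tokenizer, prompt)
        ranked = sorted(candidates, key=lambda c: relevance(prompt, c),
                        reverse=True)
        chosen, rejected = ranked[0], ranked[1:]
        for negative in rejected:
            rows.append({"prompt": prompt, "chosen": chosen,
                         "rejected": negative})
    return rows
        </preformat>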
<p>The final ORPO dataset for alignment contained the prompt (toxic sentence), the rejected sample
(negative candidate), and the chosen sample (best candidate). Table 3 shows a small sample of the
dataset. Once the dataset was collected, we trained the model on it using the same parameters
used to train the other models; the ORPO beta parameter was set to 0.1. For the final
submission, we used the aligned model together with the algorithm described above to select the best candidate.</p>
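        <p>One possible way to run this alignment step is sketched below with the ORPOTrainer from the trl library; the paper does not state which implementation was used, so this is an assumption, and the training arguments simply mirror the fine-tuning settings plus the beta value.</p>
        <preformat>
# ORPO alignment sketch using trl (assumed implementation).
from datasets import Dataset
from trl import ORPOConfig, ORPOTrainer

pref_dataset = Dataset.from_list(rows)  # rows from the sketch above

orpo_args = ORPOConfig(
    output_dir="detox-mt0-orpo",
    beta=0.1,                        # ORPO beta parameter
    learning_rate=1e-5,              # same parameters as supervised fine-tuning
    per_device_train_batch_size=8,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
)

trainer = ORPOTrainer(model=model, args=orpo_args,
                      train_dataset=pref_dataset, tokenizer=tokenizer)
trainer.train()
        </preformat>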
        <p>[Table 3 examples, partially recoverable: a toxic prompt such as "Detoxify: Nat is just a piece of shit, ignore him." paired with a chosen candidate ("Nat is just a bad person, ignore him") and a rejected candidate ("Nate is just not good, ignore him.").]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The final results of the automatic evaluation, reported with the averaged Joint metric, are shown in Table 4. mT0-XL with ORPO alignment
showed the best performance among all approaches on the leaderboard for all languages. Compared
to mT0-XL, the same model before alignment, ORPO slightly improved the performance of the model,
increasing the average result by 0.01 points. Surprisingly, the larger models were not the best. For
example, the mT0-XXL model with 13B parameters performed even worse than the mT0-XL model
with only 3.7B parameters. Aya-101, an mT5-XXL model additionally tuned on instruction data for
different languages, performed worse than the other models. Since Aya-101 and mT0-XXL performed
worse than mT0-XL, we did not perform the ORPO alignment step for these models. Considering the other
teams in the automatic evaluation, our checkpoints, mainly mT0-XL-ORPO and mT0-XL, are the two
best performing approaches for all languages except Chinese.</p>
      <p>The results of the human evaluation are also reported for the teams with the best scores, with Joint given as
the evaluation metric.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>A promising direction for future work is transferring detoxification abilities from
high-resource languages to low-resource languages without translation, as machine translation for
low-resource languages often shows low quality. A further direction for investigation is the interpretability
of models, specifically the understanding of which tokens have been replaced by the model through the
text detoxification process and the rationale behind this.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>