SmurfCat at PAN 2024 TextDetox: Alignment of
                         Multilingual Transformers for Text Detoxification
                         Notebook for the PAN Lab at CLEF 2024

                         Elisei Rykov1,*, Konstantin Zaytsev2,*, Ivan Anisimov1 and Alexandr Voronin1
                         1 Skolkovo Institute of Science and Technology, Russia
                         2 HSE University, Russia


                                         Abstract
                                         This paper presents the SmurfCat team's solution for the Multilingual Text Detoxification task of the
                                         PAN-2024 competition. Using data augmentation through machine translation and a special filtering procedure, we
                                         collected an additional multilingual parallel dataset for text detoxification. Using the obtained data, we fine-tuned
                                         several multilingual sequence-to-sequence models, such as mT0 and Aya, on a text detoxification task. We applied
                                         the ORPO alignment technique to the final model. Our final model has only 3.7 billion parameters and achieves
                                         state-of-the-art results for the Ukrainian language and near state-of-the-art results for other languages. In the
                                         competition, our team achieved first place in the automated evaluation with a score of 0.52 and second place in
                                         the final human evaluation with a score of 0.74.

                                         Keywords
                                         PAN 2024, Multilingual Text Detoxification, mT0, ORPO


                         1. Introduction
                         Multilingual text detoxification is a challenging subtask of text style transfer. The most difficult part
                         is adapting such a system to low-resource languages. The goal of the PAN-2024 Multilingual
                         Text Detoxification Task [1, 2] is to develop a multilingual text detoxification system for 9 languages:
                         Amharic, Arabic, German, Spanish, Hindi, Chinese, Russian, Ukrainian and English.
                            This paper describes the solution of the SmurfCat team, which achieved first place with an average
                         score of 0.52 in the automatic evaluation and second place with a score of 0.74 in the manual human
                         evaluation. Our solution is based on the mT0 model family [3], which has strong multilingual
                         capabilities. We fine-tuned all selected models on every language of the competition and applied
                         various data augmentation techniques. To improve detoxification, we generated hypotheses with the
                         diverse beam search algorithm [4] and filtered them. Finally, we applied ORPO [5] alignment to further
                         improve the model's predictions. Our 3.7-billion-parameter language model demonstrates state-of-the-art
                         results for Ukrainian and near state-of-the-art results for other languages. We published the final
                         best-performing model on the HuggingFace Hub1. The training scripts and the extended data are available on
                         GitHub2.
                            The rest of the paper is organized as follows: Section 2 discusses data augmentation strategies, Section
                         3 describes our final solution, and Section 4 presents the results and discussion.


                          CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
                         * These authors contributed equally.
                          Elisei.Rykov@skoltech.ru (E. Rykov); kzaytsev@hse.ru (K. Zaytsev); Ivan.Anisimov@skoltech.ru (I. Anisimov);
                          Alexandr.Voronin@skoltech.ru (A. Voronin)
                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         1 https://hf.co/s-nlp/mt0-xl-detox-orpo
                         2 https://github.com/s-nlp/multilingual-transformer-detoxification

CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: An overview of our approach. We used different datasets, fine-tuned the whole mT0-XL model and
finally performed the ORPO alignment step.


2. Data
Initially, few parallel datasets existed for the multilingual detoxification task. Essentially, only the
Russian3 and English4 ParaDetox datasets were available, with 11 100 and 19 700 samples, respectively.
During the competition, the organizers published a small human-annotated Multilingual ParaDetox5
dataset covering all languages, with only 400 samples per language.
   To compensate for the lack of data, we augmented the provided data by automatically translating the
English data into the other languages. For translation, we used the GoogleTranslator backend from the
deep_translator6 Python package. We chose this API over more advanced machine translation models
because of its speed and simplicity, and because few translators support low-resource languages
like Amharic. As a result, we obtained an additional 19 700 samples for each language.
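The translation step can be sketched as follows. This is a minimal illustration, not the authors' script: the language codes, the function names, and the choice to translate both sides of each pair independently are assumptions. The `deep_translator` import is done lazily because the real backend needs network access.

```python
from typing import Callable, Dict, List, Tuple

# Target languages of the shared task (English is the source). The Google
# Translate language codes below are an assumption of this sketch.
TARGET_LANGS = ["am", "ar", "de", "es", "hi", "ru", "uk", "zh-CN"]


def google_translate_fn(target: str, source: str = "en") -> Callable[[str], str]:
    """Return a text -> text function backed by deep_translator (needs network)."""
    from deep_translator import GoogleTranslator  # imported lazily on purpose

    translator = GoogleTranslator(source=source, target=target)
    return translator.translate


def translate_pairs(
    pairs: List[Tuple[str, str]], translate: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Translate both sides of each (toxic, neutral) pair with `translate`."""
    return [
        {"toxic": translate(toxic), "neutral": translate(neutral)}
        for toxic, neutral in pairs
    ]
```

In this sketch, `translate_pairs(english_pairs, google_translate_fn("uk"))` would produce the Ukrainian portion; any other `str -> str` function can be passed in for offline testing.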


Figure 2: Toxicity of translations
Figure 3: Similarity of translations


  Since translation is often imperfect, we applied a post-processing procedure that checks whether the
meaning is preserved after translation and whether the translated data keeps its toxicity label. First,
we used the LaBSE [6] model to evaluate the similarity between translated pairs. Second, we applied the
XLM-R7 toxicity classifier to check whether toxic sentences were still toxic after translation and
neutral sentences were still neutral.
3 https://huggingface.co/datasets/s-nlp/ru_paradetox
4 https://huggingface.co/datasets/s-nlp/paradetox
5 https://huggingface.co/datasets/textdetox/multilingual_paradetox
6 https://pypi.org/project/deep-translator/
7 https://huggingface.co/textdetox/xlmr-large-toxicity-classifier
   The distributions of both measures are shown in Figures 2 and 3. For most samples, the similarity
between the original and translated sentences was high, indicating that the meaning was preserved.
Toxicity was less stable: many neutral sentences became toxic after translation, and many toxic sentences
became neutral. For toxicity, we set the threshold to 0.9 for toxic sentences and 0.1 for neutral
sentences. The similarity threshold was set to 0.8 for all sentences.
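A minimal sketch of this filter is given below. The thresholds come from the text; the record layout, the field names, the inclusive comparisons, and the assumption that similarity is measured between each English sentence and its translation are illustrative choices.

```python
from dataclasses import dataclass

SIM_THRESHOLD = 0.8      # minimum similarity for every sentence
TOXIC_THRESHOLD = 0.9    # translated toxic sentence must score at least this
NEUTRAL_THRESHOLD = 0.1  # translated neutral sentence must score at most this


@dataclass
class ScoredPair:
    """A translated (toxic, neutral) pair with precomputed scores (hypothetical layout)."""
    toxic: str
    neutral: str
    toxic_sim: float    # similarity of the toxic sentence to its English source
    neutral_sim: float  # similarity of the neutral sentence to its English source
    toxic_tox: float    # classifier toxicity probability of the toxic sentence
    neutral_tox: float  # classifier toxicity probability of the neutral sentence


def keep_pair(p: ScoredPair) -> bool:
    """Keep a pair only if meaning and toxicity labels survived translation."""
    meaning_ok = p.toxic_sim >= SIM_THRESHOLD and p.neutral_sim >= SIM_THRESHOLD
    labels_ok = p.toxic_tox >= TOXIC_THRESHOLD and p.neutral_tox <= NEUTRAL_THRESHOLD
    return meaning_ok and labels_ok
```

Applying `keep_pair` to every translated pair and keeping those that return `True` gives the filtered set for a language.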
   After all filtering steps, about 40 500 pairs of neutral and toxic sentences remained. The number of
samples remaining for each language is given in Table 1. According to these statistics,
Amharic lost the most samples during filtering.

Table 1
Statistics of number of remaining samples after filtering.
     Language         Amharic     Arabic   German      Spanish   Hindi   Russian   Ukrainian   Chinese
     # of samples     1 323       3 190    7 511       7 555     4 844   7 458     5 350       3 274

  Our final dataset mixture is shown in Table 2. In total, 74 900 samples were used in the training
process.


3. Method
In this section, we describe our main method, which consists of fine-tuning and preference optimization
of language models for the text detoxification task.

3.1. Supervised fine-tuning
Our main approach is fine-tuning of multilingual LMs. We considered the mT0 family the most
promising candidate for fine-tuning. It is a family of sequence-to-sequence Transformer models
initialized from mT5 [7]. We expected sequence-to-sequence modeling to be a good fit for the text
detoxification task. The mT0 family was chosen because of its strong multilingual capabilities, and
these models were fine-tuned on every language of the competition. We also experimented with the
recent Aya-101 model [8], an mT5-XXL model fine-tuned on multilingual instructions.
   All models were tuned in nearly the same way. The learning rate was set to 1e-5, the global batch
size to 8, and the weight decay to 0.01. A cosine learning rate scheduler was used. All models
were trained for 4 epochs. All other training parameters were the defaults of the HuggingFace
Seq2SeqTrainer. The only difference is that for mT0-XL we updated the weights of the whole
model because our computing resources allowed it. For the larger models, Aya-101 and mT0-XXL,
only a LoRA adapter was trained. The LoRA setup was as follows: r and lora_alpha were
set to 32 and lora_dropout to 0.1; other parameters were left at their defaults. The best checkpoint was
selected according to the validation loss.
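The hyperparameters above could be collected as in the sketch below. The values are taken from the text; mapping the global batch size of 8 to `per_device_train_batch_size` (one device, no gradient accumulation), the `output_dir` value, and the helper name `build_configs` are assumptions of this sketch.

```python
# Hyperparameters stated in the text, keyed by the corresponding
# Seq2SeqTrainingArguments / LoraConfig argument names.
TRAINING_ARGS = {
    "learning_rate": 1e-5,
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 4,
    # Assumption: global batch size 8 reached on one device without accumulation.
    "per_device_train_batch_size": 8,
}

LORA_ARGS = {"r": 32, "lora_alpha": 32, "lora_dropout": 0.1}


def build_configs(output_dir: str = "out"):
    """Build the trainer and LoRA configs (requires transformers and peft)."""
    from peft import LoraConfig, TaskType
    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_ARGS)
    lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, **LORA_ARGS)
    return training_args, lora_config
```

For full fine-tuning of mT0-XL only the training arguments would be needed; the LoRA config applies to the larger models.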
   To make use of the in-context abilities of the models, we added a language-specific prefix to each
toxic sentence. As a result, the models received toxic sentences with this prefix prompt during
training.
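Prefixing can be sketched as below. Only the English prefix `Detoxify: ` is visible in the paper (Table 3); the prefixes for the other languages are not given here, so the fallback to the English prefix and the function name are assumptions of this sketch.

```python
from typing import Dict

# Only the English prefix appears in the paper (Table 3); the other
# language-specific prefixes would have to be filled in by the user.
PREFIXES: Dict[str, str] = {"en": "Detoxify: "}


def add_prefix(
    sentence: str,
    lang: str,
    prefixes: Dict[str, str] = PREFIXES,
    default_lang: str = "en",
) -> str:
    """Prepend the prefix for `lang`, falling back to the default language."""
    prefix = prefixes.get(lang, prefixes[default_lang])
    return prefix + sentence
```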

3.2. The Best Candidate Choice
During inference, we generated 10 hypotheses and selected the 5 most likely ones using diverse beam
search. The number of beams was set to 10 with 5 beam groups, the diversity penalty to 2.5, and the
repetition penalty to 1.2. To select the best candidate, we computed a relevance score as the product
of the similarity and toxicity scores. Similarity was calculated using LaBSE embeddings, and toxicity was
measured using the xlm-roberta-large toxicity classifier. We then selected the candidate with the
highest relevance score.
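The selection step can be sketched as follows. The decoding values come from the text and use the keyword names of the HuggingFace `generate` method, which may differ between library versions. Two assumptions are made: `num_return_sequences=5` is one reading of "selected the 5 most likely ones", and the toxicity term of the product is taken to be the probability of the candidate being non-toxic, so that a higher relevance is better. The scorers are passed in as plain functions.

```python
from typing import Callable, List, Tuple

# Decoding settings from the text, as keyword arguments for model.generate(...).
GENERATION_KWARGS = {
    "num_beams": 10,
    "num_beam_groups": 5,
    "diversity_penalty": 2.5,
    "repetition_penalty": 1.2,
    "num_return_sequences": 5,  # assumption, see the lead-in
}


def select_best(
    source: str,
    candidates: List[str],
    similarity: Callable[[str, str], float],
    non_toxicity: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the candidate with the highest similarity * non-toxicity score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    scored = [(similarity(source, c) * non_toxicity(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

In practice `similarity` would wrap cosine similarity of LaBSE embeddings and `non_toxicity` would wrap the toxicity classifier.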
Table 2
Training dataset mixture.
                                       Dataset                   # of samples
                                       En-ParaDetox              19 700
                                       Ru-ParaDetox              11 100
                                       Translations              40 500
                                       Multilingual ParaDetox    3 600
                                       Total                     74 900


3.3. ORPO
Once the models were fine-tuned, we further tuned the model for best performance using the
Odds Ratio Preference Optimization (ORPO) approach. Unlike DPO [9], this optimization does not need
a reference model. Alignment was performed on the unseen test dataset.
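To illustrate why no reference model is needed, the sketch below computes the ORPO objective for a single example following the formulation in [5]: the supervised loss on the chosen output plus a weighted odds-ratio term, both of which depend only on the policy being trained. The inputs are assumed to be length-normalized sequence log-probabilities, and the function names are hypothetical.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(
    logp_chosen: float, logp_rejected: float, nll_chosen: float, beta: float = 0.1
) -> float:
    """Supervised loss on the chosen output plus beta times the odds-ratio loss."""
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(ratio)), written in a numerically stable form.
    if ratio > 0:
        odds_ratio_loss = math.log1p(math.exp(-ratio))
    else:
        odds_ratio_loss = -ratio + math.log1p(math.exp(ratio))
    return nll_chosen + beta * odds_ratio_loss
```

The odds-ratio term shrinks as the chosen output becomes more likely than the rejected one, so the model is pushed toward the chosen output while still being trained with the usual supervised loss.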
  As a preference dataset, we generated hypotheses using diverse beam search for the samples from
the test set and scored them with the relevance score described above. For each toxic sentence, the
candidate with the highest relevance score was selected as the chosen sample, and all other candidates
were used as rejected samples.
  The final ORPO dataset for alignment contained the prompt (toxic sentence), the rejected sample
(negative candidate), and the chosen sample (best candidate). Table 3 shows a small sample of the
dataset. Once the dataset was collected, we trained the model on it using the same hyperparameters
as for the other models. The ORPO beta parameter was set to 0.1. For the final
submission, we used the aligned model together with the candidate selection algorithm described above.
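The construction of preference rows can be sketched as below. The column names `prompt`, `chosen`, and `rejected` follow the headers of Table 3; the function name and the decision to drop rejected candidates identical to the chosen one are assumptions.

```python
from typing import Dict, List


def build_preference_rows(
    prompt: str, candidates: List[str], scores: List[float]
) -> List[Dict[str, str]]:
    """Pair the top-scoring candidate (chosen) with every other candidate (rejected)."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and aligned")
    best_idx = max(range(len(candidates)), key=scores.__getitem__)
    chosen = candidates[best_idx]
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": cand}
        for i, cand in enumerate(candidates)
        if i != best_idx and cand != chosen  # skip duplicates of the chosen text
    ]
```

Each toxic sentence with k distinct candidates yields up to k - 1 rows in this sketch.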

Table 3
Samples for the ORPO tuning.
Prompt                                                                Chosen                                                  Rejected
Detoxify: She really can’t be such a bitch, can she???                She really can’t be such a bad person, can she???       really can’t be such a bad person, can she???
Detoxify: hopefully turks are not dumb enough to hand it to him       Turkish people are not naive enough to hand it to him.  hopefuly turks are not naive enough to hand it to him.
Detoxify: again , give me the name of the store or fuck off , liar .  again, give me the name of the store or go away         again, give me the name of the store
Detoxify: Nat is just a piece of shit, ignore him.                    Nate is just not good, ignore him.                      Nat is just a bad person, ignore him


4. Results
The final results of the automatic evaluation are shown in Table 4. mT0-XL with ORPO alignment
showed the best performance among all approaches from the leaderboard for all languages. Compared
to mT0-XL before alignment, ORPO slightly improved the performance of the model,
increasing the average score by about 0.01 points. Surprisingly, the larger models were not the best. For
example, the mT0-XXL model with 13B parameters performed worse than the mT0-XL model
with only 3.7B parameters. Aya-101, an mT5-XXL model additionally tuned on instruction data for
different languages, performed worse than our other models. Since Aya-101 and mT0-XXL performed
worse than mT0-XL, we did not perform the ORPO alignment step for these models. Compared with the
other teams in the automatic evaluation, our mT0-XL-ORPO and mT0-XL checkpoints are the two
best-performing approaches for all languages except Chinese.
Table 4
The results of the automatic evaluation. The teams with the best scores were selected for the table. Joint (J) is
given as the evaluation metric for each language.
Team                  Amharic   Arabic   German    English   Spanish    Hindi   Russian   Ukrainian   Chinese   Avg J
Our (mT0-XL-ORPO)     0.378     0.626    0.678     0.602     0.562      0.355   0.634     0.692       0.178     0.523
Our (mT0-XL)          0.374     0.617    0.669     0.593     0.555      0.352   0.628     0.686       0.165     0.515
Our (mT0-XXL-LoRA)    0.361     0.594    0.639     0.591     0.548      0.345   0.605     0.660       0.159     0.500
nikita.sushko         0.328     0.575    0.592     0.553     0.480      0.241   0.570     0.668       0.176     0.465
VitalyProtasov        0.311     0.523    0.502     0.531     0.472      0.320   0.542     0.629       0.175     0.445
erehulka              0.287     0.536    0.575     0.543     0.497      0.185   0.529     0.602       0.160     0.435
Our (Aya-101-LoRA)    0.301     0.526    0.530     0.529     0.475      0.223   0.541     0.586       0.108     0.424
ansafronov            0.270     0.456    0.362     0.506     0.319      0.133   0.507     0.328       0.178     0.340
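As a quick check of the averages, the sketch below recomputes Avg J for the two mT0-XL rows, assuming that Avg J is the unweighted mean of the nine per-language Joint scores. The per-language numbers are copied from Table 4; the variable names are illustrative.

```python
from statistics import mean

# Per-language Joint scores from Table 4, in column order
# (Amharic, Arabic, German, English, Spanish, Hindi, Russian, Ukrainian, Chinese).
J_ORPO = [0.378, 0.626, 0.678, 0.602, 0.562, 0.355, 0.634, 0.692, 0.178]
J_SFT = [0.374, 0.617, 0.669, 0.593, 0.555, 0.352, 0.628, 0.686, 0.165]

avg_orpo = mean(J_ORPO)
avg_sft = mean(J_SFT)
gain = avg_orpo - avg_sft  # average improvement attributed to ORPO

print(f"mT0-XL-ORPO: {avg_orpo:.3f}, mT0-XL: {avg_sft:.3f}, gain: {gain:.3f}")
```

Under this assumption the recomputed means match the reported 0.523 and 0.515, and the gain rounds to the 0.01 points mentioned in the text.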


   Table 5 shows the human evaluation results. For Ukrainian, our detoxification model achieved the
highest human evaluation score among the submitted systems by a wide margin (0.84 versus 0.73 for the
next-best system), indicating that our approach is the state of the art for this language. Overall, our
best-performing checkpoint ranks second among the participating systems in the human evaluation by the
averaged Joint metric.

Table 5
The results of the human evaluation. The teams with the best scores were selected for the table. Joint (J) is given
as the evaluation metric for each language.
Team                 Amharic    Arabic   German   English    Spanish    Hindi   Russian   Ukrainian   Chinese   Avg J
Human Reference      0.85       0.82     0.71     0.88       0.79       0.97    0.80      0.90        0.93      0.85
SomethingAwful       0.71       0.74     0.89     0.86       0.83       0.86    0.84      0.69        0.53      0.77
Our (mT0-XL-ORPO)    0.71       0.82     0.70     0.83       0.73       0.68    0.76      0.84        0.60      0.74
VitalyProtasov       0.68       0.79     0.77     0.69       0.81       0.87    0.73      0.67        0.49      0.72
nikita.sushko        0.68       0.89     0.79     0.70       0.62       0.84    0.74      0.67        0.47      0.71
erehulka             0.69       0.78     0.85     0.88       0.71       0.52    0.65      0.63        0.68      0.69
mkrisnai             0.49       0.63     0.70     0.89       0.83       0.73    0.78      0.73        0.34      0.68
d1n910               0.61       0.44     0.77     0.91       0.77       0.34    0.71      0.50        0.84      0.65
ZhongyuLuo           0.72       0.49     0.01     0.73       0.52       0.49    0.68      0.42        0.56      0.51


5. Conclusion
In conclusion, we presented a pipeline for augmenting training data for low-resource
languages and fine-tuning a relatively small 3.7-billion-parameter language model for the text
detoxification task. Future research may consider how to transfer text detoxification capabilities from
high-resource languages to low-resource languages without translation, as machine translation for
low-resource languages is often of low quality. A further direction for investigation is the interpretability
of models, specifically understanding which tokens the model replaces during
text detoxification and the rationale behind these replacements.


References
[1] J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D. Korenčić,
    M. Mayerl, A. Mukherjee, A. Panchenko, M. Potthast, F. Rangel, P. Rosso, A. Smirnova, E. Stamatatos,
    B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024: Multi-Author
    Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and
    Generative AI Authorship Verification, in: Experimental IR Meets Multilinguality, Multimodality,
    and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association
    (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024.
[2] D. Dementieva, D. Moskovskiy, N. Babakov, A. A. Ayele, N. Rizwan, F. Schneider, X. Wang, S. M.
    Yimam, D. Ustalov, E. Stakovskii, A. Smirnova, A. Elnagar, A. Mukherjee, A. Panchenko, Overview
    of the multilingual text detoxification task at PAN 2024, in: G. Faggioli, N. Ferro, P. Galuščáková,
    A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation
    Forum, CEUR-WS.org, 2024.
[3] N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. Le Scao, M. S. Bari, S. Shen,
    Z. X. Yong, H. Schoelkopf, X. Tang, D. Radev, A. F. Aji, K. Almubarak, S. Albanie, Z. Alyafeai,
    A. Webson, E. Raff, C. Raffel, Crosslingual generalization through multitask finetuning, in: A. Rogers,
    J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for
    Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics,
    Toronto, Canada, 2023, pp. 15991–16111. URL: https://aclanthology.org/2023.acl-long.891.
    doi:10.18653/v1/2023.acl-long.891.
[4] A. K. Vijayakumar, M. Cogswell, R. R. Selvaraju, Q. Sun, S. Lee, D. Crandall, D. Batra, Diverse beam
    search: Decoding diverse solutions from neural sequence models, 2017.
    URL: https://openreview.net/forum?id=HJV1zP5xg.
[5] J. Hong, N. Lee, J. Thorne, ORPO: Monolithic preference optimization without reference model, 2024.
    URL: https://arxiv.org/abs/2403.07691. arXiv:2403.07691.
[6] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang, Language-agnostic BERT sentence embedding,
    in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the
    Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland,
    May 22-27, 2022, Association for Computational Linguistics, 2022, pp. 878–891.
    URL: https://doi.org/10.18653/v1/2022.acl-long.62. doi:10.18653/v1/2022.acl-long.62.
[7] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, mT5: A
    massively multilingual pre-trained text-to-text transformer, in: North American Chapter of the
    Association for Computational Linguistics, 2020.
    URL: https://api.semanticscholar.org/CorpusID:225040574.
[8] A. Üstün, V. Aryabumi, Z.-X. Yong, W.-Y. Ko, D. D’souza, G. Onilude, N. Bhandari, S. Singh, H.-L.
    Ooi, A. Kayid, F. Vargus, P. Blunsom, S. Longpre, N. Muennighoff, M. Fadaee, J. Kreutzer, S. Hooker,
    Aya model: An instruction finetuned open-access multilingual language model, arXiv preprint
    arXiv:2402.07827 (2024).
[9] R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, C. Finn, Direct preference
    optimization: Your language model is secretly a reward model, in: Thirty-seventh Conference on Neural
    Information Processing Systems, 2023. URL: https://openreview.net/forum?id=HPuSIXJaa9.