<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisei Rykov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Zaytsev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Anisimov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandr Voronin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HSE University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Skolkovo Institute of Science and Technology</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>This paper presents the solution of the SmurfCat team for the Multilingual Text Detoxification task at the PAN-2024 competition. Using data augmentation through machine translation and a special filtering procedure, we collected an additional multilingual parallel dataset for text detoxification. Using the obtained data, we fine-tuned several multilingual sequence-to-sequence models, such as mT0 and Aya, on the text detoxification task. The ORPO alignment technique was then applied to the final model. Our final model has only 3.7 billion parameters and achieves state-of-the-art results for the Ukrainian language and near state-of-the-art results for other languages. In the competition, our team achieved first place in the automated evaluation with a score of 0.52 and second place in the final human evaluation with a score of 0.74.</p>
      </abstract>
      <kwd-group>
<kwd>PAN 2024</kwd>
        <kwd>Multilingual Text Detoxification</kwd>
        <kwd>mT0</kwd>
        <kwd>ORPO</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
<p>Initially, there were not many parallel datasets for the multilingual detoxification task. More precisely,
only the Russian (https://huggingface.co/datasets/s-nlp/ru_paradetox) and English (https://huggingface.co/datasets/s-nlp/paradetox) ParaDetox datasets were available, with 11 100 and 19 700
samples respectively. During the competition, the organizers published a small human-annotated
Multilingual ParaDetox (https://huggingface.co/datasets/textdetox/multilingual_paradetox) covering all languages and containing only 400 samples per language.</p>
<p>Nevertheless, we decided to augment the provided data by automatically translating the English data
into the other languages. To translate the original English data, we used the GoogleTranslator model from
the deep_translator Python package (https://pypi.org/project/deep-translator/). We chose this API over more advanced machine translation
models because of its speed and simplicity; in addition, few translation models are available for low-resource
languages such as Amharic. As a result, we obtained an additional 19 700 samples for each language.</p>
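      <p>A minimal sketch of this augmentation step is shown below. It assumes the English ParaDetox pairs are already loaded as (toxic, neutral) tuples; the target-language codes and the sentence-by-sentence translation loop are illustrative choices rather than a description of our exact pipeline.</p>
      <preformat>
# Hypothetical augmentation sketch: translate English ParaDetox pairs into the
# other competition languages with the deep_translator package.
from deep_translator import GoogleTranslator

# Google Translate language codes for the target languages (our assumption).
TARGET_LANGS = ["ru", "uk", "de", "es", "am", "zh-CN", "ar", "hi"]

def augment(pairs):
    """pairs: list of (toxic_en, neutral_en) tuples from English ParaDetox."""
    augmented = {}
    for lang in TARGET_LANGS:
        translator = GoogleTranslator(source="en", target=lang)
        augmented[lang] = [
            (translator.translate(toxic), translator.translate(neutral))
            for toxic, neutral in pairs
        ]
    return augmented
      </preformat>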
<p>Since translation is often imperfect, we applied a dedicated post-processing procedure. In
general, we checked the preservation of meaning after translation and the toxicity of the translated
data. First, we used the LaBSE [6] model to evaluate the similarity between translated pairs. Second, we
applied an XLM-R toxicity classifier (https://huggingface.co/textdetox/xlmr-large-toxicity-classifier) to check whether toxic sentences remained toxic after translation
and vice versa.</p>
<p>The distributions of both measures are shown in Figures 2 and 3. For most samples, the similarity
between the original and translated sentences was high enough, so most samples preserved their meaning.
Regarding toxicity, however, many neutral sentences became toxic after translation, and many toxic sentences
became neutral. For toxicity, we set the threshold to 0.9 for toxic sentences and 0.1 for neutral
sentences. The similarity threshold was set to 0.8 for all sentences.</p>
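      <p>The sketch below illustrates this filtering procedure under the thresholds listed above; a translated pair is kept only if both sides pass the similarity check and keep their toxicity labels. The label name returned by the toxicity checkpoint is an assumption.</p>
      <preformat>
# Filtering sketch: LaBSE similarity and XLM-R toxicity thresholds from the text
# (similarity &gt;= 0.8, toxic side &gt;= 0.9, neutral side &lt;= 0.1).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

labse = SentenceTransformer("sentence-transformers/LaBSE")
tox_clf = pipeline("text-classification",
                   model="textdetox/xlmr-large-toxicity-classifier",
                   top_k=None)  # return scores for all labels

def toxicity(text):
    # The "toxic" label name is an assumption about this checkpoint.
    scores = {d["label"]: d["score"] for d in tox_clf([text])[0]}
    return scores.get("toxic", 0.0)

def similarity(src, tgt):
    emb = labse.encode([src, tgt], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def keep(pair_en, pair_translated):
    (tox_en, neu_en), (tox_tr, neu_tr) = pair_en, pair_translated
    sim_ok = min(similarity(tox_en, tox_tr), similarity(neu_en, neu_tr)) &gt;= 0.8
    tox_ok = toxicity(tox_tr) &gt;= 0.9 and toxicity(neu_tr) &lt;= 0.1
    return sim_ok and tox_ok
      </preformat>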
<p>After all filtering steps, 40 500 pairs of neutral and toxic sentences were obtained. More detailed
statistics on how many samples remained after filtering are given in Table 1. According to these statistics,
Amharic lost the most samples during filtering.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
<p>In this section, we describe our approach, which is based on fine-tuning and preference optimization of multilingual
language models for the text detoxification task.</p>
      <sec id="sec-3-1">
        <title>3.1. Supervised fine-tuning</title>
<p>As our main approach, we chose to fine-tune various multilingual LMs. In our view, the most
promising models for further fine-tuning were the LMs of the mT0 family [3], a family of sequence-to-sequence
Transformer models initialized from mT5 [7]. We considered sequence-to-sequence
modeling to be preferable for the text detoxification task. The mT0 family was chosen
because of its strong multilingual capabilities, as these models cover every language of the
competition. We also experimented with the novel Aya-101 model [8]: an mT5-XXL model fine-tuned on
multilingual instructions.</p>
<p>All models were tuned in an almost identical way. The learning rate was set to 1e-5, the global batch
size to 8, and the weight decay to 0.01. A cosine scheduler was used for training. In total, all models
were trained for 4 epochs. All other training parameters were left at the defaults of the HuggingFace
Seq2SeqTrainer. The only difference is that for mT0-XL we updated the weights of the whole
model because our computing resources allowed it. For larger models such as Aya-101 or mT0-XXL,
only a LoRA adapter was trained. The LoRA adapter was set up as follows: r and lora_alpha were
set to 32, lora_dropout to 0.1, and the other parameters were left at their defaults. The best model was selected
according to the validation loss.</p>
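        <p>A sketch of this training setup is given below using the hyper-parameters listed above; the checkpoint name and the dataset objects (train_ds, val_ds) are placeholders standing in for our prefix-prompted detoxification pairs.</p>
        <preformat>
# Fine-tuning sketch with the HuggingFace Seq2SeqTrainer and a LoRA adapter
# (mT0-XL was instead fully fine-tuned without LoRA).
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "bigscience/mt0-xxl"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA configuration used for the larger checkpoints.
lora = LoraConfig(r=32, lora_alpha=32, lora_dropout=0.1,
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, lora)

args = Seq2SeqTrainingArguments(
    output_dir="detox-mt0",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # global batch size of 8
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
    evaluation_strategy="epoch",     # best model selected by validation loss
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# train_ds / val_ds: tokenized (prefix + toxic sentence, neutral sentence)
# pairs, assumed to be prepared beforehand.
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_ds, eval_dataset=val_ds,
                         tokenizer=tokenizer)
trainer.train()
        </preformat>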
<p>To reinforce the in-context abilities of the models, we added a language-specific prefix to each toxic sentence.
As a result, during training we passed toxic sentences with this special prefix prompt into the
model.</p>
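        <p>A minimal illustration of this prefixing step follows. The English prefix matches the prompts visible in Table 3; the prefixes for the other languages are not reproduced here, so the fallback below is purely illustrative.</p>
        <preformat>
# Language-specific prefixes prepended to each toxic sentence.
PREFIXES = {"en": "Detoxify: "}  # other languages defined analogously

def build_input(toxic_sentence, lang):
    # Fall back to the English prefix for illustration only.
    return PREFIXES.get(lang, PREFIXES["en"]) + toxic_sentence
        </preformat>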
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The Best Candidate Choice</title>
<p>During inference, we generated 10 hypotheses and kept the 5 most likely ones using diverse beam
search [4]. The number of beams was set to 10 with 5 beam groups, the diversity penalty to 2.5, and the
repetition penalty to 1.2. To select the best candidate, we calculated a relevance metric as the product
of similarity and toxicity scores. Similarity was calculated using LaBSE embeddings, and toxicity was
measured using the xlm-roberta-large toxicity classifier. Once the relevance scores were calculated, we
selected the best candidate according to the highest score.</p>
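        <p>The sketch below shows this generation and re-ranking step with the parameters listed above. The relevance score multiplies LaBSE similarity to the input by the candidate's non-toxicity, which is our reading of the product of similarity and toxicity scores; it reuses the similarity and toxicity helpers from the filtering sketch in Section 2.</p>
        <preformat>
# Diverse beam search generation followed by relevance-based candidate selection.
def generate_candidates(model, tokenizer, toxic_with_prefix, num_keep=5):
    inputs = tokenizer(toxic_with_prefix, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=10,
        num_beam_groups=5,
        num_return_sequences=num_keep,
        diversity_penalty=2.5,
        repetition_penalty=1.2,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

def relevance(source, candidate):
    sim = similarity(source, candidate)     # LaBSE cosine similarity
    neutrality = 1.0 - toxicity(candidate)  # XLM-R toxicity classifier
    return sim * neutrality

def detoxify(model, tokenizer, toxic_with_prefix):
    candidates = generate_candidates(model, tokenizer, toxic_with_prefix)
    return max(candidates, key=lambda c: relevance(toxic_with_prefix, c))
        </preformat>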
      </sec>
      <sec id="sec-3-3">
        <title>3.3. ORPO</title>
        <p>Once the models were fine-tuned, we decided to further tune the model for the best performance using
the Odds Ratio Preference Optimization (ORPO) approach [5]. Unlike DPO [9], this optimization does not need a reference
model. Alignment was performed on the unseen test dataset.</p>
<p>As a preference dataset, we generated hypotheses using diverse beam search on the samples from
the test set and annotated them using the relevance score described above. For each prompt, only the candidate with the
highest relevance score was selected as the chosen one, and all other candidates were used as rejected
samples.</p>
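        <p>A short sketch of the preference-pair construction follows; pairing the single best candidate against every remaining candidate is our assumed scheme, and the helpers come from the previous sketches.</p>
        <preformat>
# Build (prompt, chosen, rejected) rows for ORPO from diverse-beam candidates.
def build_preference_rows(model, tokenizer, toxic_prompts):
    rows = []
    for prompt in toxic_prompts:
        candidates = generate_candidates(model, tokenizer, prompt)
        ranked = sorted(candidates, key=lambda c: relevance(prompt, c),
                        reverse=True)
        chosen, rejected = ranked[0], ranked[1:]
        for negative in rejected:
            rows.append({"prompt": prompt, "chosen": chosen,
                         "rejected": negative})
    return rows
        </preformat>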
<p>The final ORPO dataset for alignment contained the prompt (toxic sentence), the rejected sample
(negative candidate), and the chosen sample (best candidate). Table 3 shows a small sample of the
dataset. Once the dataset was collected, we trained the model on it using the same parameters
used to train the other models; the ORPO beta parameter was set to 0.1. For the final
submission, we used the aligned model together with the algorithm described above to select the best candidate.</p>
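        <p>One possible way to run this alignment step is sketched below with the ORPOTrainer from the trl library; the paper does not state which implementation was used, so this is an assumption, and the training arguments simply mirror the fine-tuning settings plus the beta value.</p>
        <preformat>
# ORPO alignment sketch using trl (assumed implementation).
from datasets import Dataset
from trl import ORPOConfig, ORPOTrainer

pref_dataset = Dataset.from_list(rows)  # rows from the sketch above

orpo_args = ORPOConfig(
    output_dir="detox-mt0-orpo",
    beta=0.1,                        # ORPO beta parameter
    learning_rate=1e-5,              # same parameters as supervised fine-tuning
    per_device_train_batch_size=8,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
)

trainer = ORPOTrainer(model=model, args=orpo_args,
                      train_dataset=pref_dataset, tokenizer=tokenizer)
trainer.train()
        </preformat>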
        <p>[Table 3 examples, partially recoverable: a toxic prompt such as "Detoxify: Nat is just a piece of shit, ignore him." paired with a chosen candidate ("Nat is just a bad person, ignore him") and a rejected candidate ("Nate is just not good, ignore him.").]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The final results of the automatic evaluation, reported with the averaged Joint metric, are shown in Table 4. mT0-XL with ORPO alignment
showed the best performance among all approaches on the leaderboard for all languages. Compared
to mT0-XL, the same model before alignment, ORPO slightly improved the performance of the model,
increasing the average result by 0.01 points. Surprisingly, the larger models were not the best. For
example, the mT0-XXL model with 13B parameters performed even worse than the mT0-XL model
with only 3.7B parameters. Aya-101, an mT5-XXL model additionally tuned on instruction data for
different languages, performed worse than the other models. Since Aya-101 and mT0-XXL performed
worse than mT0-XL, we did not perform the ORPO alignment step for these models. Considering the other
teams in the automatic evaluation, our checkpoints, mainly mT0-XL-ORPO and mT0-XL, are the two
best performing approaches for all languages except Chinese.</p>
      <p>The results of the human evaluation are also reported for the teams with the best scores, with Joint given as
the evaluation metric.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>A promising direction for future work is transferring detoxification abilities from
high-resource languages to low-resource languages without translation, as machine translation for
low-resource languages often shows low quality. A further direction for investigation is the interpretability
of models, specifically the understanding of which tokens have been replaced by the model through the
text detoxification process and the rationale behind this.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>