Linguistic_Hygenist at PAN 2024 TextDetox: HybridDetox - A Combination of Supervised and Unsupervised Methods for Effective Multilingual Text Detoxification

Notebook for PAN at CLEF 2024

Susmita Gangopadhyay, M.Taimoor Khan and Hajira Jabeen
GESIS Leibniz Institute for the Social Sciences, Köln, Germany

Abstract
Text detoxification is the process of revising toxic comments to neutralize their toxicity by eliminating inappropriate content while preserving the meaning of the message. Toxicity can manifest in various forms, including the use of curse words, insults, hate speech, cyberbullying, or trolling. The present-day social media landscape is rife with toxic comments, necessitating a text detoxification system. Unlike the conventional practice of blocking offensive content through moderation, detoxification preserves the valuable information contained in the message. This paper details our approach to multilingual text detoxification as part of the Multilingual Text Detoxification (TextDetox) 2024 [1] challenge organized by the PAN lab [2]. Our approach consists of two components: the Supervised T5-BART Module for the English and Russian languages, which have parallel corpora, and the Unsupervised PLM Detoxifier for the other seven languages. The Supervised T5-BART Module uses T5 and BART as base models, with exponentially weighted moving average and ROUGE scores as loss functions for Russian and English, respectively. The Unsupervised PLM Detoxifier utilizes hashing techniques, the log-odds ratio, and linguistic patterns to identify and conceal toxic words across all languages. Additionally, it incorporates a mask prediction model to keep the original sentence's meaning intact. Our proposed approach achieved an average score of 0.315 across all languages, exhibiting outstanding performance in English, German, and Ukrainian for style transfer, content preservation, and fluency.
Keywords: PAN 2024, text-detoxification, toxicity, mask-prediction, sentence-similarity, sequence-to-sequence models, CEUR-WS

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
susmita.gangopadhyay@gesis.org (S. Gangopadhyay); taimoor.khan@gesis.org (M.Taimoor Khan); hajira.jabeen@gesis.org (H. Jabeen)
ORCID: 0009-0009-1520-9070 (S. Gangopadhyay); 0000-0002-6542-9217 (M.Taimoor Khan); 0000-0003-1476-2121 (H. Jabeen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Internet access has revolutionized information dissemination, providing unprecedented opportunities worldwide. However, this rapid and uncontrolled proliferation of user-generated content can also carry toxic information that is considered harmful, offensive, or inappropriate. Text detoxification is a critical endeavor in the contemporary digital landscape, where the proliferation of toxic comments poses significant challenges to online discourse [3]. The detoxification process involves the meticulous revision of toxic comments to neutralize their toxicity while ensuring that the essence of the original message remains intact [4]. Toxicity can manifest in numerous forms, including the use of curse words, insults, hate speech, cyberbullying, or trolling, contributing to an unhealthy online environment [5]. This pervasive toxicity underscores the urgent need for effective text detoxification systems to maintain a healthier online ecosystem [6]. Identification of toxicity in text is an active area of research. Today, social networks such as Facebook and Instagram are trying to address the problem of toxicity. However, they usually simply block offensive content through moderation [7]. Text detoxification instead prioritizes the preservation of valuable information within the message while neutralizing its toxicity. Significant progress has been made in detecting offensive or toxic speech. Supervised text detoxification techniques are used for resource-rich languages with parallel corpora [8]. On the other hand, unsupervised techniques target languages with limited resources by employing alternative methods [9]. In the realm of multilingual text detoxification, existing approaches typically adopt a combination of supervised and unsupervised techniques [10]. The integration of pretrained models has been instrumental in advancing text detoxification efforts [11]. These models, trained on vast amounts of data, possess a remarkable ability to understand and generate human-like text across various languages. However, despite their potential, several open challenges remain, such as the inability to generalize to different contexts, the inability to handle implicit and subtle toxicity, and below-par performance in multilingual setups [12]. Adapting pretrained models to diverse languages and cultural contexts while ensuring their effectiveness in detecting and neutralizing toxic content presents a significant hurdle [13] due to continuously evolving language and the presence of sarcasm, innuendo, or coded language in the text [14]. Our proposed approach tackles multilingual detoxification with a hybrid method comprising a supervised module, the Supervised T5-BART Module, and an unsupervised module, the Unsupervised PLM Detoxifier. The Supervised T5-BART Module fine-tunes the T5 Seq2Seq model for Russian, which was originally trained in a teacher-forcing style for multiple NLP tasks like summarization, translation, and text generation. It uses an exponentially weighted moving average (EWMA) score for loss evaluation. Additionally, it fine-tunes the BART model for English using ROUGE scores for loss evaluation.
Meanwhile, the Unsupervised PLM Detoxifier adopts a multi-step process, including the masking of multiple toxic tokens and the prediction of suitable mask replacements while preserving the meaning. By leveraging both supervised and unsupervised methods, our approach offers a robust and versatile solution to the complex problem of multilingual text detoxification. In the subsequent sections, we describe the problem statement, related previous research, and our proposed approach, and present some examples from our results. In addition, we also share our vision of future work that could be adopted in this research direction.

2. Problem Definition

The competition expects a text detoxification system for 9 languages from different linguistic families. Parallel training corpora of several thousand toxic-detoxified pairs are available only for the English and Russian languages. For the remaining 7 languages—Spanish, German, Chinese, Arabic, Hindi, Ukrainian, and Amharic—only texts containing toxic content were provided. For all 9 languages, a list of prominent toxic lexicons was provided, varying in number. For languages like English and Russian, where parallel training corpora were available, fine-tuning of any text-generation model was allowed. The main challenge of this competition was to use a mix of supervised and unsupervised approaches to develop a multilingual text detoxification system. The evaluation was based on both automatic methods, such as duplication, deletion, and backtranslation, as mentioned on the challenge website1, as well as manual verification of the detoxified text.

3. Related Work

There have been numerous studies and shared tasks focusing on toxicity detection, particularly for the English language.
One of the earliest and most notable efforts came from several Kaggle competitions organized by the Jigsaw/Conversation AI team, which included the "Toxic Comment Classification Challenge"2 in 2018, the "Unintended Bias in Toxicity Classification Challenge"3 in 2019, and the "Multilingual Toxic Comment Classification Challenge"4 in 2021.

1 https://pan.webis.de/clef24/pan24-web/text-detoxification.html
2 https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
3 https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification
4 https://www.kaggle.com/code/bond005/siamese-xlm-r-for-multilingual-sentiment-analysis

These competitions provided some of the largest datasets for English toxicity detection, covering multiple types of toxicity such as toxic, obscene, threat, insult, and identity hate, along with multilingual test sets for other languages like Spanish, French, Italian, and Russian. Starting in 2019, the detection of toxicity and offensive language has been a major focus at SemEval. This began with SemEval-2019 Task 6 and continued with SemEval-2020 Task 12, both centered around identifying and categorizing offensive language in social media (OffensEval), which garnered considerable interest and participation. The emphasis on toxicity persisted with the SemEval-2021 task on Toxic Spans Detection. This task was designed to pinpoint the exact spans within a text that make it toxic, providing valuable assistance to human moderators who have to manage lengthy and potentially harmful comments. In 2022, the arena of toxicity detection continued to buzz with activity, featuring events such as the "Multimedia Automatic Misogyny Identification (MAMI)" competition. This unique challenge focused on identifying misogynous memes through a comprehensive analysis of both textual content and accompanying images, shedding light on the pervasive issue of systemic gender inequality and discrimination against women in online spaces.
This competition played a pivotal role in raising awareness and fostering discussions on these critical issues. Apart from these shared tasks and competitions, several other research works focus on the task of toxicity identification and text detoxification. Focus has been on utilizing deep learning models like LSTM [15], utilization of embedding models [16], and incorporation of context [17] in the detection of toxic texts. Detoxification is generally framed as a style transfer from a toxic to a neutral (non-toxic) style, using parallel datasets labeled for toxicity. For example, Logacheva et al. [8] created such a parallel corpus for the English language as a resource for utilization in the detoxification task. Researchers such as Laugier et al. [18] used a pretrained text-to-text transformer trained on the Civil Comments dataset to create fluent and neutral sentences from toxic ones. Detoxification efforts often rely on style transfer models tested in other domains, for example, fine-tuning autoencoders with additional style classification and cycle-consistency losses [18] and applying point-wise corrections and seq2seq models to improve text fluency and style [9]. In terms of multilingual and low-resource languages, significant research has been conducted in multilingual text generation [19], language-agnostic sentence embeddings [20], and translation of low-resource languages [21]. However, multilingual text detoxification remains a relatively under-explored and still active challenge. A recent shared task, RUSSE-2022 [22], focused solely on detoxifying Russian texts. Our approach contributes to this unique and evolving area of research by proposing a unified pipeline for detoxification across multiple languages, including low-resource ones.

4. HybridDetox Pipeline

We propose a method that allows us to effectively address the challenge of detoxification across all languages in the dataset. Our proposed methodology is a hybrid of supervised and unsupervised approaches.
Our pipeline takes toxic text as input, processes it, and rephrases it into detoxified text. Figure 1 demonstrates the entire detoxification pipeline for the languages under study.

4.1. Language Detection Module

The first step a toxic text passes through is a language detection module. Although sentences and their corresponding languages were provided in the test data, the language detection module was added to simulate a real production scenario. We used the Python langdetect5 library for this purpose. If the detected language is English or Russian, the text is forwarded to the Supervised T5-BART Module. For the remaining seven languages, the text is passed to the Unsupervised PLM Detoxifier.

5 https://pypi.org/project/langdetect/

Figure 1: Detoxification pipeline for all languages (input toxic text is routed by detected language to the Supervised T5-BART Module for EN/RU, or to the Unsupervised PLM Detoxifier for the other languages, producing detoxified text).

4.2. Supervised T5-BART Module

The Supervised T5-BART Module fine-tunes sequence-to-sequence models for English and Russian, the two languages with parallel corpora. We used the T5 (Text-to-Text Transfer Transformer) model as our base model for the Russian language [23]. T5 is designed around the innovative concept of having a single architecture across diverse tasks to benefit from transfer learning. It is trained on large-scale, diverse datasets, which enhances its ability to understand and generate close to human-like text. T5 has demonstrated significant advancement on multiple NLP text-generation tasks, e.g., translation and text summarization. We fine-tuned T5 on the parallel corpora for Russian using an exponentially weighted moving average (EWMA) loss function, which puts more emphasis on the recently generated text and is not affected by extreme values. We employed BART for the English parallel corpora and fine-tuned it for text detoxification [24]. The BART model is effective for different text generation and comprehension tasks.
Its flexible architecture facilitates fine-tuning for specific tasks with parallel corpora. BART has an encoder-decoder architecture, where the encoder is similar to BERT while the decoder resembles GPT. During fine-tuning, the models are exposed to a labeled dataset containing pairs of toxic and detoxified texts. ROUGE is used as the loss measure to evaluate the quality of the generated text. It compares n-grams between the generated text and the label text, with higher overlap desired. We computed ROUGE-1, ROUGE-2, and ROUGE-L to train multiple models, and the same measures were used to pick the best model.

4.3. Unsupervised PLM Detoxifier

The Unsupervised PLM Detoxifier supports languages without parallel corpora. It handles 7 languages, including low-resource ones, i.e., Chinese, Amharic, German, Hindi, Arabic, Ukrainian, and Spanish. The working of the Unsupervised PLM Detoxifier is explained through the following submodules.

4.3.1. Toxic Words Identification and Masking

To identify toxic words in the sentences, we adopted a combination of hashing-based techniques and the log-odds ratio. As a starting point, we utilized the list of toxic lexicons provided in the challenge6. Each language has a list of toxic lexicons containing prominent curse words specific to that language. We employed a hashing-based sequence-matching mechanism7 to identify words similar to these lexicons beyond a certain threshold. These identified toxic words were then removed from the sentences and replaced with masks. A suitable threshold value 𝑡1 was identified based on manual evaluation. In the next step, our approach relied on the principle that curse words are relatively rare and would appear less frequently in a neutral corpus compared to a toxic one. Therefore, the log-odds ratio between any normal neutral corpus and a toxic corpus would highlight a list of toxic words.
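As an illustration, the two identification steps above can be sketched in Python. This is a minimal sketch, not our exact implementation: the mask token, the add-one smoothing, and the helper names are our assumptions; the sequence matching uses the difflib library referenced in footnote 7, with the threshold 𝑡1 = 0.8 reported in Section 5.

```python
import math
from difflib import SequenceMatcher

MASK = "<mask>"  # placeholder token (assumed name)

def is_toxic(word, lexicon, t1=0.8):
    """Flag a word whose similarity to any lexicon entry reaches the threshold t1."""
    w = word.lower()
    return any(SequenceMatcher(None, w, lex).ratio() >= t1 for lex in lexicon)

def mask_toxic(sentence, lexicon, t1=0.8):
    """Replace lexicon-like words with the mask token."""
    return " ".join(MASK if is_toxic(tok, lexicon, t1) else tok
                    for tok in sentence.split())

def log_odds_ratio(word, toxic_counts, neutral_counts, toxic_total, neutral_total):
    """Equation (1): log of the odds of `word` in the toxic corpus over its
    odds in the neutral corpus (add-one smoothing added here to avoid
    division by zero; the paper does not specify a smoothing scheme)."""
    p_t = (toxic_counts.get(word, 0) + 1) / (toxic_total + 2)
    p_n = (neutral_counts.get(word, 0) + 1) / (neutral_total + 2)
    return math.log((p_t / (1 - p_t)) / (p_n / (1 - p_n)))
```

For example, `mask_toxic("you stupid idiot shut up", {"stupid", "idiot"})` yields `"you <mask> <mask> shut up"`, and a word frequent in the toxic corpus but absent from the neutral one gets a positive log-odds ratio.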
6 https://huggingface.co/datasets/textdetox/multilingual_toxic_lexicon
7 https://docs.python.org/3/library/difflib.html

Figure 2: Example of the implemented mask placement with linguistic patterns in our method.

The log-odds ratio defines a relative frequency comparison and measures how much more or less frequent a word is in one corpus compared to another. A higher log-odds ratio indicates that the word is much more common in the target corpus (e.g., toxic text) than in the reference corpus (e.g., neutral text). Mathematically, the log-odds ratio for a word 𝑤 can be defined as:

\text{Log-Odds Ratio}(w) = \log\left(\frac{p(w \mid C_{\text{toxic}}) \,/\, \left(1 - p(w \mid C_{\text{toxic}})\right)}{p(w \mid C_{\text{neutral}}) \,/\, \left(1 - p(w \mid C_{\text{neutral}})\right)}\right) \quad (1)

where 𝑝(𝑤 | 𝐶toxic) and 𝑝(𝑤 | 𝐶neutral) are the occurrence probabilities of the word 𝑤 in the toxic and neutral corpora, respectively. In summary, the log-odds ratio helps identify words that are significantly more likely to appear in toxic texts compared to neutral ones, thus aiding the detection of toxic language. We utilized the development set's toxic and neutral pairs for this experiment, but it could also have been conducted with any toxic and neutral corpus in the target languages. From the extracted list of words and their log-odds ratios, we selected those with a score above 𝑡2 as toxic words. This threshold was chosen because the log-odds ratio values are in the range [0, 1]. We aimed to maintain a balanced value to ensure that we accurately identify toxic words while minimizing false positives and negatives. Additionally, we cleansed our generated toxic lexicon list by filtering out stopwords and words that are less than 3 characters long. This was done to eliminate special characters, symbols, or incomplete random words. The filtering criteria for removing unwanted stopwords were based on general observation. This approach ensured that we effectively masked words likely to be curse words, thereby excluding stopwords and special characters that might have gotten onto the list of toxic words. 4.3.2.
Mask Placement with Linguistic Patterns

Languages follow certain grammatical paradigms or linguistic rules that aid in constructing sentences. By observing these rules, we were able to better process the masks in sentences. We found that, for any language, if curse words appear at the beginning or end of a sentence, they can simply be removed. Additionally, when multiple consecutive masked words were present, they could be combined into a single mask without losing the overall meaning of the sentence. Figure 2 shows an example of our implemented linguistic paradigms. For ease of understanding, the provided example is in English.

4.3.3. Mask Prediction

Following the process of identifying and masking toxic words and implementing the linguistic rules, we were left with sentences containing masked toxic words. To handle these, we used the XLM-RoBERTa large model [25], which is pretrained in a self-supervised manner on 2.5TB of filtered CommonCrawl data spanning 100 languages, including all languages featured in the competition. The model employs a Masked Language Modeling (MLM) objective: it randomly masks 15% of the words in the input sentence, processes the entire masked sentence through the model, and predicts the masked words. We chose this model because, unlike traditional recurrent neural networks (RNNs) that process words sequentially, or autoregressive models like GPT that internally mask future tokens, XLM-RoBERTa learns a bidirectional representation of the sentence. Using this model, we predicted the top three probable replacements for each mask and generated sentences accordingly. For a sentence with n masks, this resulted in 3^n possible sentences.

4.3.4. Sentence Similarity

From the resulting 3^n sentences generated from the mask predictions, we used a sentence transformer model [26] to generate embeddings for each of the sentences along with their parallel toxic input sentence.
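The candidate generation described in Section 4.3.3 can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions: the mask token, function name, and the stubbed per-mask predictions are ours; in the actual pipeline the top-3 predictions per mask would come from the XLM-RoBERTa fill-mask model.

```python
from itertools import product

MASK = "<mask>"  # placeholder token (assumed name)

def expand_candidates(tokens, per_mask_predictions):
    """Fill each masked position with every candidate word.

    With n masked positions and k predictions per mask, this yields
    k**n candidate sentences (3**n in our setting, since the top 3
    replacements per mask are kept).
    """
    positions = [i for i, tok in enumerate(tokens) if tok == MASK]
    if len(positions) != len(per_mask_predictions):
        raise ValueError("one prediction list is needed per mask")
    candidates = []
    for combo in product(*per_mask_predictions):
        filled = list(tokens)
        for pos, word in zip(positions, combo):
            filled[pos] = word
        candidates.append(" ".join(filled))
    return candidates

# Stubbed top-3 predictions for a sentence with two masks.
tokens = "you are a <mask> <mask>".split()
preds = [["kind", "nice", "good"], ["person", "friend", "human"]]
candidates = expand_candidates(tokens, preds)  # 3**2 = 9 candidates
```

Each candidate is then embedded and scored against the input sentence with the sentence-transformer model, as described in Section 4.3.4, to pick the final output.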
The model maps sentences with similar meanings to embeddings that are close in the vector space. The semantic textual similarity between two sentences is then computed, giving us sentence pairs annotated with a score indicating the similarity between them. The model uses a Siamese network structure that was trained using CosineSimilarityLoss [27]. Among all the sentence pairs generated, we chose the pair with the lowest score, i.e., the resultant sentence closest to the input toxic sentence, as our selected output sentence. The code for both the Supervised8 and Unsupervised9 pipelines is made available on GitHub.

5. Results

The experimental setup consists of setting the threshold values. For masking toxic words in the unsupervised module, the threshold 𝑡1 is set to 0.8, while the threshold 𝑡2 for identifying toxic words based on their log-odds ratio is set to 0.5. The threshold values were determined based on manual evaluation. In supervised learning, English has 19,744 and Russian has 11,090 training samples. Our method achieves an average score of 0.315 in the leaderboard's automatic evaluation, securing 22nd position and comparable average results with the mT5 baseline. Notably, we observe exceptional performance in languages such as English, German, Arabic, and Ukrainian, with scores of 0.47, 0.41, 0.42, and 0.52, respectively. In the manual evaluation conducted via crowdsourcing on a random subsample of 100 texts per language, our method secured 18th place with an average score of 0.50. Upon observing the results of both automatic and manual evaluation, we found that our proposed approach demonstrated suboptimal performance in Chinese and Spanish and was particularly ineffective for Russian. Despite the loss function indicating convergence and the text being detoxified, manual evaluation for Russian revealed that the generated text lacked meaningfulness.
Further exploration of our method revealed that the fine-tuned T5 multi-task text-generation model used in our method generated smaller tokens, resulting in the generation of out-of-vocabulary words. This raises a significant concern: although language models trained for multilingual text generation may exhibit promising scores and reduced loss metrics, verifying their effectiveness for languages outside the researchers' linguistic proficiency remains challenging. This issue underscores the necessity of incorporating native speakers in the evaluation process to ensure the semantic integrity of the detoxified text. We also found that using BART with ROUGE for English performed much better on the test set than using T5 with EWMA for the Russian language. In general, these findings indicate that unsupervised approaches to multilingual text detoxification using pretrained language models hold promising results despite the lack of parallel training corpora. Table 1 shows the training and validation loss of the Supervised T5-BART Module. Samples of toxic sentences and their detoxified sentences for all 9 languages involved in the study are given in Figure 3.

8 https://github.com/taimoorkhan-nlp/RuEn-supervised-detoxifier
9 https://github.com/susmita3107/mDetoxifier-Multilingual-unsupervised-text-detoxifier

Table 1
Training and validation loss of the supervised models (BART and T5) for English and Russian, respectively.

English                                          Russian
Training Loss   Validation Loss (ROUGE Score)    Training Loss   Validation Loss (EWMA)
1.422600        1.197713                         0.6551          7.6154
1.359800        1.141416                         0.6522          7.7473
1.297000        1.114076                         0.6881          7.7505
1.389900        1.111269                         0.5655          7.7502
1.461900        1.109282                         0.6839          7.4104
1.208900        1.140405                         0.6881          7.7505
1.377100        1.129589                         0.6098          7.9673
1.215600        1.122608                         0.6349          7.6759

Figure 3: Sample results of toxic and detoxified text in each of the languages.

Table 2
Manual and automatic evaluation scores of our proposed approach for individual languages and the average over all languages.
The evaluation is based on removing toxicity, style transfer accuracy, content preservation, and fluency.

Evaluation   average   en      es      de      zh      ar      hi      uk      ru      am
Manual       0.50      0.74    0.20    0.72    0.37    0.61    0.75    0.48    0.00    0.61
Automatic    0.315     0.472   0.356   0.414   0.069   0.425   0.198   0.528   0.090   0.280

6. Conclusion and Future Work

In this work, we propose a novel approach that combines both supervised and unsupervised methods for text detoxification across nine languages, including some low-resource ones. Our work forms a part of the CLEF Multilingual TextDetox challenge 2024, achieving an average score of 0.315 in the leaderboard's automatic evaluation and 0.50 in the manual evaluation. While our results are promising, we acknowledge that there is room for improvement. In the future, we aim to explore diverse methodologies, such as leveraging multilingual embedding features to identify linguistic similarities among different languages. Additionally, we intend to experiment with various clustering techniques to investigate potential hierarchical relationships among toxic words. Furthermore, we plan to explore domain adaptation and transfer learning methods, particularly for languages that share similar roots, e.g., Italian, Spanish, and Portuguese, which all descend from Latin. We anticipate that models trained on languages with similar linguistic roots might perform effectively on others with comparable linguistic characteristics.

References

[1] D. Dementieva, D. Moskovskiy, N. Babakov, A. A. Ayele, N. Rizwan, F. Schneider, X. Wang, S. M. Yimam, D. Ustalov, E. Stakovskii, A. Smirnova, A. Elnagar, A. Mukherjee, A. Panchenko, Overview of the multilingual text detoxification task at PAN 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[2] J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D. Korenčić, M. Mayerl, A. Mukherjee, A. Panchenko, M. Potthast, F.
Rangel, P. Rosso, A. Smirnova, E. Stamatatos, B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024.
[3] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017, pp. 512–515.
[4] S. Poria, E. Cambria, D. Hazarika, P. Vij, A deeper look into sarcastic tweets using deep convolutional neural networks, arXiv preprint arXiv:1610.08815 (2016).
[5] Z. Zhang, D. Robinson, J. Tepper, Detecting hate speech on Twitter using a convolution-GRU based deep neural network, in: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, Springer, 2018, pp. 745–760.
[6] L. Zhou, A. Caines, I. Pete, A. Hutchings, Automated hate speech detection and span extraction in underground hacking and extremist forums, Natural Language Engineering 29 (2023) 1247–1274.
[7] P. Liu, J. Guberman, L. Hemphill, A. Culotta, Forecasting the presence and intensity of hostility on Instagram using linguistic and social features, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 12, 2018.
[8] V. Logacheva, D. Dementieva, S. Ustyantsev, D. Moskovskiy, D. Dale, I. Krotova, N. Semenov, A. Panchenko, ParaDetox: Detoxification with parallel data, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 6804–6818.
[9] D. Dale, A. Voronov, D. Dementieva, V. Logacheva, O. Kozlova, N.
Semenov, A. Panchenko, Text detoxification using large pre-trained neural models, arXiv preprint arXiv:2109.08914 (2021).
[10] G. Hassan, J. Rabah, P. Madriaza, S. Brouillette-Alarie, E. Borokhovski, D. Pickup, W. Varela, M. Girard, L. Durocher-Corfa, E. Danis, Protocol: Hate online and in traditional media: A systematic review of the evidence for associations or impacts on individuals, audiences, and communities, Campbell Systematic Reviews 18 (2022) e1245.
[11] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[12] J. H. Park, P. Fung, One-step and two-step classification for abusive language detection on Twitter, arXiv preprint arXiv:1706.01206 (2017).
[13] Y. Khan, W. Ma, S. Vosoughi, Lone Pine at SemEval-2021 Task 5: Fine-grained detection of hate speech using BERToxic, arXiv preprint arXiv:2104.03506 (2021).
[14] J. Risch, R. Krestel, Toxic comment detection in online discussions, Deep Learning-Based Approaches for Sentiment Analysis (2020) 85–109.
[15] M. Taleb, A. Hamza, M. Zouitni, N. Burmani, S. Lafkiar, N. En-Nahnahi, Detection of toxicity in social media based on natural language processing methods, in: 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), 2022, pp. 1–7. doi:10.1109/ISCV54655.2022.9806096.
[16] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[17] J. Pavlopoulos, J. Sorensen, L. Dixon, N. Thain, I. Androutsopoulos, Toxicity detection: Does context really matter?, arXiv preprint arXiv:2006.00998 (2020).
[18] L. Laugier, J. Pavlopoulos, J. Sorensen, L. Dixon, Civil rephrases of toxic texts with self-supervised transformers, arXiv preprint arXiv:2102.05456 (2021).
[19] Z. Wang, S. Mayhew, D.
Roth, et al., Extending multilingual BERT to low-resource languages, arXiv preprint arXiv:2004.13640 (2020).
[20] M. Artetxe, H. Schwenk, Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond, Transactions of the Association for Computational Linguistics 7 (2019) 597–610.
[21] S. Ranathunga, E.-S. A. Lee, M. Prifti Skenduli, R. Shekhar, M. Alam, R. Kaur, Neural machine translation for low-resource languages: A survey, ACM Computing Surveys 55 (2023) 1–37.
[22] D. Dementieva, V. Logacheva, I. Nikishina, A. Fenogenova, D. Dale, I. Krotova, N. Semenov, T. Shavrina, A. Panchenko, RUSSE-2022: Findings of the first Russian detoxification shared task based on parallel corpora, (2022) 114–131.
[23] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67.
[24] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, arXiv preprint arXiv:1910.13461 (2019).
[25] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[26] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).
[27] F. Rahutomo, T. Kitasuka, M. Aritsugi, et al., Semantic cosine similarity, in: The 7th International Student Conference on Advanced Science and Technology (ICAST), volume 4, University of Seoul, South Korea, 2012, p. 1.