=Paper=
{{Paper
|id=Vol-3395/T2-13
|storemode=property
|title=Leveraging Sentiment Data for the Detection of Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Transformers
|pdfUrl=https://ceur-ws.org/Vol-3395/T2-13.pdf
|volume=Vol-3395
|authors=Filip Nilsson,Sana Sabah Al-Azzawi,György Kovács
|dblpUrl=https://dblp.org/rec/conf/fire/NilssonA022
}}
==Leveraging Sentiment Data for the Detection of Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Transformers==
Filip Nilsson¹, Sana Sabah Al-Azzawi¹ and György Kovács¹
¹EISLAB Machine Learning, Luleå University of Technology, 977 54 Luleå, Sweden

Abstract
Hateful content is published and spread on social media at an increasing rate, harming the user experience. In addition, hateful content targeting particular marginalized/vulnerable groups (e.g. homophobic/transphobic content) can cause even more harm to members of said groups. Hence, detecting hateful content is crucial, regardless of its origin or the language used. The large variety of (often under-resourced) languages used, however, makes this task daunting, especially as many users code-mix in their messages. To help overcome these difficulties, the approach we present here uses a multi-language framework, and to further mitigate the scarcity of labelled data, it also leverages data from the related task of sentiment analysis to improve the detection of homophobic/transphobic content. We evaluated our system by participating in a sentiment analysis and hate speech detection challenge. Results show that our multi-task model outperforms its single-task counterpart (on average, by 24%) on the detection of homophobic/transphobic content. Moreover, the results achieved in detecting homophobic/transphobic content put our system in 1st or 2nd place for three out of the four languages examined.

Keywords: Multi-Task, Multi-Language Learning, Hateful Language, Sentiment Analysis, Detecting Homophobic/Transphobic Language

FIRE 2022: Forum for Information Retrieval Evaluation, December 09–13, 2022, Kolkata, India
filnil-8@student.ltu.se (F. Nilsson); sana.al-azzawi@ltu.se (S. S. Al-Azzawi); gyorgy.kovacs@ltu.se (G. Kovács)
https://github.com/flippe3/fire_2022 (F. Nilsson)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction
The increasing use of social media such as Twitter and YouTube has escalated the exploitation of these platforms to propagate violence [1]. This violence can take the form of hateful, offensive, and abusive language causing harm [2, 3]. To help prevent this harm, social media content has been analyzed using various methods designed to detect offensive language or, more particularly, hateful language [4] and homophobic/transphobic content [5]. Homophobic/transphobic content is a type of hateful language intended to harm LGBT+ people. Unfortunately, a shortage of labeled data has limited research in this area, especially for low-resource languages [6]. One approach to overcoming this problem is leveraging additional data to improve the detection of hateful language. Kovács et al. [7, 8], among others, examined this option by (among other methods) using additional datasets created for the same task. In this paper, we extend this idea by leveraging datasets from related tasks, as well as datasets in different languages, to improve the detection of homophobic/transphobic content in a multi-task, multi-language setting. For these experiments, we use the "Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages" (hereinafter DravidianCodeMix) challenge at FIRE [9], which presents two shared tasks.
One task is sentiment analysis (Task A); the other is the detection of homophobic/transphobic content (Task B). The two tasks together provided seven datasets covering four different languages. We participated in both tasks and used multi-task learning to fine-tune an XLM-RoBERTa (XLM-R) pre-trained language model to deal with the multilingualism [10]. In this paper, we describe our approach and the results it attained. First, we discuss the related literature in Section 2. Then, in Section 3, we describe the tasks and datasets of the challenge. This is followed by the description of our methods in Section 4, after which we present our experiments and results in Section 5. In Section 6 we analyze our experiments, then share our conclusions and plans for future work in Section 7.

2. Related Work
Here, we discuss the part of the related literature that provides relevant background to the DravidianCodeMix challenge, including work on the detection of hateful language in general and homophobic/transphobic language in particular. We also discuss different approaches to multi-task learning and its use in Natural Language Processing (NLP) and for the detection of hateful language.

2.1. Hateful Language
Hateful language is any insult directed against a person or group based on a protected category, aiming to cause damage or stir up hatred [11]. Hateful posts/comments incur the risk of prompting a shift towards violence. Moreover, such content can have a harmful emotional effect on its readers. For these reasons, automatic hate speech identification has attracted a lot of attention, and many approaches have been investigated for the detection of hate speech, as well as in other relevant areas [12]. Existing methods mainly treat hateful language detection as a classification task, and they fall into two groups. The first comprises traditional machine learning methods such as SVMs [7, 11], logistic regression [13], random forests [11], gradient-boosted decision trees [14], and naive Bayes [11]. The second comprises deep learning-based methods, which can be further partitioned into custom deep architectures, word-embedding-based methods, and transformer-based methods. Transformer-based methods utilize pre-trained transformer models (e.g. BERT, ELECTRA, T5) and fine-tune them on datasets annotated for hateful language detection to detect harmful remarks in social media posts. These methods show remarkable performance on harm identification across different languages [15, 16, 17, 18, 19].
Homophobic/transphobic comments are usually categorized as a type of hateful language directed toward LGBT+ individuals. This phenomenon has been a growing concern. Chakravarthi et al. [9] created a dataset for homophobia classification in 2021, a collection of 15,141 multilingual YouTube comments. In addition, they ran detection experiments using several classical ML and DL models as baselines, and in 2022 they organized a shared task based on the data collected, at an ACL workshop [5], to promote research on identifying homophobic/transphobic content.

2.2. Multi-Task Learning
One method to combat data scarcity in supervised learning is multi-task learning (MTL). In MTL, a model is trained simultaneously on multiple related tasks to improve its performance and generalization ability. MTL has shown promising results in various areas, including computer vision, reinforcement learning, speech processing [20], and NLP, where MTL has been studied at various levels of relatedness, goals, and features [21].
One prominent example of the application of MTL in NLP is the T5 [22] text-to-text transformer model. This model was primarily trained using a combination of supervised and unsupervised learning and an MTL approach, which was shown to have a positive influence. The authors also experimented with various types of task mixing, namely examples-proportional, temperature-scaled, and equal mixing. Results indicated that examples-proportional mixing leads to the best performance, while equal mixing can degrade performance because the model overfits on low-resource tasks.
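To make the difference between these strategies concrete, the short sketch below computes the probability of drawing the next training example from each task under the two extremes. The dataset sizes are the Task B training-set totals from Table 2 (shown later); the artificial size limit K follows the capping scheme of the T5 paper, and its value here is an illustrative assumption.

```python
# Examples-proportional vs. equal mixing: probability of sampling an
# example from each task. Sizes are the Task B training totals from
# Table 2; K (the artificial dataset-size cap) is an assumed value.
def examples_proportional(sizes, K=10_000):
    capped = {task: min(n, K) for task, n in sizes.items()}
    total = sum(capped.values())
    return {task: n / total for task, n in capped.items()}

def equal_mixing(sizes):
    return {task: 1 / len(sizes) for task in sizes}

sizes = {"tamil": 2662, "malayalam": 3114, "english": 3164, "tamil_english": 3861}
print(examples_proportional(sizes))  # larger datasets are sampled more often
print(equal_mixing(sizes))           # every task sampled with equal probability
```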
Other researchers working on MTL in NLP [23] demonstrated that jointly fine-tuning a model on multiple languages could bring substantial improvements to the performance of a universal language encoder. For this, they used a tree-like structure with 16 heads, one for each language. The authors found that this joint MTL training hurt the performance of their model only in rare cases. MTL has also been used in the context of detecting hate speech [24]. Here, the authors used a multi-task model to detect hate speech in Spanish, using the related tasks of polarity and emotion classification to improve their model. They showed that their MTL system with task-specific output heads outperformed its single-task counterpart and achieved state-of-the-art results.

3. Data
The shared challenge includes two main tasks (Task A and Task B), both coupled with annotated datasets composed of YouTube comments in various languages. Task A is a sentiment analysis problem [25], where the goal is to classify each comment into one of five categories: Positive, Negative, Mixed feelings, Unknown state, and Text not in the target language. The detailed statistics and splits of the datasets are shown in Table 1. As can be seen there, the data is fairly imbalanced, both in terms of labels and languages (the Tamil dataset has more examples than the other two languages put together). Moreover, the different partitions have largely different class label distributions.

Table 1: Statistics of the datasets distributed for Task A (Train / Dev / Test)

Label                            Tamil               Malayalam          Kannada
Positive                         20070 / 2257 / 73   6421 / 706 / 780   2823 / 321 / 374
Negative                         4271 / 480 / 338    2105 / 237 / 258   1188 / 139 / 157
Mixed feelings                   4020 / 438 / 101    926 / 102 / 134    574 / 52 / 65
Unknown state                    20070 / 611 / 137   5279 / 580 / 643   711 / 69 / 62
Text not in the target language  1667 / 176 / 0      1157 / 141 / 147   916 / 110 / 110

Table 2: Statistics of the datasets distributed for Task B (Train / Dev / Test)

Label           Tamil              Malayalam          English            Tamil-English (code-mixed)
Non-Anti-LGBT   2022 / 526 / 352   2434 / 692 / 971   3001 / 732 / 924   3438 / 862 / 1085
Homophobic      485 / 103 / 271    491 / 133 / 182    157 / 58 / 61      311 / 66 / 88
Transphobic     155 / 37 / 26      189 / 41 / 60      6 / 2 / 5          112 / 38 / 34

The main objective of Task B was to identify whether a comment is Transphobic, Homophobic, or Non-Anti-LGBT (Safe) [9]. The datasets provided for this task are in Tamil, English, and Malayalam, plus a code-mixed Tamil-English dataset. The detailed statistics and splits of the datasets are shown in Table 2. As the table shows, the Non-Anti-LGBT (safe) class heavily outweighs the other two labels. This is most prominent in the English dataset, where transphobic messages represent less than 0.3% of the full data. Some example comments are shown in Table 3. As can be seen from the table, two kinds of code-mixing are present in the dataset: one where different languages are mixed (e.g. the comment labelled as "mixed feelings" in row 6), and one where native and Roman scripts are mixed (e.g. the Non-LGBT comment in row 8).

Table 3: Sample comments from some of the datasets provided for the challenge

We hypothesized that the two tasks of the challenge are closely related; that is, negative sentiment would be more strongly associated with homophobic/transphobic comments than with safe ones, and the opposite would be true for positive sentiment. To verify this, we trained a classifier on the Tamil dataset of Task A (sentiment analysis) and applied it to the Tamil dataset of Task B. The results of these experiments are shown in Figure 1. As can be seen there, the negative sentiment was on average higher in homophobic and transphobic content than in safe comments, while for positive sentiment, higher scores were on average attained for safe comments than for transphobic ones. Although the latter was not the case for homophobic comments, the results partially supporting our hypothesis encouraged us to further examine the relation between the two tasks.

Figure 1: The result of applying a sentiment analysis model trained on Tamil sentiment data to Tamil hate data. (a) Positive sentiment; (b) Negative sentiment.
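The probing step behind Figure 1 can be sketched as follows. This is a minimal, hypothetical reconstruction, assuming a Task A classifier fine-tuned on the Tamil sentiment data has been saved locally; the checkpoint path, file name, column names, and label ids below are illustrative, not the actual ones used.

```python
# Apply a Tamil Task A sentiment classifier to the Tamil Task B comments
# and compare the mean sentiment scores per hate label, as in Figure 1.
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CKPT = "./sentiment-tamil"  # assumed local fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT).eval()

POSITIVE, NEGATIVE = 0, 1  # assumed label ids of the sentiment head

hate_df = pd.read_csv("task_b_tamil_train.csv")  # assumed columns: text, label

scores = {}
with torch.no_grad():
    for hate_label, group in hate_df.groupby("label"):
        batch = tokenizer(list(group["text"]), padding=True,
                          truncation=True, return_tensors="pt")
        probs = model(**batch).logits.softmax(dim=-1)
        scores[hate_label] = {
            "positive": probs[:, POSITIVE].mean().item(),
            "negative": probs[:, NEGATIVE].mean().item(),
        }

# Hypothesis: the Homophobic/Transphobic rows should show higher mean
# negative scores than the Non-Anti-LGBT row.
print(scores)
```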
4. Methodology
One goal of our experiments was to examine how the task of identifying hateful comments can benefit from the availability of a sentiment analysis dataset. We therefore decided to work in a multi-task paradigm, to benefit from the connection between the two tasks (supported by the sentiment scores attained on the hateful-language data, shown in Figure 1). As our model, we chose a multilingual language model pre-trained on 100 different languages, covering all languages in our datasets.

4.1. Preprocessing
To handle the irregularities often found in YouTube comments, we applied a preprocessing step in our pipeline. First, we used the Hugging Face normalizer to remove redundant blank spaces and emojis. Then, we applied the SentencePiece [26] tokenizer, a language-independent subword tokenizer we chose because we are dealing with different types of code-mixing, a domain where SentencePiece has previously shown promising results [5].
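Below is a minimal sketch of this preprocessing, assuming the normalization is built with the Hugging Face tokenizers library and the subword step uses the SentencePiece-based XLM-R tokenizer; the emoji range in the regex and the base checkpoint are illustrative assumptions, not the exact rules used.

```python
from tokenizers import Regex, normalizers
from tokenizers.normalizers import NFKC, Replace, Strip
from transformers import AutoTokenizer

# Normalizer: strip emojis (assumed Unicode range), collapse repeated
# whitespace, and trim the string.
normalizer = normalizers.Sequence([
    NFKC(),
    Replace(Regex("[\U0001F300-\U0001FAFF]"), ""),
    Replace(Regex(r"\s+"), " "),
    Strip(),
])

# XLM-R ships a SentencePiece tokenizer, which segments code-mixed text
# without language-specific rules.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

comment = "Intha padam   semma!! 👍👍"  # illustrative romanized-Tamil input
clean = normalizer.normalize_str(comment)
print(tokenizer.tokenize(clean))
```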
Figure 2: Pipeline of our seven-headed multi-task model

4.2. Model
We used the XLM-RoBERTa (XLM-R) [10] based multilingual sentence-transformer, which is trained using an XLM-R model as the student and an SBERT [27] model as the teacher.

4.3. Multi-Task Training
From our study of the data, we hypothesized that hateful content would in general show a higher degree of negativity (and a lower degree of positivity). A multi-task model should therefore help improve the results for Task B. To train the model, we split the output into seven different RoBERTa [28] classification heads, shown in Figure 2, similar to the architecture used by Unicoder [23], which produced excellent results. This enabled us to fine-tune the language model on both tasks and all seven datasets simultaneously.
An essential part of MTL is the technique used to mix the tasks. In our case, due to the imbalance among datasets and labels, equal mixing would have incurred the risk of missing vital data points, negatively affecting the model. Thus, following the findings of [22], we used examples-proportional mixing, i.e. sampling from tasks proportionally to their dataset size. In our MTL training, we experimented with different-sized output layers, but did not find any significant improvement when adding more layers, which may also indicate how closely related the two tasks are. The final model uses a RoBERTa classification head with two dense layers. Our models were trained for 4 epochs with a learning rate of 3e-5, using a linear scheduler and an AdamW optimizer, on a shared DGX-1 cluster with 2 × 32 GB Nvidia V100 GPUs.
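A compact PyTorch sketch of this setup follows. It is a sketch under stated assumptions, not our exact implementation: the base checkpoint, class counts per head, head naming, and the dataset sizes used for sampling are illustrative; only the overall design (a shared encoder, one RoBERTa-style two-layer head per dataset, examples-proportional task sampling, AdamW with a linear schedule) follows the description above.

```python
import random
import torch
import torch.nn as nn
from transformers import AutoModel, get_linear_schedule_with_warmup

class ClassificationHead(nn.Module):
    """RoBERTa-style head: two dense layers over the first-token state."""
    def __init__(self, hidden: int, num_labels: int):
        super().__init__()
        self.dense = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, num_labels)

    def forward(self, hidden_states):
        return self.out(torch.tanh(self.dense(hidden_states[:, 0])))

class MultiTaskModel(nn.Module):
    """Shared encoder with one output head per task/language dataset."""
    def __init__(self, head_sizes: dict):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("xlm-roberta-base")  # assumed checkpoint
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {name: ClassificationHead(hidden, n) for name, n in head_sizes.items()})

    def forward(self, task, **batch):
        hidden_states = self.encoder(**batch).last_hidden_state
        return self.heads[task](hidden_states)

# Three 5-class sentiment heads and four 3-class hate heads, as in Figure 2.
head_sizes = {"sent_ta": 5, "sent_ml": 5, "sent_kn": 5,
              "hate_ta": 3, "hate_ml": 3, "hate_en": 3, "hate_ta_en": 3}
model = MultiTaskModel(head_sizes)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=1000)  # 4 epochs in practice

# Examples-proportional mixing: pick the next batch's task with
# probability proportional to its (placeholder) dataset size.
dataset_sizes = {name: 1000 for name in head_sizes}
task = random.choices(list(dataset_sizes), weights=list(dataset_sizes.values()))[0]
```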
Table 4: Macro F1 scores for Task A [sentiment analysis] (Dev / Test)

Architecture         Languages         Heads   Tamil         Malayalam     Kannada
Multi-task           All               7       0.50 / 0.20   0.66 / 0.66   0.55 / 0.52
Multi-task           All               2       0.48 / 0.19   0.63 / 0.60   0.51 / 0.50
Single-task          One               1       0.51 / 0.20   0.67 / 0.66   0.54 / 0.52
LS Multi-task        Tamil & English   4       0.51 / 0.20   -             -
LS Multi-task        Malayalam         2       -             0.63 / 0.71   -
Best competing team  -                 -       - / 0.27      - / 0.66      - / 0.55

5. Experiments and results
We performed four sets of experiments to evaluate our proposed multi-task system, each with a different architecture. In this section, we present these architectures as well as the results of the experiments. The four architectures were as follows:
1) Single-task architecture: a separate model with only one output head was trained for each individual language and task.
2) Multi-task learning with seven output heads: a joint model was trained for all tasks and languages, equipped with one output head for each language and task. That is, the model had three output heads for sentiment analysis (corresponding to the three languages), and four more output heads (one for each language in Task B).
3) Multi-task learning with two output heads: similar to the previous approach, all tasks and languages were used to train the same model. Here, however, our goal was to train only one output head for each task; thus one output head was trained to predict the labels of Task A and one those of Task B. This is similar to the architecture used for Spanish hate-speech detection [24].
4) Language-specific multi-task learning: similar to the two previous multi-task systems, but here we selected a specific language and only used the datasets containing that language.
The results of our experiments on Task A are summarized in Table 4. As the table shows, although the single-language multi-task model attained markedly higher results than those reported by the winning team, in most cases the multi-task architecture did not lead to marked improvements over its single-task counterpart. Our goal with the multi-task architecture, however, was not to improve the performance of the sentiment analysis model, but rather to improve the detection of hateful speech. The results of our experiments on Task B are summarized in Table 5. As Table 5 shows, we attained the best results on the test (and dev) set in all cases using either a multi-language multi-task architecture or a single-language multi-task architecture. Based on the results attained on the dev set (where available at the deadline), we chose the seven-headed multi-task multi-language model for our final submission. The results attained by this model in the official competition [29] are summarized in Table 6. The table shows that the model performed relatively well in both tasks; the rankings achieved, however, were markedly better for Task B, our main target. Thus, for the detection of hateful language, the multi-task multi-language model not only improved on its single-task counterpart, but also attained a competitive performance in the challenge.

Table 5: Macro F1 scores for Task B [detection of homophobic/transphobic language] (Dev / Test)

Architecture         Languages         Heads   Tamil         Malayalam     English       Tamil-English
Multi-task           All               7       0.69 / 0.32   0.66 / 0.75   0.49 / 0.49   0.64 / 0.55
Multi-task           All               2       0.68 / 0.31   0.71 / 0.76   0.51 / 0.49   0.60 / 0.57
Single-task          One               1       0.50 / 0.27   0.52 / 0.52   0.48 / 0.45   0.47 / 0.48
Multi-task           Tamil & English   4       0.67 / 0.31   -             0.49 / 0.47   0.61 / 0.59
Multi-task           Malayalam         2       -             0.72 / 0.65   -             -
Best competing team  -                 -       - / 0.37      - / 0.97      - / 0.55      - / 0.49

Table 6: Our official macro F1 scores and rankings across the two tasks

Task    Language        Precision   Recall   Macro F1   Rank
Task A  Tamil           0.15        0.13     0.13       7
Task A  Malayalam       0.66        0.62     0.64       2
Task A  Kannada         0.50        0.49     0.48       5
Task B  Tamil           -           -        0.33       2
Task B  Tamil-English   -           -        0.55       1
Task B  Malayalam       -           -        0.70       6
Task B  English         -           -        0.49       2

6. Analysis
First, to further analyze whether our system is viable, we examined our sentiment output heads when probed with hateful data. Figures 3a and 3b show the logits obtained when the hateful data is passed through the matching sentiment head. Figure 3a shows that the quartiles for safe comments lie above those for homophobic and transphobic comments. This suggests that the model learned an expected feature (i.e. safe posts having more positivity, and less negativity, than hateful ones).

Figure 3: Sentiment scores obtained by probing our sentiment heads with hateful data. (a) Positive sentiment; (b) Negative sentiment

We also examined the confusion matrices, shown in Figure 4. This was an important tool of analysis: classifying homophobic comments as transphobic might not be a large problem, but classifying either as safe is problematic. In Figure 4c we can see that five transphobic comments were misclassified as safe. This, however, is most likely due to the training data, since there were only six transphobic examples in the English training set (see Table 2).

Figure 4: Language-specific confusion matrices for hateful language. (a) English-Tamil; (b) Tamil; (c) English; (d) Malayalam
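A short sketch of this evaluation, assuming scikit-learn is used: the label list follows Table 2, while the gold and predicted arrays below are placeholders for the actual system outputs.

```python
from sklearn.metrics import confusion_matrix, f1_score

labels = ["Non-Anti-LGBT", "Homophobic", "Transphobic"]  # Task B labels
gold = ["Non-Anti-LGBT", "Transphobic", "Homophobic", "Non-Anti-LGBT"]
pred = ["Non-Anti-LGBT", "Non-Anti-LGBT", "Homophobic", "Non-Anti-LGBT"]

# Macro F1 averages the per-class F1 scores, so the rare Transphobic
# class weighs as much as the dominant safe class.
print(f1_score(gold, pred, labels=labels, average="macro"))

# Rows are gold labels, columns predictions; off-diagonal counts in the
# first column are the problematic hateful-as-safe errors.
print(confusion_matrix(gold, pred, labels=labels))
```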
7. Conclusion and future work
We introduced a pipeline that fine-tunes a multilingual transformer using multi-task learning. The pipeline uses sentiment data to improve homophobic/transphobic detection. Our experiments show that the multi-task model outperforms the single-task model, and that language-specific training can improve the accuracy further. We have also demonstrated that the sentiment output heads of our model identify hateful content as more negative than safe content. Future work could focus on using only the sentiment labels relevant to hateful speech. One could also look at other types of task mixing, such as temperature-scaled mixing. Different early stopping methods should also be considered, as should data augmentation methods, to counteract the problem of data imbalance. Finally, one could experiment with tasks of different levels of relatedness and with languages of varied similarity.

8. Acknowledgments
The work presented here was partially supported by Vinnova, in the project Language models for Swedish authorities (Språkmodeller för svenska myndigheter), ref. no. 2019-02996.

References
[1] A. A. Siegel, Online Hate Speech, SSRC Anxieties of Democracy, Cambridge University Press, 2020, pp. 56–88.
[2] I. Bigoulaeva, V. Hangya, A. Fraser, Cross-lingual transfer learning for hate speech detection, in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, 2021, pp. 15–25.
[3] K. Gelber, L. McNamara, Evidencing the harms of hate speech, Social Identities 22 (2016) 324–341.
[4] A. Schmidt, M. Wiegand, A survey on hate speech detection using natural language processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[5] B. R. Chakravarthi, R. Priyadharshini, T. Durairaj, J. McCrae, P. Buitelaar, P. Kumaresan, R. Ponnusamy, Overview of the shared task on homophobia and transphobia detection in social media comments, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 369–377. URL: https://aclanthology.org/2022.ltedi-1.57. doi:10.18653/v1/2022.ltedi-1.57.
[6] M. Singh, P. Motlicek, Idiap submission@LT-EDI-ACL2022: Homophobia/transphobia detection in social media comments, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, 2022, pp. 356–361.
[7] G. Kovács, P. Alonso, R. Saini, Challenges of hate speech detection in social media, SN Computer Science 2 (2021) 1–15. doi:10.1007/s42979-021-00457-3.
[8] G. Kovács, P. Alonso, R. Saini, M. Liwicki, Leveraging external resources for offensive content detection in social media, AI Commun. 35 (2022) 87–109. URL: https://doi.org/10.3233/AIC-210138. doi:10.3233/AIC-210138.
[9] B. R. Chakravarthi, R. Priyadharshini, R. Ponnusamy, P. K. Kumaresan, K. Sampath, D. Thenmozhi, S. Thangasamy, R. Nallathambi, J. P. McCrae, Dataset for identification of homophobia and transophobia in multilingual YouTube comments, arXiv preprint arXiv:2109.00227 (2021).
[10] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, 2019. URL: https://arxiv.org/abs/1911.02116. doi:10.48550/ARXIV.1911.02116.
[11] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017, pp. 512–515.
[12] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, O. Frieder, Hate speech detection: Challenges and solutions, PLOS ONE 14 (2019) 1–16. URL: https://doi.org/10.1371/journal.pone.0221152. doi:10.1371/journal.pone.0221152.
[13] Z. Waseem, D. Hovy, Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter, in: Proceedings of the NAACL Student Research Workshop, 2016, pp. 88–93.
[14] E. Katona, J. Buda, F. Bolonyai, Using n-grams and statistical features to identify hate speech spreaders on Twitter, in: CLEF (Working Notes), 2021, pp. 2025–2034.
[15] S. S. Sabry, T. Adewumi, N. Abid, G. Kovács, F. Liwicki, M. Liwicki, HaT5: Hate language identification using text-to-text transfer transformer, arXiv preprint arXiv:2202.05690 (2022).
[16] T. Adewumi, S. S. Sabry, N. Abid, F. Liwicki, M. Liwicki, T5 for hate speech, augmented data and ensemble, arXiv preprint arXiv:2210.05480 (2022).
[17] J. S. Malik, G. Pang, A. v. d. Hengel, Deep learning for hate speech detection: A comparative study, arXiv preprint arXiv:2202.09517 (2022).
[18] P. Alonso, R. Saini, G. Kovács, Hate speech detection using transformer ensembles on the HASOC dataset, in: International Conference on Speech and Computer, Springer, 2020, pp. 13–21.
[19] E. Lavergne, R. Saini, G. Kovács, K. Murphy, TheNorth@HaSpeeDe 2: BERT-based language model fine-tuning for Italian hate speech detection, 2020.
[20] Y. Zhang, Q. Yang, An overview of multi-task learning, National Science Review 5 (2017) 30–43. URL: https://doi.org/10.1093/nsr/nwx105. doi:10.1093/nsr/nwx105.
[21] S. Chen, Y. Zhang, Q. Yang, Multi-task learning in natural language processing: An overview, 2021. URL: https://arxiv.org/abs/2109.09138. doi:10.48550/ARXIV.2109.09138.
[22] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, 2019. URL: https://arxiv.org/abs/1910.10683. doi:10.48550/ARXIV.1910.10683.
[23] H. Huang, Y. Liang, N. Duan, M. Gong, L. Shou, D. Jiang, M. Zhou, Unicoder: A universal language encoder by pre-training with multiple cross-lingual tasks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 2485–2494. URL: https://aclanthology.org/D19-1252. doi:10.18653/v1/D19-1252.
[24] F. M. Plaza-Del-Arco, M. D. Molina-González, L. A. Ureña-López, M. T. Martín-Valdivia, A multi-task learning approach to hate speech detection leveraging sentiment analysis, IEEE Access 9 (2021) 112478–112489. doi:10.1109/ACCESS.2021.3103697.
[25] B. R. Chakravarthi, R. Priyadharshini, V. Muralidaran, N. Jose, S. Suryawanshi, E. Sherly, J. P. McCrae, DravidianCodeMix: Sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text, Language Resources and Evaluation 56 (2022) 765–806. URL: https://doi.org/10.1007/s10579-022-09583-7. doi:10.1007/s10579-022-09583-7.
[26] T. Kudo, J. Richardson, SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 66–71. URL: https://aclanthology.org/D18-2012. doi:10.18653/v1/D18-2012.
[27] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using knowledge distillation, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2020. URL: https://arxiv.org/abs/2004.09813.
[28] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: https://arxiv.org/abs/1907.11692. doi:10.48550/ARXIV.1907.11692.
[29] K. Shumugavadivel, M. Subramanian, P. K. Kumaresan, B. R. Chakravarthi, B. B, S. Chinnaudayar Navaneethakrishnan, L. S.K, T. Mandl, R. Ponnusamy, V. Palanikumar, M. Balaji J, Overview of the Shared Task on Sentiment Analysis and Homophobia Detection of YouTube Comments in Code-Mixed Dravidian Languages, in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.