Multilingual Hate Speech Detection Using Ensemble of Transformer Models

Md Saroar Jahan, Fadi Hassan, Walid Aransa and Abdessalam Bouchekif
Huawei Finland Research Center
saroar.jahan@huawei.com (M. S. Jahan); fadi.hassan@huawei.com (F. Hassan); walid.aransa@huawei.com (W. Aransa); abdessalam.bouchekif@huawei.com (A. Bouchekif)
Corresponding author: M. S. Jahan. All authors contributed equally.

Forum for Information Retrieval Evaluation, December 15-18, 2023, India

Abstract
The classification of hate speech and offensive language presents significant challenges, primarily due to the scarcity of low-resource datasets and the absence of pre-trained models. This paper offers a comprehensive overview of offensive language identification results in the context of HASOC-2023 across various languages and tasks, including Sinhala, Gujarati, Bengali, Assamese, and Bodo, as well as hateful span detection. To address these challenges, we harnessed the power of BERT-based models, leveraging resources such as XLM-RoBERTa-large, L3-Cube, BanglaHateBERT, and BanglaBERT. Our experiments yielded promising results, notably showcasing the superior performance of XLM-RoBERTa-large over monolingual models in the majority of cases. For Task 3, SpanBERT performed outstandingly. Notably, our team FiRC-NLP's contributions were acknowledged with top-ranking achievements, securing first position in Task 1 and Task 3, and second position in Task 4.

Keywords
Hateful Span Detection, Conversational Hate Detection, SpanBERT

1. Introduction

Social media is a widely popular and convenient platform for open expression and online communication. Unfortunately, it also provides the means for distributing abusive and aggressive content such as sexism, racism, politics, cyberbullying, and blackmailing. Nockleby [1] stated that "hate speech disparages a person or group based on some characteristics such as race, color, and ethnicity". Addressing offensive language on social media is now a major challenge. Various shared tasks and data-sharing initiatives within the research community aim to motivate researchers to develop innovative solutions for detecting abusive content. Among these initiatives, HASOC has gained significant popularity through its previous editions: HASOC-2019 [2], HASOC-2020 [3], HASOC-2021 [4] and HASOC-2022 [5]. These editions focused on hate speech and offensive language identification in English, German, and Hindi. SemEval is another noteworthy initiative. SemEval-2019 [6] focused on the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. SemEval-2020 [7] extended its scope to include Arabic, English, Danish, Greek, and Turkish content. In SemEval-2023 [8], the focus was on detecting and identifying comments and tweets containing sexist expressions. Additionally, other shared tasks have been proposed, such as GermEval [9] for German, EVALITA [10] for Italian, and OSACT [11] for Arabic content, all of which contribute to this important area of research. The best-performing models in these shared tasks are typically based on transformer architectures such as RoBERTa [12], DeBERTa [13], ALBERT [14] and XLM-RoBERTa [15].
In SemEval-2023 Task 10-A, the top-performing models [16, 17] are based on DeBERTa. The best-performing models in SemEval-2020 Task 12-A [18] used an ensemble of ALBERT models of different sizes, while the second-ranked team [19] used RoBERTa-base and XLM-RoBERTa. Monolingual transformers often give better results than multilingual ones when addressing low-resource languages. The winning teams in SemEval-2020 Task 12-A for the Arabic [20] and Danish [21] languages achieved the highest performance using AraBERT [22] and Nordic BERT (https://github.com/certainlyio/nordic_bert), respectively.

Hate speech detection becomes more challenging when social posts are written in a code-mixed (CM) language. Code-mixing, the practice of blending words from two languages within a single sentence, is becoming increasingly common in various bilingual communities, which makes automatic detection more challenging [23, 24]. In HASOC-2022, three tasks were hosted: Task 1 and Task 2 involved binary and multi-class classification for both German and code-mixed languages, while Task 3 focused on identifying offensive language in Marathi. The highest performance in Task 1 [25] was achieved using Google-MuRIL (https://huggingface.co/google/muril-base-cased), a BERT model pre-trained on 17 Indian languages.

HASOC-2023 introduced Task 1 and Task 4, focusing on the detection of hate speech, offensive content, and profanity, while Task 3 centered on detecting hate speech spans within social media posts. We actively engaged in this competition, taking on Task 1 for Sinhala and Gujarati and Task 4 for Bengali, Assamese, and Bodo; additionally, we took on Task 3, which involves English hate speech span detection. To accomplish these tasks, we made use of the HASOC-2023 shared datasets for both training and validation, without any external data. Our strategy predominantly relied on cutting-edge transformer models to tackle these challenges.

This paper is structured as follows: Section 2 presents a detailed description of the tasks and datasets, Section 3 provides an in-depth look at our methodology and model architecture, and the conclusion offers closing statements and delineates potential directions for future research.

2. Task Description

This section presents the task descriptions for HASOC-2023 [26]. Task 1 and Task 4 focus on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques [27, 28, 29, 30, 31, 32, 33]. These tasks mainly involve classifying tweets into two categories: Hate and Offensive (HOF) or Non-Hate and Offensive (NOT).

• Task 1A: deals with identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka.
• Task 1B: focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language, spoken by approximately 50 million people in India. The training set for this task consists of around 200 tweets.
• Task 4: aims to detect hate speech in the Bengali, Bodo, and Assamese languages. Data is primarily collected from Twitter, Facebook, or YouTube comments.

Task 3 aims to detect the hateful spans within a sentence already considered hateful [34]. The input texts are all in English. The detection of hateful spans is cast as a sequence labeling problem: human annotators have manually marked the start and end of each hateful span. This is achieved with BIO tagging, where 'B' marks the beginning of a hate span, 'I' marks the continuation of a hate span, and 'O' marks non-hate tokens.
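To make the tagging scheme concrete, the following is a minimal sketch of how character-level span annotations can be mapped to token-level BIO tags. The whitespace tokenization, the to_bio helper, and the example sentence are illustrative assumptions rather than the shared task's official preprocessing; the B-HateSpan/I-HateSpan label names follow Section 3.2.

```python
# Minimal sketch: mapping character-offset hateful-span annotations
# to token-level BIO tags. Whitespace tokenization, the to_bio
# helper, and the example sentence are illustrative assumptions.

def to_bio(text, spans):
    """spans: list of (start, end) character offsets of hateful spans."""
    tokens, tags, pos = [], [], 0
    for token in text.split():
        start = text.index(token, pos)   # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"                        # default: non-hate token
        for s, e in spans:
            if start < e and end > s:    # token overlaps a hateful span
                tag = "B-HateSpan" if start == s else "I-HateSpan"
        tokens.append(token)
        tags.append(tag)
    return list(zip(tokens, tags))

# Toy example: characters 10-24 ("complete idiot") form the hateful span
print(to_bio("You are a complete idiot, honestly.", [(10, 24)]))
# [('You', 'O'), ('are', 'O'), ('a', 'O'), ('complete', 'B-HateSpan'),
#  ('idiot,', 'I-HateSpan'), ('honestly.', 'O')]
```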
Table 1
Examples from the datasets of Tasks 1 and 4

Sentence | Translation | Label | Task | Train / Test size | Language
---------|-------------|-------|------|-------------------|---------
"বােলর িশক্ষা মন্ত্রী" | Stupid Education Minister | HOF | Task 4 | 1281 / 320 | Bengali
"কুকুৰ বু িল িকয় ৈকেছ অসভ্য ক'ৰবাৰ, লাজ নাই" | Why are you calling me a dog, rude somewhere, no shame | HOF | Task 4 | 4036 / 1009 | Assamese
"मोसौ खुगायाव एमफौ नांबाय नोंनाव सैम" | Both are drunkards f***rs | HOF | Task 4 | 1679 / 420 | Bodo

3. Methodology

This section offers a comprehensive overview of the model architectures and the strategies employed to address each task. Due to the similarities between Task 1 and Task 4, we have consolidated them into a single section, while Task 3 is described separately.

3.1. Task 1 and 4 Model Architecture

For Task 1 and Task 4, we adopted two main strategies:

1. Utilizing different BERT models: We conducted experiments with both multilingual and monolingual BERT models.
2. Augmenting training data: Our second approach involved enhancing the training data through automatic annotation.

We assessed several models, including multilingual ones such as XLM-RoBERTa-large and IndicBERT, and monolingual models such as L3-Cube, BanglaBERT, and BanglaHateBERT. Following our experiments, we selected XLM-RoBERTa-large as the baseline model due to its superior performance compared to most of the monolingual models. This performance difference may be attributed to the age of some monolingual models, such as BanglaHateBERT, which is considerably older than XLM-RoBERTa-large. The Bangla L3-Cube monolingual model did exhibit slightly better performance (+0.06 F1 score) than XLM-RoBERTa-large; nevertheless, since XLM-RoBERTa-large outperformed the monolingual models in most cases, we chose it as the baseline model.

Table 2
Task 1 F1 scores for different models by language

Language | Model | F1 Score
---------|-------|---------
Gujarati | Indic-BERT (trained on 12 major Indian languages) | 73.4
Gujarati | L3-Cube Gujarati (monolingual) | 79.1
Gujarati | XLM-RoBERTa-large | 81.6
Sinhala | Indic-BERT (trained on 12 major Indian languages) | 74.5
Sinhala | L3-Cube Sinhala (monolingual) | 78.6
Sinhala | XLM-RoBERTa-large | 80.4
Bengali | Indic-BERT (trained on 12 major Indian languages) | 70.5
Bengali | L3-Cube Bengali (monolingual) | 75.6
Bengali | XLM-RoBERTa-large | 75.1
Bengali | BanglaBERT | 68.1
Bengali | BanglaHateBERT | 65.5

Figure 1: Illustration of the approach to enhance model performance: incorporating annotated test data into the training data for Sinhala (a similar approach was tested with public data).

To further enhance our model's performance, we pursued a second strategy: expanding our training dataset. Due to the lack of suitable datasets for most of these languages, we hypothesized that incorporating automatically annotated test data into the training data could improve model learning. To implement this, we initially trained the model on 90% of the training data, keeping the remaining 10% for evaluation. We then determined the optimal thresholds on the evaluation data and applied upper and lower thresholds to automatically annotate part of the test data; for example, we used an upper threshold of 0.90 and a lower threshold of 0.20. After automatically annotating that portion of the test data with these thresholds, we retrained the model with this portion added to the training data and observed a 3% improvement in model performance.
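As a concrete illustration of this filtering step, the sketch below pseudo-labels unlabeled test samples using the upper and lower confidence thresholds reported above (0.90 and 0.20). The function name, the (text, probability) input format, and the mapping of the two thresholds to the HOF/NOT labels are illustrative assumptions.

```python
# Minimal sketch of the threshold-based automatic annotation step.
# The 0.90 / 0.20 thresholds follow the paper; the function name and
# the (text, p_hof) input format are illustrative assumptions, where
# p_hof is the fine-tuned classifier's estimated P(label == HOF).

def pseudo_label(scored_texts, upper=0.90, lower=0.20):
    """Keep only the test samples the model is confident about."""
    new_train = []
    for text, p_hof in scored_texts:
        if p_hof >= upper:
            new_train.append((text, "HOF"))  # confidently hateful/offensive
        elif p_hof <= lower:
            new_train.append((text, "NOT"))  # confidently non-hateful
        # lower < p_hof < upper: model unsure, sample is discarded
    return new_train

# Toy usage: only the two confident predictions survive the filter
scored = [("post A", 0.97), ("post B", 0.55), ("post C", 0.08)]
print(pseudo_label(scored))  # [('post A', 'HOF'), ('post C', 'NOT')]
```

The retained samples are then appended to the original training set and the model is retrained on the union.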
This hypothesis was further tested with external public data, where automatic annotation was applied using the same upper and lower thresholds, resulting in a 1-2% improvement. Additionally, we employed an ensemble of 5 models, which contributed a further 0.4% increase in F1 score (see Table 3).

Table 3
F1 scores for different models by language after adding part of the test data to the training data and using an ensemble model

Language | Model | F1 Score
---------|-------|---------
Gujarati | XLM-RoBERTa-large | 81.6
Gujarati | XLM-RoBERTa-large, 5-model ensemble | 82.0
Gujarati | XLM-RoBERTa-large (200 training samples + 401 filtered test samples) | 84.8
Sinhala | XLM-RoBERTa-large | 80.4
Sinhala | XLM-RoBERTa-large, 5-model ensemble | 80.9
Sinhala | XLM-RoBERTa-large (7.5k training samples + 600 filtered test samples) | 83.8

3.2. Task 3 Model Architecture

In Task 3, the goal is to find all the hateful spans in a sentence. A hateful span is a group of words that together express the hatred in the sentence. The provided data comprises 1936 training samples, 485 validation samples, and 606 test samples. The labels follow the BIO scheme described above, with "B-HateSpan" denoting the first token of a hateful span and "I-HateSpan" denoting tokens inside a hateful span. This section also includes an analysis of the model's performance and an in-depth explanation of the outcomes.

The architecture of our best submitted model for Task 3 employs a teacher-student framework and utilizes the SpanBERT-base-cased model along with Conditional Random Fields (CRF) for sequence tagging. The approach can be summarized as follows:

• Teacher model: an ensemble of k SpanBERT-base-cased models, each combined with a CRF.
• Student model: a single SpanBERT-base-cased model, also integrated with a CRF.

The student model is distilled from the teacher model using the following loss:

    L_loss = (1 − α) · CE(student_score, target) + α · MSE(student_logits, teacher_logits)    (1)
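Below is a minimal PyTorch sketch of the distillation objective in Eq. (1). Averaging the k teachers' logits and the tensor shapes are illustrative assumptions, and the CRF layer is omitted for brevity; α = 0.95 follows the value reported in the observations below.

```python
# Minimal PyTorch sketch of the distillation loss in Eq. (1).
# Averaging the k teachers' logits and the tensor shapes are
# illustrative assumptions; the CRF layer is omitted for brevity.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, target, alpha=0.95):
    """
    student_logits:      (batch, seq_len, num_tags) scores from the student
    teacher_logits_list: k tensors of shape (batch, seq_len, num_tags),
                         one per teacher in the ensemble
    target:              (batch, seq_len) gold BIO tag ids
    """
    # Ensemble teacher signal: average the k teachers' logits
    teacher_logits = torch.stack(teacher_logits_list).mean(dim=0)

    # Hard-label term: cross-entropy against the gold BIO tags
    ce = F.cross_entropy(student_logits.flatten(0, 1), target.flatten())

    # Soft-label term: regress the student's logits onto the teacher's
    mse = F.mse_loss(student_logits, teacher_logits)

    return (1 - alpha) * ce + alpha * mse
```

With α = 0.95, most of the training signal comes from matching the teacher ensemble's logits, while the gold tags act as a light regularizer.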
Our main observations are as follows:

1. Model comparison: Table 4 compares different base models with varying configurations, including casing, k-fold cross-validation, and tagging schemes (BIO and IO). SpanBERT-large with lower casing and a 5-fold cross-validation scheme achieved the highest private score of 62.322, indicating its effectiveness in identifying hateful spans.
2. Impact of casing: The casing of the model input, whether lower case or true case, affects the model's performance. Lower casing generally performs better, as indicated by the higher private and public scores in several configurations.
3. Tagging scheme: The choice of tagging scheme (BIO vs. IO) also influences performance. Models using the BIO tagging scheme tend to yield better results, as seen in their higher private and public scores.
4. Ensemble vs. single model: The ensemble of SpanBERT-base-cased teacher models provides valuable knowledge transfer to the student model, resulting in improved performance.
5. Distillation effect: Distillation with an α value of 0.95 for transferring knowledge from the teachers to the student model enhances performance compared to a standalone student model (see Eq. 1).

Table 4
Evaluation of hate span detection performance utilizing various models; the submitted model is marked with *.

Base Model | Casing | K-fold | Tagging | Private Score | Public Score
-----------|--------|--------|---------|---------------|-------------
SpanBERT-large | Lower case | 5 | BIO | 62.322 | 55.052
SpanBERT-base | True case | - | BIO | 41.528 | 33.755
SpanBERT-base | True case | 5 | BIO | 55.547 | 48.566
SpanBERT-base | Lower case | 10 | BIO | 57.541 | 51.013
SpanBERT-base * | Lower case | 5 | BIO | 57.605 | 53.378
SpanBERT-base | True case | - | BIO | 55.177 | 45.602
DeBERTa-v3-xlarge | True case | - | BIO | 43.102 | 38.249
DeBERTa-v3-large | True case | - | BIO | 47.433 | 39.222
DeBERTa-v3-large | True case | - | IO | 15.426 | 12.446

Overall, the model architecture involving an ensemble of SpanBERT models with CRF, especially when using lower casing and BIO tagging, demonstrates strong performance in identifying hateful spans in text. The distillation process further boosts the student model's effectiveness.

Table 5
Official outcomes from our participation in HASOC-2023 for Tasks 1, 3, and 4; the best models are presented.

Team name | Task (Language) | Base model | Macro F1 | Rank
----------|-----------------|------------|----------|-----
FiRC-NLP | Task 1B (Gujarati) | XLM-RoBERTa-large | 0.848 | 1/17
FiRC-NLP | Task 1A (Sinhala) | XLM-RoBERTa-large | 0.838 | 1/16
FiRC-NLP | Task 3 (English) | SpanBERT-base | 0.570 | 1/12
FiRC-NLP | Task 4 (Bengali) | XLM-RoBERTa-large | 0.764 | 2/20
FiRC-NLP | Task 4 (Assamese) | XLM-RoBERTa-large | 0.725 | 2/20
FiRC-NLP | Task 4 (Bodo) | XLM-RoBERTa-large | 0.848 | 4/19

4. Conclusion

In this paper, we have presented a comprehensive analysis of hate speech and offensive language identification across multiple languages and tasks in the HASOC-2023 competition. Tasks 1 and 4 involved identifying offensive language in the Sinhala, Gujarati, Bengali, Assamese, and Bodo languages, while Task 3 involved hateful span detection in English text. Our work not only showcased the effectiveness of transformer-based models in these shared tasks but also emphasized the importance of model selection, task-specific customization, and innovative strategies for addressing the challenges posed by low-resource languages and multilingual and cross-lingual contexts. As future work, further investigation is needed into more diverse and specialized transformer models, as well as fine-tuning of model parameters, to achieve even better results. Additionally, the application of ensemble techniques and the incorporation of multiple thresholds for automatic annotation represent promising avenues for improving model robustness and generalization.

References

[1] J. T. Nockleby, Hate speech, Encyclopedia of the American Constitution 3 (2000) 1277-1279.
[2] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, 2019, pp. 14-17.
[3] T. Mandl, S. Modha, A. Kumar M, B. R. Chakravarthi, Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German, in: Forum for Information Retrieval Evaluation, 2020, pp. 29-32.
[4] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech, in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, India, December 13-17, 2021, ACM, 2021, pp. 1-3.
[5] S. Modha, T. Mandl, P. Majumder, S. Satapara, T. Patel, H. Madhu, Overview of the HASOC subtrack at FIRE 2022: Identification of conversational hate-speech in Hindi-English code-mixed and German language, in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, Kolkata, India, December 9-13, 2022, volume 3395 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 475-488.
[6] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).
[7] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020), arXiv preprint arXiv:2006.07235 (2020).
[8] H. Kirk, W. Yin, B. Vidgen, P. Röttger, SemEval-2023 task 10: Explainable detection of online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation, SemEval@ACL 2023, Toronto, Canada, 13-14 July 2023, Association for Computational Linguistics, 2023, pp. 2193-2210.
[9] M. Wiegand, M. Siegel, J. Ruppenhofer, Overview of the GermEval 2018 shared task on the identification of offensive language, 2018.
[10] C. Bosco, D. Felice, F. Poletto, M. Sanguinetti, T. Maurizio, Overview of the EVALITA 2018 hate speech detection task, in: EVALITA 2018 - Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, volume 2263, CEUR, 2018, pp. 1-9.
[11] H. Mubarak, H. Al-Khalifa, A. Al-Thubaity, Overview of OSACT5 shared task on Arabic offensive language and hate speech detection, in: Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, European Language Resources Association, 2022, pp. 162-166.
[12] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019).
[13] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, CoRR abs/2111.09543 (2021).
[14] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for self-supervised learning of language representations, CoRR abs/1909.11942 (2019).
[15] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019).
[16] M. Zhou, PingAnLifeInsurance at SemEval-2023 task 10: Using multi-task learning to better detect online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023.
[17] F. Hassan, A. Bouchekif, W. Aransa, FiRC at SemEval-2023 task 10: Fine-grained classification of online sexism content using DeBERTa, in: Proceedings of the 17th International Workshop on Semantic Evaluation, SemEval@ACL 2023, Toronto, Canada, 13-14 July 2023, Association for Computational Linguistics, 2023, pp. 1824-1832.
[18] G. Wiedemann, S. M. Yimam, C. Biemann, UHH-LT at SemEval-2020 task 12: Fine-tuning of pre-trained transformer networks for offensive language detection, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020.
[19] S. Wang, J. Liu, X. Ouyang, Y. Sun, Galileo at SemEval-2020 task 12: Multi-lingual learning for offensive language identification using pre-trained language models, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020.
[20] H. Alami, S. Ouatik El Alaoui, A. Benlahbib, N. En-nahnahi, LISAC FSDM-USMBA team at SemEval-2020 task 12: Overcoming AraBERT's pretrain-finetune discrepancy for Arabic offensive language identification, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020.
[21] M. Pàmies, E. Öhman, K. Kajava, J. Tiedemann, LT@Helsinki at SemEval-2020 task 12: Multilingual or language-specific BERT?, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020.
[22] W. Antoun, F. Baly, H. Hajj, AraBERT: Transformer-based model for Arabic language understanding, in: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, European Language Resources Association, Marseille, France, 2020, pp. 9-15. URL: https://aclanthology.org/2020.osact-1.2.
[23] I. A. Bhat, V. Mujadia, A. Tammewar, R. A. Bhat, M. Shrivastava, IIIT-H system submission for FIRE 2014 shared task on transliterated search, in: Proceedings of the Forum for Information Retrieval Evaluation, 2014, pp. 48-53.
[24] M. L. Ripoll, F. Hassan, J. Attieh, G. Collell, A. Bouchekif, Multi-lingual contextual hate speech detection using transformer-based ensembles, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2022.
[25] N. K. Singh, U. Garain, An analysis of transformer-based models for code-mixed conversational hate-speech identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2022.
[26] S. Masud, M. Bedi, M. A. Khan, M. S. Akhtar, T. Chakraborty, Proactively reducing the hate intensity of online posts via hate speech normalization, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 3524-3534. URL: https://doi.org/10.1145/3534678.3539161. doi:10.1145/3534678.3539161.
[27] S. Satapara, H. Madhu, T. Ranasinghe, A. E. Dmonte, M. Zampieri, P. Pandya, N. Shah, M. Sandip, P. Majumder, T. Mandl, Overview of the HASOC subtrack at FIRE 2023: Hate-speech identification in Sinhala and Gujarati, in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[28] K. Ghosh, A. Senapati, A. S. Pal, Annihilate Hates (Task 4, HASOC 2023): Hate speech detection in Assamese, Bengali, and Bodo languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[29] S. Satapara, S. Masud, H. Madhu, M. A. Khan, M. S. Akhtar, T. Chakraborty, S. Modha, T. Mandl, Overview of the HASOC subtracks at FIRE 2023: Detection of hate spans and conversational hate-speech, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Goa, India, December 15-18, 2023, ACM, 2023.
[30] T. Ranasinghe, K. Ghosh, A. S. Pal, A. Senapati, A. E. Dmonte, M. Zampieri, S. Modha, S. Satapara, Overview of the HASOC subtracks at FIRE 2023: Hate speech and offensive content identification in Assamese, Bengali, Bodo, Gujarati and Sinhala, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Goa, India, December 15-18, 2023, ACM, 2023.
[31] K. Ghosh, A. Senapati, U. Garain, Baseline BERT models for conversational hate speech detection in code-mixed tweets utilizing data augmentation and offensive language identification in Marathi, in: FIRE, 2022. URL: https://api.semanticscholar.org/CorpusID:259123570.
[32] K. Ghosh, A. Senapati, Hate speech detection: a comparison of mono and multilingual transformer model with cross-language evaluation, in: Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation, De La Salle University, Manila, Philippines, 2022, pp. 853-865. URL: https://aclanthology.org/2022.paclic-1.94.
[33] K. Ghosh, D. Sonowal, A. Basumatary, B. Gogoi, A. Senapati, Transformer-based hate speech detection in Assamese, in: 2023 IEEE Guwahati Subsection Conference (GCON), 2023, pp. 1-5. doi:10.1109/GCON58516.2023.10183497.
[34] S. Masud, M. A. Khan, M. S. Akhtar, T. Chakraborty, Overview of the HASOC subtrack at FIRE 2023: Identification of tokens contributing to explicit hate in English by span detection, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.