Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Thomas Mandl1, Sandip Modha2, Gautam Kishore Shahi3, Hiren Madhu4, Shrey Satapara5, Prasenjit Majumder5, Johannes Schäfer1, Tharindu Ranasinghe6, Marcos Zampieri7, Durgesh Nandini8 and Amit Kumar Jaiswal9,10
1 University of Hildesheim, Germany
2 LDRP-ITR, Gandhinagar, India
3 University of Duisburg-Essen, Germany
4 Indian Institute of Science, Bangalore, India
5 DA-IICT, Gandhinagar, India
6 University of Wolverhampton, United Kingdom
7 Rochester Institute of Technology, USA
8 University of Bamberg, Germany
9 University of Bedfordshire, United Kingdom
10 University of Leeds, United Kingdom

Abstract
The widespread presence of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. For the evaluation of these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. This subtrack has two sub-tasks. Task A is a binary classification problem (Hate and Offensive vs. Non-Offensive) offered for all three languages. Task B is a fine-grained classification problem with three classes, hate speech (HATE), offensive (OFFN) and profane (PRFN), offered for English and Hindi. Overall, 652 runs were submitted by 65 teams. The best classification algorithms for Task A achieved macro F1 scores of 0.91, 0.78 and 0.83 for Marathi, Hindi and English, respectively. This overview presents the tasks and the data development as well as the detailed results. The systems submitted to the competition applied a variety of technologies. The best performing algorithms were mainly variants of transformer architectures.

Keywords
Social Media, Hate Speech, Offensive Language, Multilingual Text Classification, Machine Learning, Evaluation, Deep Learning

Forum for Information Retrieval Evaluation, December 13-17, 2021, India
E-mail: mandl@uni-hildesheim.de (T. Mandl); sjmodha@gmail.com (S. Modha); gautam.shahi@uni-due.de (G. K. Shahi); hirenmadhu16@gmail.com (H. Madhu); shreysatapara@gmail.com (S. Satapara); p_majumder@daiict.ac.in (P. Majumder); johannes.schaefer@uni-hildesheim.de (J. Schäfer); marcos.zampieri@rit.edu (M. Zampieri)
ORCID: 0000-0002-8398-9699 (T. Mandl); 0000-0003-2427-2433 (S. Modha); 0000-0001-6168-0132 (G. K. Shahi); 0000-0002-6701-6782 (H. Madhu); 0000-0001-6222-1288 (S. Satapara); 0000-0003-3207-3821 (T. Ranasinghe); 0000-0002-2346-3847 (M. Zampieri)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction
There are various types of potentially harmful content in social media, such as misinformation and fake news [1], aggression [2], cyber-bullying [3, 4], pejorative language [5], offensive language [6] and online extremism [7], to name a few. The automatic identification of problematic content has been receiving significant attention from the AI and NLP communities. In particular, the identification of offensive content, most notably hate speech, has been a growing research area.
Within this broad area, various related phenomena have been addressed in isolation, such as cyber-bullying, misogyny, aggression, and abuse [8, 9, 10], while some recent work has focused on modeling multiple types of offensive content at once [11, 12]. While research in this area has been gaining momentum [13], there is increasing evidence that social media platforms still struggle to keep up with the demand for technology, particularly for languages other than English [14]. For example, a recent article pointed out that Facebook does not have technology for identifying hate speech in the 22 official languages of India, its biggest market worldwide.1 To further contribute to the research in this field, the HASOC 2021 competition offers empirically-driven research aiming to find the best methods for the identification of offensive content in social media. In its third edition, HASOC 2021 features re-runs of the English and Hindi tasks, allowing for better comparison with the results from the editions HASOC 2019 [15] and HASOC 2020 [16]. Marathi, an Indo-Aryan language similar to Hindi and spoken by over 80 million people in India, was added as a new language in HASOC 2021. Subtask 2, which covers conversational hate speech, is described in an additional overview paper [17].

1 https://www.nytimes.com/2021/10/23/technology/facebook-india-misinformation.html

2. Related Work
This section briefly reviews related research on hate speech identification and data sets created with this goal in mind.

Current Benchmarks
Recently organised shared task competitions such as TRAC [2], HASOC [18] and OffensEval [19] have presented multiple datasets for hate speech and offensive content identification. While a clear majority of these competitions present English data, several recent shared tasks have created new datasets for various languages such as Greek [20], Danish [21], Mexican Spanish [22], and Turkish [23]. These data sets have influenced the creation of machine learning models to automatically detect offensive content, ranging from SVM models with traditional features [24] to state-of-the-art transformer models [25]. As most of these models require training data for each language, it is important to have training data for various languages. Furthermore, one data set per language is not sufficient because the topics of hate speech can change, the potential bias of a data set cannot be easily revealed, and the concept cannot be clearly defined but has a subjective component.

These data sets fall into two main categories. Data sets such as Offensive Language Detection in Spanish Variants (MeOffendEs@IberLEF 2021) [26] and DEtection of TOXicity in comments In Spanish (DETOXIS) [27] focus on general concepts of offensive content, while other data sets are dedicated to more specific topics. A data set for Russian which models hate against ethnic groups as a multi-class problem [28] and the data set of Guest et al. [29], in which misogyny is annotated as a multi-class problem, are two recent examples that focus on specific topics in offensive content identification.

Annotation for Hate Speech
The key activity in data set creation is annotation. Human annotators need to decide whether the texts presented to them belong to one of the classes relevant to the task. This process can be organised in different ways, and there is no commonly agreed best practice. Some researchers employ a small number of experts [29] or non-experts [15], while others rely on crowd workers [30].
There is a high level of subjectivity associated with the labelling and the class assignment. This can become more serious in cases of systematic bias due to different levels of knowledge about issues in society or even about language variants [31]. Demographic features of the annotators may also lead to bias [32]. Sometimes users of data collections consider some tweets to be erroneously labelled. However, it needs to be taken into consideration that data providers need to follow a consistent protocol, and deviations in the opinions about individual tweets are natural. These cases of differing opinions and individual standards form part of any data set because typically more than one person needs to work on the annotation. The typical method for measuring annotation quality is to have some items annotated at least twice and to compute inter-rater agreement metrics on them. In cases of low agreement, it is unclear whether the reason is a lack of common understanding between the annotators or a collection that contains many dubious cases. One study showed that agreement for such dubious cases is substantially lower than for clear cases [33]. Before starting the annotation, it is not clear how large the portion of dubious cases is. Consequently, even a high inter-rater agreement cannot guarantee that the annotation is of high quality.

Reliability of Data Sets
Hate speech detection systems are created not only for research but also for real-world applications. It is therefore crucial not just to measure the quality of the classification for one data set, but also to analyse how well a system can generalise and be transferred to other data sets. This would be an indicator for a high level of generalisability in realistic scenarios. Substantial experiments by Fortuna et al. [34] showed that training with one data set and testing with another one can decrease the performance by over 30%. Many potential reasons can be seen as obstacles to generalisability [35, 36, 37, 38], such as dataset size and annotation quality; however, little is known about their effects. Consequently, the creation of further hate speech data sets is necessary not only for measuring the performance of classifiers, but also for the analysis of the data sets themselves, of their creation processes, and for measuring reliability with new methods.

3. HASOC Task Overview and Data Set
The HASOC 2021 dataset is another contribution to the growing body of resources for the analysis of hate speech classification. In the following sections, the tasks and the creation process of the data set are described.

3.1. Task Definition
This task focuses on hate speech and offensive language identification for English, Hindi, and Marathi. Sub-task A is a coarse-grained binary classification in which participating systems are required to classify tweets into two classes, namely Hate and Offensive (HOF) vs. Non-Hate and Non-Offensive (NOT).
• HOF - Hate and Offensive: The post contains hate speech, offensive or profane content.
• NOT - Non Hate-Offensive: The post does not contain any hate speech, profanity or offensive content; it contains normal statements or anything else. Utterances that are considered "normal" and not offensive to anyone should not be labelled as HOF, even if they contain coarse expressions, as these can be part of youth language or other language registers.

3.1.1. Sub-task B: Identifying Hate, Profane and Offensive Posts (Fine-grained)
The second sub-task is a fine-grained classification task offered for English and Hindi.
Hate speech and offensive posts from sub-task A need to be further classified into the following three categories:
• HATE - Hate speech: Posts under this class contain hate speech, i.e. they ascribe negative attributes or deficiencies to groups of individuals because they are members of a group (e.g. "all poor people are stupid"). These posts include hateful comments towards groups because of race, political opinion, sexual orientation, gender, social status, health condition or similar.
• OFFN - Offensive: Posts under this class contain offensive content that degrades, dehumanizes or insults an individual.
• PRFN - Profane: These posts contain profane words, i.e. unacceptable language in the absence of insults and abuse. This typically concerns the usage of obscenity, swearwords (fuck etc.) and cursing (Hell! Damn! etc.).

3.2. Data Set Assembly
The sampling of the data set was planned at a time when India was facing its extremely severe second COVID-19 wave. Therefore, during the sampling period, major topics in social media were highly influenced by COVID-19, and these topics are frequent in the data set [39, 40, 41]. In addition, tweets were also sampled about topics related to the brutal post-poll violence in the Indian state of West Bengal. Table 1 lists the topics and trending hashtags which were used during the sampling period.

To obtain potentially hateful tweets from the very large corpus of tweets, we trained a weak classifier based on an SVM model with n-gram features on the HASOC 2019 [42] and 2020 [16] data sets. The purpose was to create a weak binary classifier that achieves an F1 score of around 0.5 (a sketch of such a weak classifier is given below). We used this classifier to predict labels on the downloaded tweet corpus and randomly selected tweets classified as HOF (hateful/profane/offensive) by the weak classifier. We then randomly added 5% of the tweets which were not rated as belonging to the class HOF by the classifier. The main rationale behind this merging process is to ensure that the final data set contains a balanced distribution of hateful and non-hateful tweets. We downloaded additional tweets using profane keywords to create an even more balanced data set. Table 2 lists examples for the different classes from the data set. The sizes of the data sets for training and testing are shown in Table 3 and Table 4.

Trending Hashtags        Description of Topics
#ResignModi              Resignation of PM Modi over the COVID-19 crisis in India
#ModiKaVaccineJumla      Controversy over the COVID-19 vaccine shortage
#Murderer_Modi           Deaths due to oxygen shortage attributed to Modi
#IndiaCovidCrisis        Brutal second COVID-19 wave in India
#TMCTerror               West Bengal post-poll violence
#BengalBurning           West Bengal post-poll violence
#ChineseWave             Anger directed at China
#chinesevirus            Racist tweets about Chinese people
#communistvirus          Hashtag trend driven by right-wing groups
#covidvaccine            COVID-19 vaccine
#NoVaccinePassports      Vaccine passports
#chinavirus              Racist tweets about Chinese people
#wuhanvirus              COVID-19 origin
#islamophobia            Tweets related to hatred against Islam
#JusticeForShahabuddin   Death of a controversial Indian politician
Table 1: Trending topics from the HASOC data set sample

The tweets were extracted from Twitter using a targeted sampling approach. All tweets were annotated by at least two annotators, and any conflict between the annotators was resolved by a third annotator. The inter-rater agreement in subtask 1A is 69% for English and 72% for Hindi. For subtask 1B, the agreement is 55% for English and 68% for Hindi.
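The weak pre-filtering classifier described above can be illustrated with standard tooling. The following is a minimal sketch assuming scikit-learn and pandas, with hypothetical file and column names; it is not the organisers' exact implementation:

```python
# Minimal sketch of a weak n-gram SVM used to pre-filter candidate tweets.
# Assumes CSV files with hypothetical columns "text" and "label"
# (1 = HOF, 0 = NOT) built from the HASOC 2019/2020 training data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

train = pd.read_csv("hasoc_2019_2020_train.csv")  # hypothetical path

weak_clf = Pipeline([
    # Word uni- and bigrams; deliberately simple features keep the model weak.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("svm", LinearSVC()),
])
weak_clf.fit(train["text"], train["label"])

# Apply the weak classifier to the freshly downloaded corpus.
corpus = pd.read_csv("downloaded_tweets.csv")  # hypothetical path
corpus["pred"] = weak_clf.predict(corpus["text"])
hof_candidates = corpus[corpus["pred"] == 1]
# Add back a small fraction (5%) of predicted-NOT tweets, as described above.
not_sample = corpus[corpus["pred"] == 0].sample(frac=0.05, random_state=0)
sample = pd.concat([hof_candidates, not_sample])
```

Keeping the filter deliberately weak (F1 around 0.5) presumably avoids transferring too much of the older data sets' biases into the new sample while still raising the share of offensive tweets above their very low base rate.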
The data set for Marathi is based on the recently released MOLD data set [43]. MOLD contains data collected from Twitter. Gaikwad et al. [43] used 22 common curse words in Marathi together with search phrases related to politics, entertainment, and sports, along with the hashtag #Marathi. With this approach, they collected a total of 2,547 tweets, which were annotated by six volunteer annotators who are native speakers of Marathi. After removing non-Marathi tweets, the final version of MOLD contains 2,499 annotated tweets, randomly split 75%/25% into training and testing sets. Only sub-task A was offered for Marathi.

4. Participation and Evaluation
This section details statistics about the participation in HASOC 2021 by teams from all over the world. HASOC 2021 is the third edition of HASOC at the Forum for Information Retrieval Evaluation (FIRE); HASOC started in 2019. This year, HASOC received a record number of participants: a total of 102 teams registered, and 65 teams submitted 652 runs across all subtasks. Table 5 summarizes the participation statistics. Runs were ranked by the macro-averaged F1 score (a minimal computation sketch is given after Table 3 below). Unlike in previous years, this year we developed our own submission platform2 rather than using a third-party service. We also provided a leaderboard facility to all participants and the community. The HASOC 2021 leaderboard can be accessed on our GitHub site3.

2 https://hasocfire.github.io/submission/index.html
3 https://hasocfire.github.io/submission/leaderboard.html

Examples of tweets with their Task-1 and Task-2 labels:
• "yeah when she's finally done w you you wanna pop back into her life fuck off" (HOF, PRFN)
• "#ModiKaVaccineJumla Mr. Modi, where is your "DeshBhakt" BJP workers now??? Do you feel COVID is attacking only the anti-nationals or anti-BJPs ???? Shame a Curse On!!!!" (HOF, OFFN)
• "@30iPpgStmILw0SI @ChinaDaily #ChineseVirus #WuhanVirus is the #correct name for the #pandemic . #Shameless" (NOT, NONE)
• "@manoramaonline Shame on people who are still supporting her... including Manorama. keeping MUM #ArrestMamata #BengalBurning #BengalViolence https://t.co/o7lXp6nYZW" (HOF, HATE)
• "@timotheelvr BITCH GET OUT OF HERE WE ALL KNOW SIALL IS REAL" (HOF, PRFN)
• "I am booked in to get my first dose of the #Covidvaccine and truth be told I am a bit nervous | First Dog on the Moon https://t.co/u7r8ThfOLW" (NOT, NONE)
Table 2: Examples of tweets for each class from the data set

Class   English   Marathi   Hindi
NOT     1,342     1,205     3,161
HOF     2,501     669       1,433
PRFN    1,196     -         213
HATE    683       -         566
OFFN    622       -         654
Sum     3,843     1,874     4,594
Table 3: Statistical overview of the training data
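The macro-averaged F1 score used in all result tables averages the per-class F1 values, so each class counts equally regardless of its frequency. A minimal computation sketch, assuming scikit-learn and hypothetical label lists:

```python
# Macro F1 gives the rarer HOF class the same weight as the frequent NOT class.
from sklearn.metrics import f1_score

gold = ["NOT", "HOF", "HOF", "NOT", "HOF"]    # hypothetical gold labels
pred = ["NOT", "HOF", "NOT", "NOT", "HOF"]    # hypothetical system output

print(f1_score(gold, pred, average="macro"))  # per-class F1 (0.8, 0.8) -> 0.8
```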
5. Results
This section presents the results of the runs by all participating teams who also submitted a paper describing their system. Figure 1 presents histograms of the performances of all teams. Each bin in the histograms depicts a range of 0.01 macro F1 score; the figure thus provides an overview of the distribution of the results.

Figure 1: Histograms of performance distribution

Class   English   Marathi   Hindi
NOT     798       418       1,027
HOF     483       207       505
PRFN    379       -         74
HATE    224       -         215
OFFN    195       -         216
Sum     1,281     625       1,532
Table 4: Statistical overview of the test data used for determining the final results

# of teams registered   # of teams submitting runs   # of runs   # of papers
102                     65                           652         47
Table 5: Participation statistics

Rank  Team Name                    Macro F1   Rank  Team Name              Macro F1
1     t1                           0.7825     18    MUM [44]               0.7423
2     Super Mario [45]             0.7797     19    BIU [46]               0.7400
3     Hasnuhana                    0.7797     20    Data Pirates [47]      0.7394
4     NLP-CIC                      0.7775     21    TeamBD [48]            0.7393
5     NeuralSpace [49]             0.7748     22    HNLP [50]              0.7379
6     KuiYongyi [51]               0.7725     23    JCT                    0.7349
7     SATLab [52]                  0.7718     24    TeamOulu [53]          0.7339
8     neuro-utmn-thales [54]       0.7682     25    SSN_NLP_MLRG [55]      0.7320
9     PreCog IIIT Hyderabad [56]   0.7648     26    AI-NLP-ML@IITP         0.7308
10    hate-busters                 0.7641     27    Chandigarh_Concordia   0.7274
11    Sakshi HASOC [57]            0.7612     28    SSNCSE_NLP [58]        0.7264
12    UINSUSKA [59]                0.7555     29    S_Cube                 0.7195
13    IRLab@IITBHU [60]            0.7547     30    HUNLP [61]             0.7194
14    SOA_NLP [62]                 0.7542     31    TNLP [63]              0.7181
15    algo_unlock [64]             0.7536     32    IIT_Patna [65]         0.6848
16    UMUTeam [66]                 0.7520     33    JU_PAD [67]            0.6762
17    CAROLL_Passau [68]           0.7504     34    DLRG [69]              0.6628
Table 6: Results of Task 1A Hindi

5.1. Hindi
The best submission for Task A was achieved with a fine-tuned multilingual BERT with a classification layer added on top, trained on the HASOC Hindi data set for 20 epochs (a minimal sketch of this type of fine-tuning is given below Table 7). With this model, the team [45] achieved a macro F1 score of 0.7797. The second team was just 0.0049 points behind this best submission. Apart from fine-tuning an XLM-R transformer, these authors computed vector representations for emojis using the system Emoji2Vec and sentence embeddings for hashtags; the three resulting representations were concatenated before classification. This team achieved the best results for Task B with the same approach [49], which shows that simply ignoring emojis and hashtags in social media analysis might not always be an adequate approach. The second team in Task B performed just 0.0017 points lower than this best team. This team fine-tuned a multilingual BERT transformer with a softmax loss function, unlike the two teams mentioned previously, which both applied a binary cross-entropy loss.

Rank  Team Name                    Macro F1   Rank  Team Name              Macro F1
1     NeuralSpace [49]             0.5603     13    algo_unlock [64]       0.4794
2     SATLab [52]                  0.5586     14    DLRG [69]              0.4658
3     hate-busters                 0.5582     15    S_Cube                 0.4513
4     NLP-CIC                      0.5530     16    HNLP [50]              0.4431
5     KuiYongyi [51]               0.5509     17    t1                     0.4290
6     UMUTeam [66]                 0.5167     18    UINSUSKA [59]          0.4257
7     IRLab@IITBHU [60]            0.5127     19    AI-NLP-ML@IITP         0.4077
8     PreCog IIIT Hyderabad [56]   0.5111     20    Chandigarh_Concordia   0.3906
9     SSN_NLP_MLRG [55]            0.5110     21    IIT_Patna [65]         0.3782
10    MUM [44]                     0.4952     22    Super Mario [45]       0.2890
11    Data Pirates                 0.4828     23    SOA_NLP [62]           0.2702
12    Hasnuhana                    0.4825     24    Ignite [70]            0.0621
Table 7: Results of Task 1B Hindi
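The fine-tuning recipe reported for the best Hindi Task A submission (multilingual BERT with a classification layer, trained for 20 epochs) can be sketched with the Hugging Face transformers library. Apart from the reported epoch count, the file names and hyperparameters below are assumptions, not the team's actual code:

```python
# Minimal sketch: fine-tuning multilingual BERT for binary HOF/NOT detection.
# Assumes a CSV with hypothetical columns "text" and "label" (0 = NOT, 1 = HOF).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # adds a classifier layer on top

ds = load_dataset("csv", data_files={"train": "hasoc2021_hi_train.csv"})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                    padding="max_length", max_length=128),
            batched=True)

args = TrainingArguments(output_dir="mbert-hasoc-hi",
                         num_train_epochs=20,             # as reported by the team
                         per_device_train_batch_size=32,  # assumed
                         learning_rate=2e-5)              # assumed
Trainer(model=model, args=args, train_dataset=ds["train"]).train()
```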
Tables 6 and 7 clearly indicate that the top six teams for Task A and the top five teams for Task B achieved very close macro F1 scores, with differences of less than 0.01. For Task A, the mean F1 score over the best submissions of all teams is 0.7436, with a standard deviation of 0.0289. For the top 10 submissions, however, the standard deviation is only 0.0058, approximately one fifth of the standard deviation over all teams. For Task B, the mean F1 score over the best submissions of all teams is 0.4493, which shows that the fine-grained classification remains difficult; we also need to consider that the inter-rater agreement is low for this task. Here, the standard deviation between systems is 0.1114, while it is 0.0241 for the best 10 submissions, i.e. the standard deviation over all teams is approximately 4.5 times higher than that of the top 10 teams.

5.2. English
The best submission for Task A used a GCN-based approach in which the team modelled tweets and words as nodes. A word node is connected to all tweet nodes of the tweets in which it occurs, and a word node is connected to the other word nodes that fall into a sliding window around it across all tweets. Furthermore, the authors used TF-IDF weights as node weights. They achieved a macro F1 score of 0.8215 [61]. The second team used a soft-voting ensemble of four different transformer models jointly fine-tuned on the original training set and the HatebaseTwitter data (a minimal sketch of such a soft-voting ensemble follows after Table 8). Using this external resource, the team achieved an F1 score only 0.0016 lower than that of the first team. The same team ranked first in Task B using the same approach as for Task A, yielding a macro F1 of 0.6577 [54]. The second team in Task B used BERT and TF-IDF representations as well as the similarity score between the two as features, concatenating them into a text representation that was fed into a classifier. They achieved a macro F1 score of 0.6482.

For Task A, the mean F1 score over the best submissions of all teams is 0.7569, while the standard deviation is 0.06255. For the top 10 submissions, the standard deviation is 0.01049, approximately one sixth of the standard deviation over all teams. For Task B, the mean F1 score over the best submissions is 0.5707, while the standard deviation is 0.0888. For the best 10 submissions, the standard deviation is 0.0114; the standard deviation over all teams is thus approximately 8 times that of the top 10 teams.

Rank  Team Name                    Macro F1   Rank  Team Name                   Macro F1
1     NLP-CIC                      0.8305     29    Alehegn Adane               0.7623
2     HUNLP [61]                   0.8215     30    PC1                         0.7618
3     neuro-utmn-thales [54]       0.8199     31    TeamBD [48]                 0.7602
4     HNLP [50]                    0.8089     32    IIT_Patna [65]              0.7578
5     Chandigarh_Concordia         0.8040     33    TIB-VA [71]                 0.7565
6     KuiYongyi [51]               0.8030     34    S_Cube                      0.7563
7     t1                           0.8026     35    SOA_NLP [62]                0.7551
8     UINSUSKA [59]                0.8024     36    SSNCSE_NLP [58]             0.7541
9     TUW-Inf [72]                 0.8018     37    JZ2021 [73]                 0.7497
10    UMUTeam [66]                 0.8013     38    Binary Beings [74]          0.7491
11    HASOC21rub [75]              0.8013     39    E8@IJS                      0.7484
12    Super Mario [45]             0.8006     40    JU_CSE_Team                 0.7468
13    Hasnuhana                    0.8006     41    TCS Res. Lab Gurgaon [76]   0.7448
14    NeuralSpace [49]             0.7996     42    AI-NLP-ML@IITP              0.7413
15    Sakshi HASOC [57]            0.7993     43    MUM [44]                    0.7389
16    IRLab@IITBHU [60]            0.7976     44    BIU [46]                    0.7388
17    PreCog IIIT Hyderabad [56]   0.7959     45    QQQ [77]                    0.7374
18    IMS-SINAI [78]               0.7947     46    Oswald                      0.7339
19    SSN_NLP_MLRG [55]            0.7919     47    JCT                         0.7327
20    giniUs                       0.7909     48    TNLP [63]                   0.7314
21    biCourage [79]               0.7900     49    DLRG [69]                   0.7255
22    hate-busters                 0.7894     50    TU Berlin [80]              0.7203
23    SATLab [52]                  0.7823     51    UBCS [81]                   0.7070
24    TAD                          0.7776     52    PUCV                        0.7037
25    Beware Haters [82]           0.7722     53    JU_PAD [67]                 0.6813
26    TeamOulu [53]                0.7700     54    NLP_JU                      0.5999
27    Vishesh Gupta [83]           0.7680     55    Team P&P                    0.5133
28    AUST_AI                      0.7644     56    ML-LTU                      0.5012
Table 8: Results of Task 1A English
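Soft voting, as used by the second-ranked English Task A team, averages the predicted class probability distributions of several fine-tuned models instead of combining their hard labels. A minimal sketch, assuming Hugging Face transformers and hypothetical paths to four already fine-tuned checkpoints:

```python
# Minimal sketch of a soft-voting transformer ensemble.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical checkpoints; the team's actual models are not reproduced here.
checkpoints = ["ckpt-bert", "ckpt-roberta", "ckpt-xlmr", "ckpt-electra"]
members = [(AutoTokenizer.from_pretrained(p),
            AutoModelForSequenceClassification.from_pretrained(p).eval())
           for p in checkpoints]

def soft_vote(text: str) -> int:
    """Average the members' softmax outputs, then take the argmax."""
    probs = []
    for tok, model in members:
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs.append(torch.softmax(model(**inputs).logits, dim=-1))
    # Mean over members, argmax over classes (e.g. 0 = NOT, 1 = HOF).
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1))
```

Averaging probabilities rather than labels preserves each member's confidence, so a highly confident model can outweigh members that are barely past the decision boundary.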
5.3. Marathi
The best submission for this task used a fine-tuned XLM-R Large model with a simple softmax layer to predict the probabilities of the class labels. The team performed transfer learning from the English data released for OffensEval 2019 [19] and the Hindi data released for HASOC 2019 [18] and showed that transfer learning from Hindi works better than transfer learning from English. They achieved an F1 score of 0.9144 [84]. Their approach shows the importance of performing transfer learning from a closely related language (a sketch of this sequential fine-tuning strategy follows after Table 10). The team in second place applied a fine-tuned LaBSE transformer [86] to the Marathi data set as well as to the Hindi data set and achieved an F1 score of 0.8808. Their experiments show that the LaBSE transformer outperforms XLM-R in the monolingual setting, but that XLM-R performs better when the Hindi and Marathi data are combined [54].

For Task A in Marathi, the mean F1 score over all submissions is 0.8255, while the standard deviation is 0.0774. Again, the standard deviation of the top 10 submissions is much lower and lies at 0.0143.

Rank  Team Name                    Macro F1   Rank  Team Name                   Macro F1
1     NLP-CIC                      0.6657     20    biCourage [79]              0.5966
2     neuro-utmn-thales [54]       0.6577     21    PreCog IIIT Hyderabad [56]  0.5927
3     HASOC21rub [75]              0.6482     22    Vishesh Gupta [83]          0.5871
4     Super Mario [45]             0.6447     23    MUM [44]                    0.5771
5     UINSUSKA [59]                0.6417     24    Binary Beings [74]          0.5765
6     HNLP [50]                    0.6396     25    S_Cube                      0.5739
7     Hasnuhana                    0.6392     26    AI-NLP-ML@IITP              0.5732
8     Beware Haters [82]           0.6311     27    DLRG [69]                   0.5713
9     HUNLP [61]                   0.6296     28    giniUs                      0.5666
10    UMUTeam                      0.6289     29    IIT_Patna [65]              0.5652
11    NeuralSpace [49]             0.6268     30    TCS Res. Lab Gurgaon [76]   0.5638
12    SSN_NLP_MLRG                 0.6242     31    TU Berlin [80]              0.4969
13    TUW-Inf [72]                 0.6207     32    Chandigarh_Concordia        0.4630
14    PC1                          0.6174     33    t1                          0.4003
15    KuiYongyi [51]               0.6116     34    SOA_NLP [62]                0.3995
16    SATLab [52]                  0.6114     35    QQQ [77]                    0.3770
17    hate-busters                 0.6096     36    Team P&P                    0.3454
18    IRLab@IITBHU [60]            0.6093     37    Oswald                      0.3346
19    E8@IJS                       0.5994
Table 9: Results of Task 1B English

Rank  Team Name                    Macro F1   Rank  Team Name                   Macro F1
1     WLV-RIT [84]                 0.9144     14    UMUTeam [66]                0.8423
2     neuro-utmn-thales [54]       0.8808     15    MUM [44]                    0.8411
3     Hasnuhana                    0.8756     16    hate-busters                0.8407
4     SATLab [52]                  0.8749     17    Super Mario [45]            0.8395
5     PreCog IIIT Hyderabad [56]   0.8734     18    Sakshi HASOC [57]           0.8306
6     BIU [46]                     0.8697     19    SSN_NLP_MLRG [55]           0.8223
7     t1                           0.8696     20    HUNLP [61]                  0.7895
8     JCT                          0.8693     21    SSNCSE_NLP [58]             0.7773
9     algo_unlock [64]             0.8657     22    TNLP [63]                   0.7519
10    NeuralSpace [49]             0.8651     23    DLRG [69]                   0.7338
11    KuiYongyi [51]               0.8611     24    Chandigarh_Concordia        0.7096
12    IRLab@IITBHU [60]            0.8545     25    Mind Benders [85]           0.5388
13    NLP-CIC                      0.8472
Table 10: Results of Task 1A Marathi
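The winning Marathi approach can be read as sequential fine-tuning: XLM-R is first fine-tuned on the related-language (Hindi) data and only then on the Marathi data, so that task knowledge is transferred before the model specialises on the low-resource target. A minimal sketch with Hugging Face transformers, using hypothetical file names and assumed hyperparameters, not the team's actual code:

```python
# Minimal sketch of cross-lingual transfer: fine-tune XLM-R on Hindi first,
# then continue fine-tuning the same weights on Marathi.
# Assumes CSVs with hypothetical columns "text" and "label".
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=2)

def tokenized(csv_path):
    ds = load_dataset("csv", data_files=csv_path)["train"]
    return ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                      padding="max_length", max_length=128),
                  batched=True)

# Stage 1: related language (Hindi, HASOC 2019).
# Stage 2: target language (Marathi, MOLD), continuing from the same weights.
for stage, path in [("hi", "hasoc2019_hi_train.csv"),
                    ("mr", "mold_mr_train.csv")]:
    args = TrainingArguments(output_dir=f"xlmr-{stage}", num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=1e-5)
    Trainer(model=model, args=args, train_dataset=tokenized(path)).train()
```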
6. Conclusions and Future Work
The third edition of HASOC has shown that transformer-based classification techniques are the state-of-the-art approach for the identification of hate speech and offensive content online. This corroborates the findings of recent related competitions such as OffensEval 2020 at SemEval [87]. The best results obtained by participants of HASOC 2021 in terms of macro F1 score were 0.83 for English, 0.78 for Hindi and 0.91 for Marathi. From Figure 1, we can argue that the results can be approximated by a negatively skewed distribution. In a potential future edition of HASOC, we could encourage participants to use time-series based classification models for the classification of tweets [88]. HASOC 2021 offered a set of tasks for English, Hindi and Marathi. In the upcoming HASOC edition, we intend to investigate a task for the summarization of hateful and normal tweets on long-running controversial topics [89] such as the Middle East crisis, the Kashmir problem and religious intolerance.

Acknowledgments
We are thankful to Mr. Pavan Pandya and Mr. Harshil Modh for their contribution to developing the HASOC run submission platform and to the annotation process. We are also thankful to Ms. Mohana Dave and Mr. Vraj Shah for their help in the data set sampling and annotation process. We thank all reviewers of HASOC 2021 for their work within a short period of time. We also thank Ms. Ramona Böcker for supporting the paper checking process.

References
[1] P. Nakov, G. D. S. Martino, T. Elsayed, A. Barrón-Cedeño, R. Míguez, S. Shaar, F. Alam, F. Haouari, M. Hasanain, W. Mansour, B. Hamdan, Z. S. Ali, N. Babulkov, A. Nikolov, G. K. Shahi, J. M. Struß, T. Mandl, M. Kutlu, Y. S. Kartal, Overview of the CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 12th International Conference of the CLEF Association, CLEF Virtual Event, September 21-24, volume 12880 of Lecture Notes in Computer Science, Springer, 2021, pp. 264–291. URL: https://doi.org/10.1007/978-3-030-85251-1_19.
[2] R. Kumar, A. K. Ojha, S. Malmasi, M. Zampieri, Evaluating aggression identification in social media, in: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, TRAC@LREC 2020, Marseille, France, May, European Language Resources Association (ELRA), 2020, pp. 1–5. URL: https://aclanthology.org/2020.trac-1.1/.
[3] J. Shetty, K. Chaithali, A. M. Shetty, B. Varsha, V. Puthran, Cyber-bullying detection: A comparative analysis of Twitter data, in: Advances in Artificial Intelligence and Data Engineering, Springer, 2020, pp. 841–855. doi:10.1007/978-981-15-3514-7_62.
[4] W. N. H. W. Ali, M. Mohd, F. Fauzi, Identification of profane words in cyberbullying incidents within social networks, Journal of Information Science Theory and Practice 9 (2021) 24–34. doi:10.1633/JISTaP.2021.9.1.2.
[5] L. P. Dinu, I.-B. Iordache, A. S. Uban, M. Zampieri, A computational exploration of pejorative language in social media, in: Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 3493–3498. URL: https://aclanthology.org/2021.findings-emnlp.296.
[6] T. Ranasinghe, M. Zampieri, Multilingual offensive language identification with cross-lingual embeddings, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 5838–5844. URL: https://aclanthology.org/2020.emnlp-main.470. doi:10.18653/v1/2020.emnlp-main.470.
[7] S. Aldera, A. Z. Emam, M. Al-Qurishi, M. A. AlRubaian, A. Alothaim, Online extremism detection in textual content: A systematic literature review, IEEE Access 9 (2021) 42384–42396. URL: https://doi.org/10.1109/ACCESS.2021.3064178.
[8] S. Jaki, T. De Smedt, M. Gwóźdź, R. Panchal, A. Rossa, G. De Pauw, Online hatred of women in the incels.me forum: Linguistic analysis and automatic detection, Journal of Language Aggression and Conflict 7 (2019) 240–268. doi:10.1075/jlac.00026.jak.
[9] P. Fortuna, J. Soler Company, L. Wanner, Toxic, hateful, offensive or abusive? What are we really classifying? An empirical analysis of hate speech datasets, in: Proceedings of the 12th Language Resources and Evaluation Conference, LREC, Marseille, France, May 11-16, European Language Resources Association, 2020, pp. 6786–6794. URL: https://aclanthology.org/2020.lrec-1.838/.
[10] M. Mladenovic, V. Osmjanski, S. Vujicic Stankovic, Cyber-aggression, cyberbullying, and cyber-grooming: A survey and research challenges, ACM Computing Surveys 54 (2021) 1:1–1:42. URL: https://doi.org/10.1145/3424246.
[11] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 1415–1420. URL: https://aclanthology.org/N19-1144. doi:10.18653/v1/N19-1144.
[12] S. Rosenthal, P. Atanasova, G. Karadzhov, M. Zampieri, P. Nakov, SOLID: A large-scale semi-supervised dataset for offensive language identification, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, Online, 2021, pp. 915–928. URL: https://aclanthology.org/2021.findings-acl.80. doi:10.18653/v1/2021.findings-acl.80.
[13] S. Jaki, S. Steiger (Eds.), Digitale Hate Speech - Interdisziplinäre Perspektiven auf Erkennung, Beschreibung und Regulation, Springer, Cham, 2022.
[14] S. Modha, P. Majumder, T. Mandl, C. Mandalia, Detecting and visualizing hate speech in social media: A cyber watchdog for surveillance, Expert Systems with Applications 161 (2020) 113725. URL: https://doi.org/10.1016/j.eswa.2020.113725. doi:10.1016/j.eswa.2020.113725.
[15] T. Mandl, S. Modha, P. Majumder, D. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Working Notes of the Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE, CEUR-WS, 2019. URL: http://ceur-ws.org/Vol-2517/T3-1.pdf.
[16] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer, Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in Indo-European languages, in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, volume 2826, CEUR-WS.org, 2020, pp. 87–111. URL: http://ceur-ws.org/Vol-2826/T2-1.pdf.
[17] S. Satapara, S. Modha, T. Mandl, H. Madhu, P. Majumder, Overview of the HASOC subtrack at FIRE 2021: Conversational hate speech detection in code-mixed language, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[18] S. Modha, T. Mandl, P. Majumder, D. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, volume 2517, CEUR-WS.org, 2019, pp. 167–190. URL: http://ceur-ws.org/Vol-2517/T3-1.pdf.
[19] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 75–86. URL: https://aclanthology.org/S19-2010. doi:10.18653/v1/S19-2010.
[20] Z. Pitenis, M. Zampieri, T. Ranasinghe, Offensive language identification in Greek, in: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 5113–5119. URL: https://aclanthology.org/2020.lrec-1.629.
[21] G. I. Sigurbergsson, L. Derczynski, Offensive language and hate speech detection for Danish, in: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 3498–3508. URL: https://aclanthology.org/2020.lrec-1.430.
[22] M. E. Aragón, M. Á. Á. Carmona, M. Montes-y-Gómez, H. J. Escalante, L. V. Pineda, D. Moctezuma, Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets, in: Iberian Languages Evaluation Forum (IberLEF), SEPLN, 2019, pp. 478–494. URL: http://ceur-ws.org/Vol-2421/MEX-A3T_overview.pdf.
[23] Ç. Çöltekin, A corpus of Turkish offensive language on social media, in: Proceedings of the 12th Language Resources and Evaluation Conference, LREC, Marseille, France, May 11-16, European Language Resources Association, 2020, pp. 6174–6184. URL: https://aclanthology.org/2020.lrec-1.758/.
[24] V. Indurthi, B. Syed, M. Shrivastava, N. Chakravartula, M. Gupta, V. Varma, FERMI at SemEval-2019 task 5: Using sentence embeddings to identify hate speech against immigrants and women in Twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 70–74. URL: https://aclanthology.org/S19-2009. doi:10.18653/v1/S19-2009.
[25] T. Ranasinghe, M. Zampieri, H. Hettiarachchi, BRUMS at HASOC 2019: Deep learning models for multilingual hate speech and offensive language identification, in: FIRE (Working Notes), CEUR, 2019.
[26] F. M. Plaza-del-Arco, M. Casavantes, H. J. Escalante, M. T. Martín-Valdivia, A. Montejo-Ráez, M. Montes, H. Jarquín-Vásquez, L. Villaseñor-Pineda, et al., Overview of MeOffendEs at IberLEF 2021: Offensive language detection in Spanish variants, Procesamiento del Lenguaje Natural 67 (2021) 183–194. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6388.
[27] J. Gonzalo, M. Montes-y-Gómez, P. Rosso, IberLEF 2021 overview: Natural language processing for Iberian languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), Málaga, Spain, September 2021, volume 2943 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 1–15. URL: http://ceur-ws.org/Vol-2943/Overview_iberLEF_2021.pdf.
[28] E. V. Pronoza, P. Panicheva, O. Koltsova, P. Rosso, Detecting ethnicity-targeted hate speech in Russian social media texts, Information Processing and Management 58 (2021) 102674. URL: https://doi.org/10.1016/j.ipm.2021.102674.
[29] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, H. Z. Margetts, An expert annotated dataset for the detection of online misogyny, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, Online, April 19-23, 2021, Association for Computational Linguistics, 2021, pp. 1336–1350. URL: https://aclanthology.org/2021.eacl-main.114/.
[30] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021. URL: https://aclanthology.org/2021.semeval-1.6.
[31] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL, Florence, Italy, July 28 - August 2, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 1668–1678. URL: https://doi.org/10.18653/v1/p19-1163.
[32] H. A. Kuwatly, M. Wich, G. Groh, Identifying and measuring annotator bias based on annotators' demographic characteristics, in: Proceedings of the Fourth Workshop on Online Abuse and Harms, WOAH, Online, November 20, Association for Computational Linguistics, 2020, pp. 184–190. URL: https://doi.org/10.18653/v1/2020.alw-1.21.
[33] J. Salminen, H. Almerekhi, A. M. Kamel, S. Jung, B. J. Jansen, Online hate ratings vary by extremes: A statistical analysis, in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, CHIIR, Glasgow, Scotland, UK, March 10-14, ACM, 2019, pp. 213–217. doi:10.1145/3295750.3298954.
[34] P. Fortuna, J. Soler Company, L. Wanner, How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?, Information Processing and Management 58 (2021) 102524. doi:10.1016/j.ipm.2021.102524.
[35] W. Yin, A. Zubiaga, Towards generalisable hate speech detection: a review on obstacles and solutions, PeerJ Computer Science 7 (2021) e598. doi:10.7717/peerj-cs.598.
[36] B. Vidgen, L. Derczynski, Directions in abusive language training data, a systematic review: Garbage in, garbage out, PLOS ONE 15 (2021) 1–32. URL: https://doi.org/10.1371/journal.pone.0243300.
[37] T. Ranasinghe, M. Zampieri, An evaluation of multilingual offensive language identification methods for the languages of India, Information 12 (2021). URL: https://www.mdpi.com/2078-2489/12/8/306. doi:10.3390/info12080306.
[38] T. Ranasinghe, M. Zampieri, Multilingual offensive language identification for low-resource languages, ACM Transactions on Asian and Low-Resource Language Information Processing 21 (2021). URL: https://doi.org/10.1145/3457610.
[39] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of COVID-19 misinformation on Twitter, Online Social Networks and Media 22 (2021) 100104.
[40] G. K. Shahi, AMUSED: An annotation framework of multi-modal social media data, arXiv preprint arXiv:2010.00502 (2020).
[41] G. K. Shahi, D. Nandini, FakeCovid - A multilingual cross-domain fact check news dataset for COVID-19, arXiv preprint arXiv:2006.11343 (2020).
[42] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, 2019, pp. 14–17.
[43] S. S. Gaikwad, T. Ranasinghe, M. Zampieri, C. Homan, Cross-lingual offensive language identification for low resource languages: The case of Marathi, in: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), Held Online, 1-3 September, 2021, pp. 437–443. URL: https://aclanthology.org/2021.ranlp-1.50.
[44] A. Hegde, M. D. Anusha, H. L. Shashirekha, Ensemble Based Machine Learning Models for Hate Speech and Offensive Content Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[45] S. Banerjee, M. Sarkar, N. Agrawal, P. Saha, M. Das, Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[46] Y. Hacohen-Kerner, M. Uzan, Detecting Offensive Language in English, Hindi, and Marathi using Classical Supervised Machine Learning Methods and Word/Char N-grams, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[47] P. Mankar, A. Gangurde, D. Chaudhari, A. Pawar, Machine Learning Models for Hate Speech and Offensive Language Identification for Indo-Aryan Language: Hindi, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[48] M. S. Jahan, M. Oussalah, J. K. Mim, M. Islam, Offensive Language Identification Using Hindi-English Code-Mixed Tweets, and Code-Mixed Data Augmentation, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[49] M. Bhatia, T. S. Bhotia, A. Agarwal, P. Ramesh, S. Gupta, K. Shridhar, F. Laumann, A. Dash, One to Rule Them All: Towards Joint Indic Language Hate Speech Detection, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[50] A. Mitra, P. Sankhala, Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[51] Y. Kui, Detect Hate and Offensive Content in English and Indo-Aryan Languages based on Transformer, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[52] Y. Bestgen, A Simple Language-Agnostic yet Strong Baseline System for Hate Speech and Offensive Content Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[53] M. S. Jahan, D. R. Beddiar, M. Oussalah, N. Arhab, Y. Bounab, Hate and Offensive Language Detection using BERT for English Subtask A, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[54] A. Glazkova, M. Kadantsev, M. Glazkov, Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[55] K. Adaikkan, T. Durairaj, Multilingual Hate Speech and Offensive Language Detection in English, Hindi, and Marathi Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[56] A. Kadam, A. Goel, J. Jain, J. S. Kalra, M. Subramanian, M. Reddy, P. Kodali, A. T. H, M. Shrivastava, P. Kumaraguru, Battling Hateful Content in Indic Languages HASOC '21, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[57] S. Kalra, K. N. Inani, Y. Sharma, G. S. Chauhan, Detection of Hate, Offensive and Profane Content from the Post of Twitter using Transformer-Based Models, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[58] A. Anand, J. Golecha, B. B, B. Jayaraman, M. T. T, Machine Learning Based Hate Speech Identification for English and Indo-Aryan Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[59] S. Agustian, R. Saputra, A. Fadhilah, "Feature Selection" with Pretrained-BERT for Hate Speech and Offensive Content Identification in English and Hindi Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[60] S. Chanda, S. Ujjwal, S. Das, S. Pal, Fine-tuning Pre-Trained Transformer Based Model for Hate Speech and Offensive Content Identification in English, Indo-Aryan and Code-Mixed (English-Hindi) Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[61] N. Bölücü, P. Canbay, Hate Speech and Offensive Content Identification with Graph Convolutional Networks, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[62] A. Kumar, P. K. Roy, S. Saumya, An Ensemble Approach for Hate and Offensive Language Identification in English and Indo-Aryan Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[63] R. Rajalakshmi, S. Srivarshan, F. Mattins, K. E, P. Seshadri, A. K. M, Conversational Hate-Speech Detection in Code-Mixed Hindi-English Tweets, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[64] A. Velankar, H. Patil, A. Gore, S. Salunke, R. Joshi, Hate and Offensive Speech Detection in Hindi and Marathi, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[65] K. Maity, A. Kumar, S. Saha, Attention Based BERT-FastText Model for Hate Speech and Offensive Content Identification in English and Hindi Languages, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[66] C. Caparrós-Laiz, J. A. García-Díaz, R. Valencia-García, Detecting Hate Speech on English and Indo-Aryan Languages with BERT and Ensemble Learning, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[67] P. Nandi, D. Das, Detection of Hate or Offensive Phrase using Magnified Tf-Idf, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[68] S. Kannan, J. Mitrović, Hate Speech and Offensive Content Detection in Hindi Language using C-BiGRU, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[69] R. Rajalakshmi, F. Mattins, S. S, P. Reddy, A. K. M, Hate Speech and Offensive Content Identification in Hindi and Marathi Language Tweets using Ensemble Techniques, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[70] I. Jadhav, A. Kanade, V. Waghmare, D. Chaudhari, Hate and Offensive Speech Detection in Hindi Twitter Corpus, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[71] S. Hakimov, R. Ewerth, Combining Textual Features for the Detection of Hateful and Offensive Language, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[72] K. Gémes, A. Kovács, M. Reichel, G. Recski, Offensive Text Detection on English Twitter with Deep Learning Models and Rule-Based Systems, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[73] J. Zeng, L. Xu, ALBERT for Hate Speech and Offensive Content Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[74] S. Saseendran, S. R, S. V, S. Giri, Classification of Hate Speech and Offensive Content using an Approach based on DistilBERT, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[75] W. Yu, B. Boenninghoff, D. Kolossa, Hybrid Representation Fusion for Twitter Hate Speech Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[76] S. Sangwan, L. Dey, M. Shakir, Gated Multi-Task Learning Framework for Text Classification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[77] Y. Xu, H. Ning, Y. Sun, Hate Speech and Offensive Content Identification Based on Self-Attention, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[78] F. M. P. del Arco, S. Halat, S. Padó, R. Klinger, Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[79] R. Wilkens, D. Ognibene, biCourage: ngram and syntax GCNs for Hate Speech Detection, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[80] S. Mohtaj, V. Schmitt, S. Möller, A Feature Extraction based Model for Hate Speech Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[81] N. P. Motlogelwa, E. Thuma, M. Mudongo, T. Leburu-Dingalo, G. Mosweunyane, Leveraging Text Generated from Emojis for Hate Speech and Offensive Content Identification, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[82] D. N, R. Avireddy, A. Ambalavanan, B. R. Selvamani, Hate Speech Detection using LIME Guided Ensemble Method and DistilBERT, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[83] V. Gupta, R. Kumar, R. Pamula, Hate Speech and Offensive Content Identification in English Tweets, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[84] M. Nene, K. North, T. Ranasinghe, M. Zampieri, Transformer Models for Offensive Language Identification in Marathi, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[85] D. Gajbhiye, S. Deshpande, P. Ghante, A. Kale, D. Chaudhari, Machine Learning Models for Hate Speech Identification in Marathi Language, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2021.
[86] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang, Language-agnostic BERT sentence embedding, CoRR abs/2007.01852 (2020). URL: https://arxiv.org/abs/2007.01852.
[87] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020), in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 1425–1447.
[88] G. K. Shahi, I. Bilbao, E. Capecci, D. Nandini, M. Choukri, N. Kasabov, Analysis, classification and marker discovery of gene expression data with evolving spiking neural networks, in: International Conference on Neural Information Processing, Springer, 2018, pp. 517–527.
[89] S. Modha, P. Majumder, T. Mandl, R. Singla, Design and analysis of microblog-based summarization system, Social Network Analysis and Mining 11 (2021) 1–16. URL: https://doi.org/10.1007/s13278-021-00830-3.