Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches

Ravindra Nayak1, Raviraj Joshi2
1 Sri Jayachamarajendra College of Engineering, Mysore
2 Indian Institute of Technology Madras, Chennai

Abstract
In the recent past, social media platforms have helped people connect and communicate with a wider audience. But this has also led to a drastic increase in cyberbullying. It is essential to detect and curb hate speech to keep the sanity of social media platforms. Also, code mixed text containing more than one language is frequently used on these platforms. We therefore propose automated techniques for hate speech detection in code mixed text scraped from Twitter. We specifically focus on code mixed English-Hindi text and transformer-based approaches. While regular approaches analyze the text independently, we also make use of context text in the form of parent tweets. We evaluate the performance of multilingual BERT and Indic-BERT in single-encoder and dual-encoder settings. The first approach concatenates the target text and context text using a separator token and obtains a single representation from the BERT model. The second approach encodes the two texts independently using a dual BERT encoder and averages the corresponding representations. We show that the dual-encoder approach using independent representations yields better performance. We also employ simple ensemble methods to further improve the performance. We describe the systems built by our team r1_2021 for HASOC 2021 Subtask 2 and the subsequent set of experiments.

Keywords
Hate Speech Detection, Social Media, Code Mixed, Hinglish, Multilingual, Indic, BERT, Context-aware, Deep Learning

Forum for Information Retrieval Evaluation, December 13-17, 2021, India
ravindranyk707@gmail.com (R. Nayak); ravirajoshi@gmail.com (R. Joshi)

1. Introduction
Social media is a boon to many, as it has helped in creating and promoting budding businesses. Although it has vast use cases, it comes with a caveat too. People with malicious intent see it as an opportunity to promote hate speech among a wider audience [1, 2]. Multiple studies directly link social media usage to poor mental health. Because a large share of users on such platforms are youngsters, their mental stability is vital in shaping their future careers. So it is necessary to take action against such malevolent content on a large scale.

Offensive language such as insulting, hurtful, derogatory, or obscene content directed towards people can suppress meaningful discussions. As there are no restrictions on expressing one's opinions on such platforms, it can lead to the defaming of personalities. So it is the platform's responsibility to restrain such content. Hate speech mainly involves discrimination against people based on religion, community, race, nationality, gender, or any other identity factors [3, 4].
Table 1
Few training samples of context and tweets. HOF indicates Hate & Offensive content, whereas NOT indicates Non-Hate Offensive content.

Context | Tweet | Label
- | INDIA NEEDS VACCINES | NOT
INDIA NEEDS VACCINES | Yes ma'am India need vaccine Doctors and facilities | NOT
INDIA NEEDS VACCINES | Is there any Vaccine which can prevent India from you | HOF
INDIA NEEDS VACCINES | vaccine insano k liye hain reptiles k liye nhi | HOF
- | Look at this insensitive piece of shit | HOF
Look at this insensitive piece of shit | After all that is happening since last month but still | NOT
Look at this insensitive piece of shit | Yo wtf | HOF
Look at this insensitive piece of shit | Haha | HOF

Even though manual moderation of hate speech is always precise, it is not feasible considering the huge volumes of data being pumped into social media. So there is a constant need for automated techniques to suppress hateful content to which users of all age groups are exposed [5, 6, 7, 8]. With the advances in computing capabilities, machine learning algorithms have gained importance in tasks that involve understanding natural language. In this work, we are interested in hate speech detection in tweets.

This paper mainly focuses on the HASOC 2021 Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) subtask [9]. The task aims to detect hate speech in individual tweets and in their comments and replies, which may support hate speech directly or indirectly. The dataset contains text scraped from Twitter with binary labels. A conversational thread can contain abusive or offensive content that is not apparent from a single comment or reply but can be identified given the context of the parent content, as shown in Table 1. Furthermore, the content on such social media platforms is spread across many different languages, including code-mixed languages such as Hinglish (a mix of Hindi and English in Roman script) [10].

The hate speech detection task can be considered a binary text classification problem [11]. We rely solely on the tweet and its context to determine hateful content. Even though some tweets could be rejected based on the behaviour of the content creator, this information is not guaranteed to be available every time. We evaluate various deep learning techniques, specifically multilingual BERT based models, and experiment with various fine-tuning methods to see how they help the model detect malicious tweets.

2. Related Work
Hate speech detection is precise when manually moderated. The context of a tweet is also important for identifying hatefulness. For code mixed data in particular, the moderator must have a vast vocabulary across languages to curb malicious content. If enough data on a user's behaviour and tweet content is available, it can help mitigate such content by blacklisting abusive users. Approaches such as graph convolutional networks have been used to capture not only the structure of online communities but also the linguistic behaviour of the users within them [12]. Dictionary-based approaches are popular for text data, where a list of words or phrases that are profane or constitute racial slurs is maintained. Various machine learning approaches use extra-linguistic features in conjunction with character n-grams to build binary logistic regression classifiers [13], as sketched below.
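As an illustration of this family of lexical baselines, the following is a minimal sketch of a character n-gram logistic regression classifier. It is not part of our system; the scikit-learn pipeline, the example texts, and the label convention (1 for HOF, 0 for NOT) are assumptions made purely for illustration.

# Minimal sketch of a character n-gram + logistic regression baseline of the
# kind cited above (not our system); texts and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["you are an idiot", "have a nice day"]   # hypothetical tweets
labels = [1, 0]                                   # assumed mapping: 1 = HOF, 0 = NOT

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-gram features
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["what an idiot"]))

In practice, such a classifier would be combined with the extra-linguistic features mentioned above rather than used on character n-grams alone.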
There have been studies showing that including knowledge graph features helps in building better models [14]. Word-level embeddings like GloVe capture the semantics of words better than one-hot encodings [15]. A similar approach is to use sentence-level embeddings like ELMo, which help in extracting rich features from the text; these embeddings are then fed to bi-directional LSTMs or CNNs for classification [16]. As these embeddings are trained on huge corpora, their use is a form of transfer learning, reusing feature-rich vectors for related classification tasks. Various other features like LIWC features, SentiWordNet, and profanity vectors also aid the model [17]. For code mixed Hinglish datasets, there have been studies on ensembling BERT based embeddings with Bi-LSTMs to improve the model [3]. As context plays an important role in the detection of hate speech, context-aware models have been built that take the previous tweet's features as input along with the current tweet. Various ensembles of traditional machine learning algorithms with deep learning techniques have also been explored [18].

3. Architecture details
In this section, we describe the details of the different techniques along with their hyperparameters. Figure 1 summarizes the model details along with the two architectures explored in this work. We use transformer-based neural networks as they have shown great progress in NLP tasks [19]. As these networks allow parallelisation of computations, they have an immense advantage over their predecessor networks like RNNs and LSTMs. Transformers also reduce model inference latency as they can exploit contemporary hardware. We explore two multilingual variations of BERT-based models, viz. m-BERT and Indic-BERT. Both BERT variations include Hindi as one of the pre-training languages.

3.1. Multilingual-BERT (m-BERT)
This model's architecture is based on BERT-base [20]. It contains 12 transformer blocks, 12 self-attention heads, and a hidden size of 768. The input for BERT is limited to a maximum of 512 tokens, and it outputs a sequential representation. Special tokens [CLS] and [SEP] are used to mark the start of a sentence and the separation of sentences, respectively. For a classification task, the final encoder representation is taken and a softmax is applied to classify it.

Figure 1: Representation of single encoder approach (left) and dual encoder approach (right).

As BERT-base is pre-trained only on English text, we use the Multilingual BERT-base model that has been trained on 102 languages using a shared word-piece vocabulary of size 110k. Oversampling of low resource languages is done to overcome data imbalance. It has shown great results on zero-shot transfer learning for various downstream tasks and has also helped in code-switched data tasks [21].

3.2. Indic-BERT
This model is based on ALBERT [22], a lighter version of BERT that shares parameters across layers, which in turn leads to fewer parameters. ALBERT also modifies the pre-training setup by introducing new pre-training tasks that lead to better sentence embeddings. It contains 12 transformer blocks, 12 self-attention heads, a hidden size of 768, and an input embedding size of 128. Indic-BERT is a multilingual ALBERT based model that has been trained on 12 major Indian languages with a shared vocabulary size of 200k. It has outperformed multilingual BERT on some Indic tasks [23].
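For reference, the two encoders can be obtained from the Hugging Face hub as sketched below. We do not name the exact checkpoints in the paper, so the identifiers "bert-base-multilingual-cased" and "ai4bharat/indic-bert" are assumptions based on the publicly available versions of m-BERT and Indic-BERT.

# Minimal sketch of loading the two multilingual encoders (assumed checkpoints).
# The Indic-BERT (ALBERT) tokenizer additionally requires the sentencepiece package.
from transformers import AutoModel, AutoTokenizer

mbert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

indic_tok = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
indic = AutoModel.from_pretrained("ai4bharat/indic-bert")

enc = mbert_tok("vaccine chahiye sabko", return_tensors="pt")   # hypothetical Hinglish input
cls_embedding = mbert(**enc).last_hidden_state[:, 0]            # [CLS] representation, size 768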
4. Experimental Setup

4.1. Dataset details
The HASOC 2021 ICHCL dataset [9] consists of tweets, their context (if any), and the corresponding labels. The binary labels are (NOT) Non-Hate Offensive and (HOF) Hate and Offensive. The dataset comprises a two-level hierarchy where an individual tweet can be followed by a comment, and that comment can have a reply. For comments, we consider the individual tweet as the context, and for replies, we consider the concatenation of the first tweet and the associated comment as the context. The dataset consists of a train set and a test set. A total of 7088 tweets are provided, of which 5740 are training samples and 1348 are test samples. We set aside a random 10 per cent of the training data for validation. The training data contains 2841 hate speech samples and 2899 non-hate speech samples, whereas the test data contains 695 hate speech samples and 653 non-hate speech samples. As the task mainly focuses on context-based hate speech detection, there are only 82 individual tweets in training and 16 in testing; the remaining 5658 training data points and 1332 test data points use individual tweets as the context. More statistics on the data are provided in Table 2.

Table 2
Statistics of the dataset.

Feature | Train | Test
Total words | 273627 | 68019
Max word length | 166 | 131
Avg word count | 47.67 | 50.45
Unique tokens | 5717 | 1343

4.2. Data preprocessing
Various data preprocessing techniques are used to clean and normalize the tweets.
• Removal of URLs: People often include hyperlinks to different websites. As these are unlikely to help the model, we remove them.
• Removal of user mentions: User mentions are common in tweets. We remove them as they are not helpful to the model.
• Removal of non-Hindi and non-English characters: As the dataset contains only Roman and Devanagari text, we remove characters outside these Unicode blocks.
• Retention of emojis and hashtags: We retain emojis and hashtags, as they can help determine whether a tweet supports a hateful tweet in the absence of text.

4.3. Training details
All models were trained using the PyTorch framework and the Hugging Face library [24]. The models were fine-tuned for a maximum of 5 epochs, and the minimum validation loss was the criterion for picking the best epoch. As shown in Figure 1, we mainly work with two approaches for m-BERT and Indic-BERT; a sketch of the dual-encoder variant follows this list.
• Single Encoder Approach (single sentence representation): This is the standard approach to fine-tuning BERT based models, where we add a dense layer on top of the BERT [CLS] token embedding, followed by a softmax classifier. The context text and the target text are concatenated using a separator token to get a single [CLS] representation from the BERT model.
• Dual Encoder Approach (averaging the context and target representations): As context plays a vital role in our dataset, we pass the context and the tweet separately through BERT to get their [CLS] token embeddings. Each embedding acts as a sentence representation for the context or the tweet. These embeddings are averaged and passed to the dense layer for classification. If the context is absent, we consider only the tweet representation.
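The sketch below illustrates the dual-encoder averaging idea under simplifying assumptions: a single shared BERT encoder (we leave open whether the two branches share weights), the hypothetical class name DualEncoderClassifier, an assumed m-BERT checkpoint, and no training loop.

# Minimal sketch of the dual-encoder approach: encode context and tweet
# separately, average their [CLS] embeddings, then classify.
import torch.nn as nn
from transformers import AutoModel

class DualEncoderClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def encode(self, **inputs):
        # [CLS] token embedding as the sentence representation
        return self.encoder(**inputs).last_hidden_state[:, 0]

    def forward(self, tweet, context=None):
        rep = self.encode(**tweet)
        if context is not None:
            rep = (rep + self.encode(**context)) / 2   # average tweet and context reps
        return self.classifier(rep)                    # logits for the softmax classifier

# usage (hypothetical):
# tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
# model = DualEncoderClassifier()
# logits = model(tok("reply text", return_tensors="pt"),
#                context=tok("parent tweet", return_tensors="pt"))

Here tweet and context are expected to be tokenizer outputs (dictionaries with input_ids and attention_mask); cross-entropy over the returned logits corresponds to the softmax classification described above.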
Table 3
Result metrics for different BERT configurations. FE indicates a BERT model with a frozen embedding layer. C-Avg indicates averaging over the [CLS] token embeddings of the context and the tweet. Dictionary indicates a static list of profane words.

Model | Precision | Recall | F1 score
m-BERT baseline | 66.07 | 65.63 | 65.53
Indic-BERT baseline | 67.18 | 67.17 | 67.17
m-BERT + frozen embeddings (FE) | 70.03 | 67.40 | 66.65
m-BERT + FE + C-Avg | 67.70 | 67.65 | 67.65
m-BERT + FE + C-Avg + Dictionary | 68.82 | 68.62 | 68.61
Indic-BERT + FE + Dictionary | 70.71 | 70.07 | 69.99
Indic-BERT + FE + C-Avg + Dictionary | 71.09 | 70.44 | 70.37
Ensemble 2 (Indic-BERT C-Avg + m-BERT C-Avg) | 71.65 | 71.59 | 71.60
Ensemble 4 (Indic-BERT + Indic-BERT C-Avg + m-BERT + m-BERT C-Avg) | 73.21 | 73.17 | 73.07

5. Results and Discussions
We evaluate different BERT-based approaches for the task of hate speech detection. The results of the experiments are outlined in Table 3. Macro precision, recall, and F1-score are the metrics used to compare the models. As the target text uses code-mixed Hindi and English, we use m-BERT and Indic-BERT as our baselines. In the baseline approach, we concatenate the target and context text using a separator token. We perform a series of experiments on top of the baseline model by freezing the embedding layer and incorporating a static dictionary of offensive words. The frozen embeddings showed promising results as the token embeddings were not overfitted to the training data. The static dictionary is used as a deterministic classifier by directly tagging a text as hateful if any offensive word is present in it. The dictionary was created using various web sources, and neither the train data nor the test data was referenced during the process. In the dual encoder approach, we average the [CLS] token embeddings of the context and the tweet, which further improves the F1 score. Integrating the static dictionary with this method improves the F1 numbers further. The Indic-BERT model with frozen embeddings, the static dictionary, and the dual representation approach outperformed all the other techniques.

We combine the best-performing models using simple ensemble techniques to get the best results. The scores of the individual models are fused by averaging. The confusion matrices for the best models are shown in Figure 2. Note that the best run r1_2021_v5 submitted to the shared task was based on m-BERT + FE + C-Avg and resulted in an F1-score of 67.42%. The other experiments were conducted after the shared task, and their results are reported on the same test set.

Figure 2: Confusion matrices for the Ensemble 4 approach (left) and the Indic-BERT dual encoder approach (right). The rows correspond to the true class and the columns to the predicted class. Each cell value is further segregated as the number of contextual examples + the number of non-contextual examples.
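The score fusion and the dictionary rule can be summarized by the sketch below. It assumes Hugging Face sequence-classification models paired with their tokenizers, a label mapping of 0 for NOT and 1 for HOF, and placeholder names (model_tokenizer_pairs, texts, profane_words); it illustrates the fusion logic rather than our exact implementation.

# Minimal sketch of score-averaging ensembling with a static-dictionary rule.
import torch

def ensemble_predict(model_tokenizer_pairs, texts, profane_words):
    probs = []
    for model, tokenizer in model_tokenizer_pairs:
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits              # sequence-classification head
        probs.append(torch.softmax(logits, dim=-1))
    avg = torch.stack(probs).mean(dim=0)              # fuse scores by averaging
    preds = avg.argmax(dim=-1).tolist()               # assumed mapping: 0 = NOT, 1 = HOF
    # deterministic dictionary rule: tag as HOF if any profane word occurs
    for i, text in enumerate(texts):
        if any(word in text.lower() for word in profane_words):
            preds[i] = 1
    return preds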
6. Conclusion
Under the HASOC 2021 ICHCL task, we evaluate various fine-tuning techniques for code mixed data that take the context of the tweets into account. We mainly focus on multilingual BERT based architectures. We observe that frozen embeddings give better results by retaining rich token representations from the pre-trained model. Moreover, averaging over sentence representations has helped the model understand the context better while classifying the current tweet. Using the ensemble of models, we report a best F1 score of 73.07%, compared to the m-BERT baseline F1 score of 65.53%. Note that our leaderboard F1-score is 67.42%, and the other experiments were performed after the final submission. Primarily, we emphasize the importance of averaging representations in the dual BERT encoder setting for context-based text classification problems.

Acknowledgments
This research was conducted under the guidance of L3Cube, Pune. We would like to express our gratitude towards our mentors at L3Cube for their continuous support and encouragement.

References
[1] C. Ezeibe, Hate speech and election violence in Nigeria, Journal of Asian and African Studies 56 (2021) 919–935.
[2] A. Matamoros-Fernández, J. Farkas, Racism, hate speech, and social media: A systematic review and critique, Television & New Media 22 (2021) 205–224.
[3] N. Vashistha, A. Zubiaga, Online multilingual hate speech detection: Experimenting with Hindi and English social media, Information 12 (2021). URL: https://www.mdpi.com/2078-2489/12/1/5. doi:10.3390/info12010005.
[4] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, O. Frieder, Hate speech detection: Challenges and solutions, PloS one 14 (2019) e0221152.
[5] A. Schmidt, M. Wiegand, A survey on hate speech detection using natural language processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[6] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech, in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021, ACM, 2021.
[7] R. Joshi, R. Karnavat, K. Jirapure, R. Joshi, Evaluation of deep learning models for hostility detection in Hindi text, in: 2021 6th International Conference for Convergence in Technology (I2CT), IEEE, 2021, pp. 1–5.
[8] A. Wani, I. Joshi, S. Khandve, V. Wagh, R. Joshi, Evaluating deep learning approaches for Covid19 fake news detection, in: International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation, Springer, 2021, pp. 153–163.
[9] S. Satapara, S. Modha, T. Mandl, H. Madhu, P. Majumder, Overview of the HASOC Subtrack at FIRE 2021: Conversational Hate Speech Detection in Code-mixed Language, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[10] R. Joshi, R. Joshi, Evaluating input representation for language identification in Hindi-English code mixed text, arXiv preprint arXiv:2011.11263 (2020).
[11] R. Joshi, P. Goel, R. Joshi, Deep learning for Hindi text classification: A comparison, in: International Conference on Intelligent Human Computer Interaction, Springer, 2019, pp. 94–101.
[12] P. Mishra, M. D. Tredici, H. Yannakoudakis, E. Shutova, Abusive language detection with graph convolutional networks, 2019. arXiv:1904.04073.
[13] Z. Waseem, D. Hovy, Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter, in: Proceedings of the NAACL Student Research Workshop, Association for Computational Linguistics, San Diego, California, 2016, pp. 88–93. URL: https://aclanthology.org/N16-2013. doi:10.18653/v1/N16-2013.
[14] P. Maheshappa, B. Mathew, P. Saha, Using knowledge graphs to improve hate speech detection, in: 8th ACM IKDD CODS and 26th COMAD, CODS COMAD 2021, Association for Computing Machinery, New York, NY, USA, 2021, p. 430. URL: https://doi.org/10.1145/3430984.3431072. doi:10.1145/3430984.3431072.
[15] Z. Zhang, L. Luo, Hate speech detection: A solved problem? The challenging case of long tail on Twitter, 2018. arXiv:1803.03662.
[16] M.-A. Rizoiu, T. Wang, G. Ferraro, H. Suominen, Transfer learning for hate speech detection in social media, 2019. arXiv:1906.03829.
[17] P. Mathur, R. Sawhney, M. Ayyar, R. Shah, Did you offend me? Classification of offensive tweets in Hinglish language, in: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 138–148. URL: https://aclanthology.org/W18-5118. doi:10.18653/v1/W18-5118.
[18] L. Gao, R. Huang, Detecting online hate speech using context aware models, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, INCOMA Ltd., Varna, Bulgaria, 2017, pp. 260–266. URL: https://doi.org/10.26615/978-954-452-049-6_036. doi:10.26615/978-954-452-049-6_036.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.
[20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.
[21] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, 2019. arXiv:1906.01502.
[22] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for self-supervised learning of language representations, 2020. arXiv:1909.11942.
[23] D. Kakwani, A. Kunchukuttan, S. Golla, G. N.C., A. Bhattacharyya, M. M. Khapra, P. Kumar, IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages, in: Findings of EMNLP, 2020.
[24] T. Wolf, J. Chaumond, L. Debut, V. Sanh, C. Delangue, A. Moi, P. Cistac, M. Funtowicz, J. Davison, S. Shleifer, et al., Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45.