Transformer Ensemble System for Detection of Offensive Content in Dravidian Languages

B S N V, Chaitaya¹, Karri, Anjali¹
¹ Indian Institute of Information Technology, SriCity, India

FIRE 2021: Forum for Information Retrieval Evaluation, December 13-17, 2021, India
Email: viswachaitanya.b16@iiits.in (B. S. N. V. Chaitaya); anjalipoornima.k16@iiits.in (K. Anjali)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract

Hate speech is any form of oral, written, or physical expression that attacks or uses derogatory language against a person or a community on the basis of their identity factors. Hate speech and offensive language can endanger democratic principles and societal stability. As social media usage grows, so does the number of people affected by hate speech. Moderation of online hate speech has been increasing significantly, especially on social media platforms such as Facebook, Twitter, YouTube, and Instagram. It is high time to curb the intensifying online hate speech by supporting the detection of hateful or offensive texts on social media. This paper describes our submission to Hate Speech and Offensive Content Identification in Dravidian-CodeMix (HASOC) 2021, a shared task under the Forum for Information Retrieval Evaluation (FIRE) 2021. We propose an ensemble of transformer models (mBERT, DistilBERT and MuRIL) to classify code-mixed social media comments and posts in Dravidian languages (Malayalam-English and Tamil-English) as offensive or not-offensive. The motivation was to combine the power of transformers with ensembling to improve prediction quality. For sub-task 2, the proposed ensemble ranked 3rd and 6th in Malayalam and Tamil, respectively. The code is publicly available at https://github.com/chaitnayabasava/HSU_TransEmb.

Keywords
Hate speech, Offensive Language, BERT Transformers, Ensemble, CodeMix

1. Introduction

Social media platforms offer users freedom of expression. At the same time, they raise new challenges concerning freedom of expression, speech, and human dignity. Hate speech on the internet expresses tensions between various groups and can have a detrimental impact on society. Hate speech expressed through social media is not inherently different from hate expressed offline, but it poses specific difficulties stemming from its indefiniteness, durability, and anonymity. Online hate speech may persist in many formats across several platforms, and it can be linked to repeatedly. Counteracting hate speech online demands more thought and innovative strategies. Social media platforms such as YouTube, Facebook, and Twitter each have algorithms for identifying hate speech. Nonetheless, identifying and classifying hate speech remains a significant problem for social media companies and researchers alike.

India is a diverse country, and most Indians mix different languages with English while communicating. In multilingual communities, code-mixing is common, and code-mixed text is occasionally written in non-native scripts. Because it is convenient to use local languages alongside English, code-mixed languages are becoming increasingly popular on social media platforms. However, spelling variations and the absence of grammatical standards introduce ambiguity, which makes automated text analysis considerably harder.
We can observe a growing demand for offensive language identification, especially for social media messages, which are mostly code-mixed. Many researchers have investigated algorithms for detecting hate speech, and most studies concentrated on monolingual text. However, owing to the intricacy of code-mixing, models trained on monolingual data commonly fail on code-mixed data. Therefore, as part of HASOC 2021 [1], we developed a classification model to identify offensive texts in code-mixed Dravidian languages. HASOC 2021 has two sub-tasks; this paper provides the working notes on sub-task 2, which involves categorizing a given code-mixed tweet as offensive or not-offensive. The evaluation metric reported and used for model selection in this paper is the weighted average F1-score. The competition page and the overview paper [1] provide further information on the task.

The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 details the proposed technique, Section 4 presents the experiments and results, and Section 5 concludes the paper and summarizes our findings.

2. Related Work

Hate-speech detection is often treated as a text classification task. Using machine learning or deep learning to detect offense, hostility, and hate speech in user-generated content is one of the most effective strategies for combating this problem, and the topic has received considerable attention in recent articles. A few surveys describe the main areas investigated for this task. [2] surveys how natural language processing has been employed to automatically recognize various types of utterances. [3] looked at strategies for detecting hate speech in social media and separating it from ordinary profanity.
The findings showed that the most difficult part is distinguishing between profanity and hate speech. [4] examined the complexities of the concept of hate speech, which is defined differently across platforms and settings, and offered a unified definition.

A number of distinct classifiers have been used in the literature. [5] was one of the earliest studies on hate speech detection. The authors developed a prototype for detecting abusive messages using a decision-tree generator with 47 features corresponding to syntax and semantics. Later, machine learning classifiers such as SVM and logistic regression were applied to the task. For instance, [6] used logistic regression to detect obscenity-related offensive tweets. [7] built machine learning models, including a Support Vector Machine with a linear kernel and a Random Forest with 100 trees, to identify cyber hate on Twitter across a range of protected characteristics such as race, disability, and sexual orientation. Their feature set included bag-of-words features, features obtained by identifying hostile words and phrases, and typed dependencies. Although bag-of-words methods have a high recall rate, they also have a high incidence of false positives.

Figure 1: The architecture of the proposed transformer ensemble, with all training steps included. The raw data is first pre-processed as explained in Section 3.1 and then used to train (Section 3.3) the models described in Section 3.2. The class probabilities from each of these models are then combined in a soft ensemble to predict the final class labels.

[8] and [9] developed convolutional-network-based models for hate-speech detection.
[8] trained CNN models with different feature sets: 4-grams, word2vec word vectors, randomly generated word vectors, and word vectors combined with n-grams. With a 78.3% F-score, the CNN model with word2vec feature vectors performed best. [9] experimented with a dataset of 16k annotated tweets and built features by coupling embeddings learned by deep neural network models with gradient boosted decision trees. [10] and [11] employed recurrent neural networks, namely LSTM and BiLSTM. The former implemented SVM and LSTM classifiers for hate-speech detection in Italian, using morpho-syntactic and syntactic features, sentiment polarity, and lexical text features. The latter experimented with convolutional networks, BiLSTMs, and convolutional networks combined with BiLSTMs to identify posts that mention the user's medication intake.

Frequently, a single solution to a complicated problem does not apply to all circumstances, so researchers employ ensemble methods. [11] and [12] addressed this classification task using a stacked ensemble of deep learning CNN models and an ensemble of recurrent neural network classifiers, respectively. Taking inspiration from these deep learning techniques and ensembles, in this paper we propose an ensemble of transformers [13]. The power of pre-trained transformers was harnessed by BERT, a model pre-trained on an unlabelled text corpus that can be fine-tuned for specific tasks such as classification. [14] gave an overview of the methods and results for Offensive Language Identification in Dravidian Languages at EACL 2021. [15] gave an overview of the task of hate speech recognition in Tamil, Malayalam, Hindi, English and German as part of the HASOC track at FIRE 2020. The authors of [16] compared different pretrained text embeddings for classifying hate speech in Indian code-mixed sentences.

3. Methodology

To classify a code-mixed tweet as offensive or not-offensive, we propose an ensemble of transformer models. This section describes the dataset and its pre-processing and then explains the ensemble setting. Figure 1 depicts the architecture of the proposed Transformer Ensemble model for classifying offensive tweets using the dataset given by the organizers of HASOC 2021 [1].

3.1. Pre-Processing

Pre-processing is crucial, especially when working with tweets. Unprocessed tweets are unstructured and often contain redundant information and noise that could mislead predictions. We converted the tweets to lower case and then tokenized each tweet. Tokenization splits a tweet into words, punctuation marks, numeric digits, and other symbols. We then removed punctuation from the tokenized tweets, since it adds little information to the underlying content. Tweets often contain # and @ handles, which do not help in modelling and may introduce biases that ultimately hamper the predictions. We removed digits, URLs, and # and @ handles using regular expressions. Emojis and emoticons have become an integral part of everyday communication and frequently appear in social media texts; we also removed these symbols during pre-processing. The label categories of the given dataset are also not uniform: we dropped data points labelled not-Tamil and not-Malayalam. Finally, we trained the models on the cleaned tweets with two labels, 'NOT' and 'OFF'.

3.2. Models

To build our ensemble, we worked mainly with three transformer-based models: multilingual BERT (mBERT), Multilingual Representations for Indian Languages (MuRIL), and Distilled BERT (DistilBERT). [17] introduced transformer models built from encoder-decoder blocks that use attention maps for long-sequence tasks.
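Before turning to the individual models, the pre-processing of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regular expressions, the emoji and emoticon coverage, the choice to drop whole @/# tokens, and the function name `preprocess` are all hypothetical. URLs and handles are removed before punctuation here so that they can still be recognized as whole tokens.

```python
import re
import string
from typing import List

# Illustrative patterns; the exact expressions used in the paper are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"[@#]\S+")  # assumption: drop the whole @mention / #hashtag
DIGIT_RE = re.compile(r"\d+")
# Assumed emoji coverage: common pictograph, flag and symbol code points.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U0001F1E6-\U0001F1FF\u2600-\u27BF\uFE0F]+"
)
# Assumed emoticon coverage: simple smileys such as :) ;-( :d :p (after lower-casing).
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(dp/\\|\]\[](?!\w)")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(tweet: str) -> List[str]:
    """Lower-case one tweet, strip noise, and split it on whitespace."""
    text = tweet.lower()
    for pattern in (URL_RE, HANDLE_RE, EMOJI_RE, EMOTICON_RE, DIGIT_RE):
        text = pattern.sub(" ", text)
    text = text.translate(PUNCT_TABLE)  # remove remaining punctuation
    return text.split()


if __name__ == "__main__":
    print(preprocess("Semma MASS da!!! 🔥🔥 @user123 #Thalapathy https://t.co/abc 2021 :)"))
```

In the actual system the cleaned text is subsequently tokenized by each model's own subword tokenizer; the whitespace split here only stands in for the generic tokenization step.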
The goal of transformers is to model the dependencies between input and output entirely through attention, without recurrent networks. Bidirectional Encoder Representations from Transformers (BERT) [13], developed by Google, paved the way for a new era of transfer learning in NLP. This language model is built as a multi-layer bidirectional Transformer encoder with bidirectional self-attention layers. It lets users fine-tune the pre-trained language model to achieve state-of-the-art performance in many NLP tasks such as question answering, translation, and classification. BERT's pre-training objectives, Masked Language Modelling (MLM) and Next Sentence Prediction, are straightforward yet effective. MLM masks tokens in the input at random, and the model's goal is to predict the masked tokens. Next Sentence Prediction ensures that the model understands the connection between consecutive sentences. These unsupervised pre-training objectives made BERT a powerful pre-trained model for language representations.

The original pre-trained BERT models released by Google were trained on English text. Since our task was the classification of tweets in code-mixed Dravidian languages, we tried other BERT models from HuggingFace [18] that were pre-trained on different languages.

Table 1: Weighted F1-scores on the dev and test sets for each transformer model we trained and evaluated. Of the five models, the three best-performing ones (mBERT, MuRIL and DistilBERT) are used to build the soft ensemble model.

Model         Tamil dev   Tamil test   Malayalam dev   Malayalam test
mBERT         0.92        0.65         0.76            0.73
DistilBERT    0.90        0.63         0.75            0.69
MuRIL         0.92        0.66         0.78            0.73
IndicBERT     0.84        0.62         0.70            0.72
XLM-RoBERTa   0.83        0.63         0.72            0.65

Table 2: Weighted F1-scores on the dev and test sets for the top-performing models and the proposed ensemble model (HSU_TransEmb). The ensemble increases the scores on the dev set.

Model          Tamil dev   Tamil test   Malayalam dev   Malayalam test
mBERT          0.92        0.65         0.76            0.73
DistilBERT     0.90        0.63         0.75            0.69
MuRIL          0.92        0.66         0.78            0.73
HSU_TransEmb   0.93        0.65         0.80            0.73

mBERT is the original BERT base model pre-trained on the top 102 languages, including Tamil and Malayalam. The model is pre-trained on the Wikipedia corpus with the same MLM objective as BERT. mBERT develops complex cross-lingual representations that enable more efficient language transfer for code-mixed tweets.

[19] proposed a lighter version of BERT that reduces the number of parameters by 40% while preserving 97% of the language understanding capability and running 60% faster. DistilBERT is a lighter and faster transformer model trained with a triple loss that combines language modelling, distillation from the BERT base model, and a cosine-distance loss. For the proposed ensemble, we used the multilingual DistilBERT model, a distilled version of mBERT with 6 transformer layers, 12 attention heads, and 134M parameters in total.

[20] proposed MuRIL, a multilingual language model trained specifically on a large corpus covering 17 languages (16 Indian languages and English). The model was designed to be fine-tuned for a range of NLP tasks in Indian languages. It is also trained on transliterated data, which is common in the Indian context and can help improve classification performance in Dravidian languages.

3.3. Training

We used Hugging Face pre-trained transformer models as the basis for fine-tuning on the tweet classification task. The outputs of the last hidden layer of each model are averaged and used as the final feature representation of the tweet. This representation is then passed through an output layer whose output dimension equals the number of classes, two.
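The pooling and output layer just described can be illustrated with a small framework-free sketch. It is only an illustration under assumptions: the paper does not say whether padded positions are excluded from the average (this sketch excludes them via the attention mask), and the function names, the toy hidden states, and the toy weights are hypothetical rather than taken from the authors' code.

```python
import math
from typing import List, Sequence


def masked_mean_pool(hidden: Sequence[Sequence[float]],
                     mask: Sequence[int]) -> List[float]:
    """Average last-layer token vectors, skipping padded positions (mask == 0)."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def linear(features: Sequence[float],
           weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Output layer: one logit per class (two rows in `weight` for two classes)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn logits into class probabilities in a numerically stable way."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy example: 3 token positions (the last one is padding), hidden size 2.
    hidden_states = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    attention_mask = [1, 1, 0]
    pooled = masked_mean_pool(hidden_states, attention_mask)
    probs = softmax(linear(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print(pooled, probs)
```

In a real fine-tuning setup the hidden states would come from the transformer and the linear layer would be learned jointly with it; the class probabilities produced this way are what the ensemble later averages.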
We used a batch size of 32 with a maximum sequence length of 256 and trained the classifier with the cross-entropy loss, which increases as the prediction diverges from the ground truth. The dataset provided by Chakravarthi et al. [1] has a slight imbalance between the two available classes ('OFFENSIVE', 'NOT-OFFENSIVE'). The Malayalam dataset has 2047 not-offensive and 1953 offensive tweets, whereas the Tamil dataset has 2020 not-offensive and 1980 offensive tweets. To address this imbalance, we used inverse class weighting in the cross-entropy loss function so that incorrect predictions on the less-represented class are penalized more. Finally, we trained the models with a learning rate of 1e-5 for 30 epochs.

3.4. Ensemble of Transformers

We employed a soft-voting ensemble to obtain the final predictions. As discussed in Section 3.3, we fine-tuned each considered model on the provided dataset. The motivation for an ensemble voting mechanism is to have a system that combines the outputs of several BERT-based models into the final prediction. The base models were pre-trained on different amounts of data and have different numbers of transformer layers, so each model identifies different patterns in the text. The ensemble setting lets us capture and use these multiple patterns for the final prediction. This setting improved the dev-set performance beyond the F1-score of the best-performing individual model.

In the proposed soft-voting ensemble, the prediction probabilities of the models are averaged as shown in Eq. (1). The final prediction then follows from Eq. (2), where $p_{not}^{i}$ is the probability of the comment being 'NOT-OFFENSIVE' predicted by model $i$ and $n$ is the total number of models in the ensemble.

\[ p_{not}^{ensemble} = \frac{1}{n} \sum_{i=0}^{n-1} p_{not}^{i} \tag{1} \]

\[ pred = \begin{cases} NOT, & \text{if } p_{not}^{ensemble} > 0.5 \\ OFF, & \text{else} \end{cases} \tag{2} \]

4. Experiments and Results

We considered various transformer-based multilingual models that were pre-trained on corpora containing the two languages in focus (Tamil and Malayalam) and fine-tuned them on the provided dataset with the training setup described in Section 3.3. To apply the pre-trained BERT models, we first tokenize the input using the BERT tokenizers. These tokenizers split the input text into tokens and add special tokens such as [CLS] and [SEP], which mark the start of the input sequence and the end of each sentence. We set the maximum length to 256, so input sentences are padded or truncated to this length. Lastly, the attention mask is created and returned along with the tokenized input. The classifier is fed the average of the features from the last hidden layer of the BERT model and fine-tuned using the Adam optimizer with weight decay and a learning rate of 1e-5. We trained each model with a batch size of 32 for 30 epochs.

Table 1 summarizes the performance of the individual models in terms of the weighted F1-score. Three models (mBERT, MuRIL and DistilBERT) were the top performers on the dev set in both Dravidian languages and were therefore used in the ensemble setting described in Section 3.4. Table 2 compares the weighted F1-scores of these three models individually and in the ensemble setting. The proposed ensemble improved the overall dev-set performance in both Dravidian languages. However, the performance of the models deteriorated drastically on the test set, especially in Tamil, where the ensemble's weighted F1-score fell from 0.93 on the dev set to 0.65 on the test set. We may address this by using a much larger dataset that covers more varied patterns in code-mixed text.

5. Conclusion

We proposed an ensemble transformer model that combines several transformers trained on multilingual data to identify hate speech and offensive language in the Dravidian languages Tamil and Malayalam. The proposed ensemble outperformed the standalone models on the dev set.
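The ensemble referred to here follows the soft-voting rule of Section 3.4 (Eqs. 1 and 2), which can be sketched in a few lines. This is a minimal illustration, not the released implementation: the function name `soft_vote`, its argument layout, and the probabilities in the example are made up for demonstration.

```python
from typing import List, Sequence


def soft_vote(p_not_per_model: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[str]:
    """Average each model's P(NOT) per sample (Eq. 1) and threshold it (Eq. 2)."""
    n_models = len(p_not_per_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_samples = len(p_not_per_model[0])
    if any(len(p) != n_samples for p in p_not_per_model):
        raise ValueError("all models must score the same samples")

    labels = []
    for j in range(n_samples):
        p_ensemble = sum(p[j] for p in p_not_per_model) / n_models  # Eq. (1)
        labels.append("NOT" if p_ensemble > threshold else "OFF")   # Eq. (2)
    return labels


if __name__ == "__main__":
    # Hypothetical P(NOT) values from three fine-tuned models on three tweets.
    mbert = [0.9, 0.4, 0.2]
    distilbert = [0.6, 0.7, 0.1]
    muril = [0.8, 0.3, 0.45]
    print(soft_vote([mbert, distilbert, muril]))
```

In the second example tweet one model votes 'NOT' with probability 0.7, but the averaged probability stays below 0.5, so the ensemble predicts 'OFF'. This is the sense in which soft voting weighs each model's confidence rather than counting hard votes.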
Yet, the F1-scores of the models are considerably lower on the provided test set. The poor performance may mainly be attributed to a shift in distribution between the train and dev sets and the test set. In future work, we will consider using multiple open-source code-mixed hate speech datasets along with the provided dataset to cover more of the possible data patterns. We will also explore the effects of using language-specific LSTM-based models such as ULMFiT [21].

References

[1] B. R. Chakravarthi, P. K. Kumaresan, R. Sakuntharaj, A. K. Madasamy, S. Thavareesan, P. B, S. Chinnaudayar Navaneethakrishnan, J. P. McCrae, T. Mandl, Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[2] A. Schmidt, M. Wiegand, A Survey on Hate Speech Detection using Natural Language Processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, Association for Computational Linguistics, Valencia, Spain, 2017, pp. 1–10. URL: https://aclanthology.org/W17-1101. doi:10.18653/v1/W17-1101.
[3] S. Malmasi, M. Zampieri, Detecting Hate Speech in Social Media, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, INCOMA Ltd., Varna, Bulgaria, 2017, pp. 467–472. URL: https://doi.org/10.26615/978-954-452-049-6_062. doi:10.26615/978-954-452-049-6_062.
[4] P. Fortuna, S. Nunes, A Survey on Automatic Detection of Hate Speech in Text, ACM Computing Surveys (CSUR) 51 (2018) 1–30.
[5] E. Spertus, Smokey: Automatic Recognition of Hostile Messages, in: Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence, AAAI'97/IAAI'97, AAAI Press, 1997, pp. 1058–1065.
[6] G. Xiang, B. Fan, L. Wang, J. Hong, C. Rose, Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, Association for Computing Machinery, New York, NY, USA, 2012, pp. 1980–1984. URL: https://doi.org/10.1145/2396761.2398556. doi:10.1145/2396761.2398556.
[7] P. Burnap, M. L. Williams, Us and them: identifying cyber hate on Twitter across multiple protected characteristics, EPJ Data Science 5 (2016) 1–15.
[8] B. Gambäck, U. K. Sikdar, Using Convolutional Neural Networks to Classify Hate-Speech, in: Proceedings of the First Workshop on Abusive Language Online, Association for Computational Linguistics, Vancouver, BC, Canada, 2017, pp. 85–90. URL: https://aclanthology.org/W17-3013. doi:10.18653/v1/W17-3013.
[9] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
[10] F. Del Vigna, A. Cimino, F. Dell'Orletta, M. Petrocchi, M. Tesconi, Hate me, hate me not: Hate speech detection on Facebook, in: Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), 2017, pp. 86–95.
[11] D. Mahata, J. Friedrichs, R. R. Shah, et al., #phramacovigilance - Exploring Deep Learning Techniques for Identifying Mentions of Medication Intake from Twitter, arXiv preprint arXiv:1805.06375 (2018).
[12] G. K. Pitsilis, H. Ramampiaro, H. Langseth, Detecting offensive language in tweets using deep learning, arXiv preprint arXiv:1801.04433 (2018).
[13] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[14] B. R. Chakravarthi, R. Priyadharshini, N. Jose, A. Kumar M, T. Mandl, P. K. Kumaresan, R. Ponnusamy, H. R L, J. P. McCrae, E. Sherly, Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada, in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics, Kyiv, 2021, pp. 133–145. URL: https://aclanthology.org/2021.dravidianlangtech-1.17.
[15] T. Mandl, S. Modha, A. Kumar M, B. R. Chakravarthi, Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German, in: Forum for Information Retrieval Evaluation, 2020, pp. 29–32.
[16] S. Banerjee, B. Raja Chakravarthi, J. P. McCrae, Comparison of Pretrained Embeddings to Identify Hate Speech in Indian Code-Mixed Text, in: 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2020, pp. 21–25. doi:10.1109/ICACCCN51052.2020.9362731.
[17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[18] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-Art Natural Language Processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
[19] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2020. arXiv:1910.01108.
[20] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. Talukdar, MuRIL: Multilingual Representations for Indian Languages, 2021. arXiv:2103.10730.
[21] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, arXiv preprint arXiv:1801.06146 (2018).