HateDetectors at HASOC 2020: Hate Speech Detection using Classical Machine Learning and Transfer Learning based Approaches

Varsha Reddy, Surendra Telidevara
varsha.redla@students.iiit.ac.in (V. Reddy); surendra.telidevara@students.iiit.ac.in (S. Telidevara)

Abstract
In this paper, we describe our models submitted to the HASOC shared task conducted as part of FIRE 2020. We present two directions for approaching the problem of hate speech detection, both based on advances in natural language processing: classical machine learning approaches using Support Vector Machines (SVM), and sentence-level transfer learning approaches using BERT. Our experimental results show that the transfer learning approaches beat the machine learning approaches at identifying the nuances of hate speech in a sentence. We performed experiments on the English and Hindi datasets. On the public test data provided, we obtain macro F1 scores of 0.90 and 0.67 on English and Hindi respectively for subtask-A, and 0.54 and 0.44 for subtask-B. For subtask-A we also observe a difference of around 0.05 macro F1 score between the BERT-based and SVM-based models on the English public test data, but a difference of only 0.01 on the Hindi public test data. We highlight the advantage of a monolingual model over a multilingual BERT-based model for hate speech detection, as well as the impact of a large, balanced training dataset on model performance.

Keywords
Hate Speech Detection, Transfer Learning, Machine Learning, SVM, BERT

FIRE '20, Forum for Information Retrieval Evaluation, December 16–20, 2020, Hyderabad, India
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Social media has become a great platform for communication among people living far apart, and for expressing one's ideas and thoughts. The number of users on social media platforms is growing day by day, giving these platforms a very wide reach. While social media, used constructively, is a wonderful tool, cases of cyberbullying, harassment and the targeted spread of hate have increased sharply (see footnotes 1 and 2). Online hate can not only affect a person's mental health but can also translate into violence in the real world, so the issue requires attention (see footnotes 3 and 4).

1 https://www.cfr.org/backgrounder/hate-speech-social-media-global-comparisons
2 https://www.hindustantimes.com/analysis/it-is-time-to-regulate-hate-speech-on-social-media/story-x2JfnAcZ4mh404CM2wQLpO.html
3 https://timesofindia.indiatimes.com/home/science/Hate-posts-on-social-media-can-affect-mental-health/articleshow/41570101.cms
4 https://www.washingtonpost.com/nation/2018/11/30/how-online-hate-speech-is-fueling-real-life-violence/

Hate speech detection has gained a lot of importance in the NLP community, and the HASOC 2020 [1] shared task, conducted as part of FIRE 2020, is an attempt in this direction: it provides a forum for evaluating systems that classify social media posts as hate speech. It has two subtasks, A and B, in three languages: English, Hindi and German. Subtask-A is a binary classification task in which a post must be classified as hate speech (HOF) or non-hate speech (NOT).
Subtask-B addresses finer-grained categories of hate speech: hate speech (HATE), offensive speech (OFFN) and profanity (PRFN). Posts classified as HOF in the previous subtask have to be further classified into one of these three categories. We conducted our investigation on the English and Hindi datasets.

Simple word-based filtering approaches are not sufficient for this problem, since a sentence is often hateful only because of the context in which a word is used; a naive swear-word filter would therefore falsely tag many sentences as hateful.

The BERT model [2], a language model pre-trained on top of the neural transformer architecture [3], has achieved state-of-the-art results on most NLP tasks. Two main kinds of pre-trained BERT models are available: monolingual models, pre-trained on a single language, and a multilingual model, trained on multiple languages under the hypothesis that it can be applied to any language. We wanted to utilize these pre-trained models for this task, and to test the effectiveness of the multilingual model against its monolingual counterpart. We also wanted to see how well these pre-trained transformer models perform compared to classical models such as SVM and logistic regression, so we experimented with support vector machines under multiple embedding schemes and compared their performance with the BERT-based models.

The rest of the paper is organized as follows: Section 2 describes the data, Section 3 presents our models, Sections 4 and 5 present our results and discussion, Section 6 lists our final submitted models, and Section 7 concludes.

2. Dataset Description

HASOC 2020 offered two subtasks:

Subtask-A: classification of a post as hate speech (HOF) or non-hate speech (NOT). All categories of hate, such as profanity, offensive language and hate speech, are labelled HOF, making this a binary classification task. The English dataset is almost perfectly balanced, while the Hindi dataset is imbalanced, with only about 25% of the posts labelled HOF. For English subtask-A we used an additional dataset, OLID [4], proposed by OffensEval [5]. This is a dataset for offensive language identification, with 4400 posts labelled OFF (offensive) and 8840 posts labelled NOT (non-offensive).

Subtask-B: posts labelled HOF have to be further classified into one of the fine-grained labels HATE, OFFN and PRFN, denoting hate speech, offensive content and profane content respectively. Posts labelled NOT in subtask-A are labelled NONE in subtask-B. Here, unlike subtask-A, both the English and Hindi datasets are highly imbalanced, with NONE being the majority class in both cases. Additionally, in the English dataset, PRFN is the majority class among HATE, PRFN and OFFN.

Table 1: HASOC 2020 dataset description

subtask-A (binary classification)
Label   English dataset   Hindi dataset
HOF     1856              847
NOT     1852              2116

subtask-B (multi-class classification)
Label   English dataset   Hindi dataset
HATE    158               234
NONE    1852              2116
OFFN    321               465
PRFN    1377              148

Table 1 gives the number of posts per label for the English and Hindi datasets.

3. Methodology and Experiments

This section is organized as follows: we first present the common pre-processing pipeline used for both subtasks A and B, then describe our methodology for subtask-A, followed by subtask-B.

3.1. Pre-processing

The dataset was extracted from social media sites such as Twitter, so it contains a lot of social media jargon and an unconventional social media style of writing. It is therefore essential to pre-process the data before passing it to the models. For English we followed this pipeline (a minimal sketch is given after the list):

1. URL removal: all URLs are replaced with a special token.
2. User mention removal: all user mentions are replaced with another special token.
3. Conversion of emojis and emoticons into their corresponding text.
4. Expansion of hashtags, removal of stop words and unnecessary symbols, and case folding.
5. Mapping of all elongations to the same word (for example, "yeah", "yeahhhh" and "yeeaahhh" all map to "yeah").

For Hindi, we additionally stemmed the words and skipped step 5; all other steps were performed. We used the NLTK [6] library in Python for some of the above steps.
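The following is a minimal sketch of the English pipeline described above. The paper only states that NLTK was used for some of the steps, so the exact implementation, the choice of special tokens and the use of the third-party emoji package here are our assumptions for illustration.

```python
import re

import emoji  # assumption: the third-party "emoji" package, used here for step 3
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def preprocess_english(text: str) -> str:
    # Step 1: replace URLs with a special token ("<url>" is our choice of token).
    text = re.sub(r"https?://\S+|www\.\S+", "<url>", text)
    # Step 2: replace user mentions with another special token.
    text = re.sub(r"@\w+", "<user>", text)
    # Step 3: convert emojis into their textual descriptions.
    text = emoji.demojize(text, delimiters=(" ", " "))
    # Step 4: expand hashtags by keeping the tag text (full hashtag
    # segmentation is more involved), drop leftover symbols, case-fold.
    text = re.sub(r"#(\w+)", r"\1", text)
    text = re.sub(r"[^A-Za-z<>\s]", " ", text).lower()
    # Step 5: approximate elongation mapping by collapsing runs of three
    # or more repeated characters ("yeahhhh" -> "yeah").
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    # Stop word removal.
    return " ".join(t for t in text.split() if t not in STOP_WORDS)

print(preprocess_english("Yeahhhh!! check this https://t.co/x @someone #HateSpeech"))
# -> "yeah check <url> <user> hatespeech"
```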
3.2. Subtask-A

We experimented with two kinds of approaches:

1. classical machine learning approaches using SVM, and
2. transfer learning approaches using BERT.

3.2.1. SVM Based Model

All further processing is performed on the pre-processed data and consists of two steps:

1. Sentence encoding: the pre-processed text has to be converted into vector form for the classification models to train on; the output of this step is a vector representing the sentence. We experimented with three sentence encoding mechanisms:

a) The Universal Sentence Encoder [7], used only for the English dataset and abbreviated USE in this paper. We used the model available on TF Hub (https://tfhub.dev/google/universal-sentence-encoder/1).
b) Averaging the word-level fastText [8] embeddings of the words in the sentence, referred to as Avg Fasttext in this paper. (Although USE and fastText embeddings were themselves pre-trained with deep learning, we use them purely as embedding schemes without any fine-tuning, so we report them under the machine learning approaches: the section titles reflect the nature of the classification algorithm, not the embedding training approach.)
c) TfIdf vectorization, referred to as TfIdf Vect in this paper.

2. Classification: we used the SVM [9] classifier provided by the scikit-learn [10] library in Python, with the radial basis function (rbf) kernel [11]. For the regularization parameter C, we performed a grid search over values from 0.0001 to 100, multiplying by 10 at each step, and used 5-fold cross-validation to select the best model. We experimented with two modes of loss calculation: the standard loss, and a class-weighted loss that gives more weight to classes with fewer samples, the weight being inversely proportional to the number of samples in the class. We call the class-weighted variant SVM_b in this paper. A sketch of this setup is given at the end of this subsection.

While trying the above machine learning approaches on the Hindi dataset, we experimented with two forms of the data:

1. the original Devanagari form, and
2. a transliterated form, obtained by converting the Devanagari text into standard Roman script using the mylanguages website (http://mylanguages.org/devanagari_romanization.php).
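As an illustration, the following sketch reproduces the TfIdf Vect + SVM_b configuration with scikit-learn. The paper does not publish its code, so the TfIdf settings, the use of class_weight="balanced" for the class-weighted loss, and the toy data are our assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-ins for the pre-processed posts and their subtask-A labels.
texts = [
    "you are awful trash", "have a nice day", "what a hateful rant",
    "lovely weather today", "so much vile abuse", "great match yesterday",
    "another disgusting slur", "thanks for the help", "pure toxic garbage",
    "see you tomorrow friend",
]
labels = ["HOF", "NOT"] * 5

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # class_weight="balanced" weights each class inversely to its
    # frequency, corresponding to the SVM_b loss described above.
    ("svm", SVC(kernel="rbf", class_weight="balanced")),
])

# Grid search over C from 0.0001 to 100, multiplying by 10 at each step,
# with 5-fold cross-validation; macro F1 is the paper's primary metric.
param_grid = {"svm__C": [10.0 ** k for k in range(-4, 3)]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```

The standard (unweighted) SVM variant is obtained by dropping the class_weight argument.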
3.2.2. BERT Based Model

The pre-processing pipeline followed before the experiments in this section is the same as in Section 3.1. We describe our approach below:

1. English: we used the monolingual pre-trained cased BERT model [2] to obtain the sentence encoding; we call this model BERT_mono in this paper. The classifier is a dense layer with output dimension 1, representing the label. To increase the number of training samples, we appended the additional OLID dataset to the 2020 English subtask-A dataset, and fine-tuned the pre-trained BERT model on this combined dataset.

2. Hindi: we used the multilingual cased pre-trained BERT model (https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip) to obtain the sentence encoding; we call this model BERT_multi in this paper. The classifier is again a dense layer with output dimension 1, denoting the label. We fine-tuned this pre-trained multilingual model on the 2020 Hindi subtask-A dataset.

A minimal sketch of this fine-tuning setup is given below.
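The paper does not state which toolkit was used for fine-tuning, so the following sketch uses the Hugging Face transformers library as one possible implementation. The single-output dense head with a sigmoid loss corresponds to the dense layer of output dimension 1 described above; the model name, learning rate and toy batch are illustrative assumptions.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

MODEL_NAME = "bert-base-cased"  # "bert-base-multilingual-cased" for Hindi

class BertHateClassifier(nn.Module):
    def __init__(self, model_name: str = MODEL_NAME):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # Dense layer with output dimension 1, as described above.
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.pooler_output).squeeze(-1)  # one logit per post

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertHateClassifier()
loss_fn = nn.BCEWithLogitsLoss()  # binary HOF-vs-NOT objective
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed lr

# One illustrative training step on a toy batch.
batch = tokenizer(["you are awful", "have a nice day"],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1.0, 0.0])  # 1 = HOF, 0 = NOT
logits = model(batch["input_ids"], batch["attention_mask"])
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```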
3.3. Subtask-B

For subtask-B the pre-processing pipeline is the same as described in Section 3.1, and the sentence encoding schemes are the same as in Section 3.2.1. The dataset for this subtask was quite imbalanced. We tried two strategies, the second building on the first, and followed the same approaches for English and Hindi.

1. Approach 1: we do not treat subtask-B as an extension of subtask-A. We simply take the encoded data with its subtask-B labels and fit an SVM on it. This amounts to multi-class classification over four labels (NONE, HATE, PRFN and OFFN) that is independent of the previous subtask's output.

2. Approach 2: we first filter out all the NONE-labelled data from the subtask-B training data and fit an SVM on the remaining (non-NONE) data; henceforth we call this model SVM_1. At prediction time, we take the first subtask's output and directly label its NOT predictions as NONE; the remaining posts are classified with SVM_1. A sketch of this two-stage setup follows below.
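The following sketch illustrates Approach 2 under the same scikit-learn setup as before. The variable names and the subtask-A classifier interface are illustrative assumptions; the point is the control flow: posts predicted NOT in subtask-A are labelled NONE directly, and only the rest are passed to SVM_1.

```python
import numpy as np
from sklearn.svm import SVC

def train_svm_1(X_train, y_train_b):
    """Fit SVM_1 on the subtask-B training data with NONE samples removed."""
    y = np.asarray(y_train_b)
    keep = y != "NONE"  # keep only HATE/OFFN/PRFN samples
    svm_1 = SVC(kernel="rbf", class_weight="balanced")
    svm_1.fit(X_train[keep], y[keep])
    return svm_1

def predict_subtask_b(X_test, clf_a, svm_1):
    """Two-stage prediction: clf_a is the subtask-A HOF/NOT model."""
    preds_a = np.asarray(clf_a.predict(X_test))
    out = np.empty(len(preds_a), dtype=object)
    # Posts predicted NOT by the subtask-A model are labelled NONE directly.
    out[preds_a == "NOT"] = "NONE"
    # Only the posts predicted HOF are passed to SVM_1 for the
    # fine-grained HATE/OFFN/PRFN decision.
    hof = preds_a == "HOF"
    if hof.any():
        out[hof] = svm_1.predict(X_test[hof])
    return out
```

Here X_train and X_test stand for the encoded sentence matrices (USE, Avg Fasttext or TfIdf vectors, as in Section 3.2.1).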
4. Results

The HASOC organizers released public test data with ground-truth labels this year; the final leaderboard ranks were decided on a private test dataset. All results reported in Tables 2-5 are on the 2020 public test data. We report macro precision, macro recall and macro F1 score, where macro values are computed by averaging the corresponding value over all classes; the organizers used macro F1 as the primary metric for the leaderboard. Table 6 shows the performance of our models on the private test data. The results for Hindi subtask-B had not been announced at the time this paper was submitted.

For English subtask-A, results labelled OLID+2020 denote models trained on the 2020 HASOC dataset appended with the OLID dataset, as described in the methodology section; results without this label use the 2020 HASOC dataset only. The label trans in the Hindi results denotes that the model was trained on the transliterated version of the dataset. All results for subtask-B follow the two approaches described in Section 3.3.

Table 2: English subtask-A results on public test data

Machine Learning Approaches
Method                            Macro Recall   Macro Precision   Macro F1 score
USE + SVM_b                       0.86           0.86              0.86
USE + SVM                         0.85           0.85              0.85
USE + SVM_b (OLID+2020)           0.86           0.86              0.86
USE + SVM (OLID+2020)             0.85           0.86              0.86
Avg Fasttext + SVM_b              0.87           0.87              0.86
Avg Fasttext + SVM                0.87           0.87              0.86
TfIdf Vect + SVM_b                0.86           0.85              0.85
TfIdf Vect + SVM                  0.87           0.86              0.85
TfIdf Vect + SVM_b (OLID+2020)    0.88           0.88              0.88
TfIdf Vect + SVM (OLID+2020)      0.87           0.86              0.85

Transfer Learning Approaches
BERT_mono (OLID+2020)             0.87           0.93              0.90
BERT_multi (OLID+2020)            0.87           0.88              0.88
BERT_mono                         0.89           0.90              0.89
BERT_multi                        0.88           0.84              0.88

Table 3: English subtask-B results on public test data

Attempt-1 (multi-class classification with NONE label)
Method                  Macro Recall   Macro Precision   Macro F1 score
TfIdf Vect + SVM_b      0.73           0.50              0.51
TfIdf Vect + SVM        0.61           0.47              0.45
USE + SVM_b             0.53           0.55              0.54
USE + SVM               0.77           0.47              0.46
Avg Fasttext + SVM_b    0.54           0.58              0.54
Avg Fasttext + SVM      0.56           0.46              0.43

Attempt-2 (filter NONE label)
TfIdf Vect + SVM_b      0.58           0.51              0.51
TfIdf Vect + SVM        0.48           0.45              0.42
USE + SVM_b             0.53           0.53              0.53
USE + SVM               0.74           0.48              0.48
Avg Fasttext + SVM_b    0.51           0.50              0.51
Avg Fasttext + SVM      0.52           0.47              0.44

Table 4: Hindi subtask-A results on public test data

Machine Learning Approaches
Method                          Macro Recall   Macro Precision   Macro F1 score
Avg Fasttext + SVM_b            0.66           0.68              0.66
Avg Fasttext + SVM              0.81           0.56              0.53
TfIdf Vect + SVM_b              0.67           0.64              0.65
TfIdf Vect + SVM                0.73           0.54              0.50
Avg Fasttext + SVM_b (trans)    0.56           0.56              0.56
Avg Fasttext + SVM (trans)      0.35           0.50              0.41
TfIdf Vect + SVM_b (trans)      0.70           0.65              0.66
TfIdf Vect + SVM (trans)        0.77           0.52              0.46

Transfer Learning Approaches
BERT_multi                      0.68           0.67              0.67

Table 5: Hindi subtask-B results on public test data

Attempt-1 (multi-class classification with NONE label)
Method                  Macro Recall   Macro Precision   Macro F1 score
TfIdf Vect + SVM_b      0.57           0.36              0.39
TfIdf Vect + SVM        0.69           0.28              0.27
Avg Fasttext + SVM_b    0.39           0.38              0.39
Avg Fasttext + SVM      0.19           0.25              0.21

Attempt-2 (filter NONE label)
TfIdf Vect + SVM_b      0.56           0.41              0.44
TfIdf Vect + SVM        0.65           0.37              0.37
Avg Fasttext + SVM_b    0.41           0.46              0.42
Avg Fasttext + SVM      0.46           0.36              0.33

Table 6: Performance on private test data

Subtask-A
Language   Macro F1   Macro F1 of best performing model   Leaderboard position
English    0.4981     0.5152                              15th
Hindi      0.5129     0.5337                              10th

Subtask-B
Language   Macro F1   Macro F1 of best performing model   Leaderboard position
English    0.2299     0.2652                              17th
Hindi      0.2272     0.3345                              13th

5. Discussion

An important observation for the BERT models is that, when a monolingual model is available, it outperforms the multilingual model: we saw a difference of around 0.012 macro F1 score between the two. The multilingual model nevertheless turns out to be better than the classical SVM model, with a difference of around 0.02 macro F1 score between them, as can be seen in Table 2. The subtask-A results show that the BERT models performed better than the SVM models for both languages.

We saw significant improvements in macro F1 score between the SVM_b model and the standard SVM model wherever the data was imbalanced. In subtask-B, where the dataset is heavily imbalanced and very small, the improvement was up to 0.12 macro F1 score, which is quite significant. In subtask-A, there was no significant improvement for English, as that dataset was already balanced, while for Hindi subtask-A there was an improvement of up to 0.15 macro F1 score, as expected given the imbalance in the dataset.

From Table 5 we can see an improvement of 0.12 macro F1 score between the SVM models in attempt-1 and attempt-2. The second approach has indeed reduced the bias towards the majority class (NONE) and improved the overall macro F1 score: a significant number of samples are now classified into classes other than NONE.

Our experiments also show that using additional data for English subtask-A helped improve the macro F1 score: as Table 2 shows, adding the OLID dataset improved macro F1 by 0.01 for BERT and by 0.03 for TfIdf Vect + SVM_b.

6. Submitted Models

We submitted two runs for each subtask for English and Hindi. For English subtask-A, we submitted the USE + SVM_b model and the BERT_mono (OLID+2020) model described in the previous sections. For Hindi subtask-A, we submitted the BERT_multi model and the Avg Fasttext model described above. For subtask-B, the two runs for each language correspond to approaches 1 and 2 from Section 3.3, using USE sentence encodings for English and TfIdf sentence encodings for Hindi.

7. Conclusion

We have described two main directions for the hate speech detection problem: a classical machine learning approach and a transfer learning approach. Our experiments show that BERT-based models pre-trained on large corpora and fine-tuned on the available data yield good results. We achieved macro F1 scores of 0.90 and 0.67 for English and Hindi subtask-A respectively, which is 0.05 and 0.01 higher than the best-performing SVM counterparts. On the private test data our models performed reasonably well, falling short of the best-performing models by small margins in almost all cases: 0.01 and 0.02 macro F1 score for English and Hindi respectively in subtask-A, and 0.03 and 0.11 in subtask-B. The poor performance in subtask-B can be attributed to the dataset being small and imbalanced; moreover, the individual classes are very fine-grained, which makes the distinctions hard for models to capture. Building more robust models that capture these subtleties between classes, and constructing larger, more balanced datasets, should improve hate speech detection systems. Other approaches such as LSTM, CNN and LSTM with attention could also be tried; comparing their effectiveness with the BERT- and SVM-based models is left to future work, as is experimenting with BERT-based architectures for subtask-B. We hope our experiments and analysis help in tackling the problem of hate speech detection.
References

[1] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer, Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages, in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, CEUR, 2020.
[2] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017) 5998–6008.
[4] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, arXiv preprint arXiv:1902.09666 (2019).
[5] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).
[6] E. Loper, S. Bird, NLTK: The Natural Language Toolkit, arXiv preprint cs/0205028 (2002).
[7] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al., Universal Sentence Encoder, arXiv preprint arXiv:1803.11175 (2018).
[8] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146.
[9] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, B. Scholkopf, Support vector machines, IEEE Intelligent Systems and their Applications 13 (1998) 18–28.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[11] M. J. Orr, et al., Introduction to radial basis function networks, 1996.