Applying Transfer Learning using BERT-based models for Hate Speech Detection

Sakshi Kalra (a), Kalit Naresh Inani (a), Yashvardhan Sharma (a) and Gajendra Singh Chauhan (b)

(a) Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, India
(b) Department of Humanities and Social Sciences, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, India

Abstract
Hateful and offensive speech is rising along with social media use. This has motivated researchers to devise novel approaches that perform better than traditional algorithms. This paper presents the methods adopted by the BITS Pilani team for Subtask 1A of the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages task proposed by the Forum for Information Retrieval Evaluation in 2021. We used data augmentation to make the models generalize better, and we experimented with different feature extraction techniques combined with machine learning algorithms. However, fine-tuning pre-trained BERT-based models using transfer learning gave us the best results on the test set for all the given languages. We obtained the highest Macro-F1 of 0.7993 for English, 0.7612 for Hindi, and 0.8306 for Marathi using the pre-trained BERT-based models.

Keywords
offensive language detection, hate speech, label classification, BERT-variants, HASOC

1. Introduction
Over the past years, the user base of social media platforms and online forums has grown exponentially. Every day around 500 million tweets are generated. People use these platforms to express their views and share other relevant information. However, since people come from different backgrounds, they may sometimes be targeted by hateful, offensive, and objectionable speech. These issues arise because the anonymity of these platforms allows people to use racist, fanatic, and offensive terms in their speech.
If unchecked, this poses a significant threat to society. As a consequence, social media platforms need to monitor all their user posts. However, detecting and removing offensive speech manually would require tremendous effort. Thus, a need arises to automate this task using modern machine learning and natural language processing algorithms.

Toxic speech has two parts: hate speech and offensive speech. According to the UN, hate speech could be defined as “any communication in speech, writing or behavior, that attacks or uses pejorative or discriminatory language concerning a person or a group based on who they are, in other words, based on their religion, ethnicity, nationality, race, color, descent, gender or other identity factors”, while offensive speech could be defined as speech that “causes someone to feel resentful, upset, or annoyed.” Finding common patterns in such text is tricky, as people from different geographical and cultural backgrounds use it differently.

Forum for Information Retrieval Evaluation, December 13-17, 2021, India
Email: p20180437@pilani.bits-pilani.ac.in (S. Kalra); f20180207@pilani.bits-pilani.ac.in (K. N. Inani); yash@pilani.bits-pilani.ac.in (Y. Sharma); gsc@pilani.bits-pilani.ac.in (G. S. Chauhan)
URL: https://www.bits-pilani.ac.in/pilani/yash/profile (Y. Sharma); https://www.bits-pilani.ac.in/pilani/yash/profile (G. S. Chauhan)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Online communities, social media enterprises, and technology companies are investing heavily and encouraging research in this area by organizing tasks and workshops. One such community is FIRE, which has been actively organizing the HASOC tasks since 2019 [1]. This paper describes our work on the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages task (SubTask A).
The task aims to classify a user tweet as either hate and offensive or not. We show that applying transfer learning to pre-trained BERT models outperforms traditional machine learning algorithms.

1.1. Key Contributions
1. This paper shows the application of transfer learning by using pre-trained BERT models for hate and offensive speech detection.
2. The dataset used for the task was obtained by joining the data provided by the HASOC team for the year 2021 with the past two years' data. This is intended to help our models generalize better.
3. Before feeding the data into our models, appropriate text processing techniques such as lemmatization and the removal of stop words, mentions, and URLs have been performed.
4. For feature extraction, techniques such as TF-IDF weighting as well as word embedding representations are used. These extracted features were then fed into machine learning algorithms, namely logistic regression, random forest, and support vector classifier.
5. We have compared the BERT-based models with other machine learning models that use traditional natural language processing approaches for feature extraction. From the comparative study, it can be concluded that fine-tuned BERT-based models are the best suited for the above task.

2. Related Work
Various machine learning and deep learning approaches have been tested for automated hate and offensive speech detection [2]. Most traditional machine learning approaches rely on features extracted from the text, such as bag of words, n-grams, and lexical and linguistic features [3]. Recently, word embedding methods have also been proposed for such tasks [4]. However, these approaches fail to capture the entire context of the text. Today, deep learning approaches [5] are gaining popularity in text classification, sentiment analysis, language modeling, machine translation, and many more.
Some of these approaches are Convolutional Neural Networks (CNNs) [6], Recurrent Neural Networks (RNNs) [7], Long Short-Term Memory networks (LSTMs) [8], and, most recently, a transformer-based architecture [9], namely Bidirectional Encoder Representations from Transformers (BERT) [10].

In [11], the authors provide a concise overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry, held at the CLEF conference. The tasks include cross-domain authorship verification, profiling of hate speech spreaders, and style change detection for multi-author documents. In part, they continue and extend past shared tasks, with the general goal of advancing the state of the art through objective evaluation on newly created benchmark datasets.

In [12], the authors use various machine learning algorithms to classify hate speech and offensive content in code-mixed text. Feature extraction is done using TF-IDF and n-gram based models on the dataset collected from the HASOC 2020 task, consisting of Malayalam and Tamil records, and they obtained F1-scores of 0.77 for Malayalam and 0.87 for Tamil.

Another work [13] addresses the detection of hate speech on Twitter. A deep convolutional neural network model is combined with GloVe embedding vectors to capture semantics. The results show that their model outperformed the existing models, achieving an F1-score of 0.92.

3. Proposed Techniques and Algorithms
The paper describes various approaches and compares them. The first approach extracts n-gram features from the preprocessed text and weights them according to their TF-IDF values. Then, models using machine learning algorithms are trained on these features.
The second approach uses word embeddings for numeric word representations, and models implementing machine learning algorithms are trained similarly to the previous method. Finally, a pre-trained BERT model is employed for training. This BERT model with twelve layers is pre-trained on a large corpus of English data in a self-supervised way. This means it is trained on the raw texts only, with no human labeling; an automatic process generates the inputs and labels from those texts. As a result, it tends to generalize better than models trained from scratch. To adapt the model to our task, it is fine-tuned on the provided dataset. Along with this, variations of BERT such as RoBERTa [14] and DistilBERT [15] have been used. Multilingual models are used to train on and classify data in languages such as Hindi and Marathi. Figure 1 describes the complete methodology we adopted for our experiments. The code is available in the GitHub repository (footnote 1).

Footnote 1: https://github.com/Kalra-Sakshi/HASOC-Subtask-1.git

[Figure 1 (flowchart; image not reproduced). Recoverable elements: an example input tweet; Text Preprocessing (convert text to lowercase, remove mentions and links, remove stop words, apply lemmatization); TF-IDF representations and word embedding representations (Google News and GloVe 100-dim pretrained embeddings) feeding Logistic Regression, Random Forest Classifier, and Support Vector Classifier; a Tokenizer (tokenize text, add [CLS] and [SEP] tokens, map tokens to their IDs, pad/truncate to a fixed length) feeding a pre-trained 12-layer BERT model (BERT-base, Distil-BERT, Multilingual-BERT, RoBERTa) with fine-tuning and a classifier; output labels HOF and NOT.]
Figure 1: Flowchart of our methodology and techniques

4. Dataset
We performed data augmentation to make our models generalize better on new data. Thus, the dataset used for training was created by combining the organizers' datasets for HASOC 2021 [16, 17], 2020, and 2019. Due to the unavailability of datasets from previous years, only the data provided for HASOC 2021 was used for the Marathi language. The combined dataset consists of labeled tweets with the following classes:
• Non-Hate-Offensive (NOT) - Tweets with this label do not contain any hate speech, profane, or offensive content.
• Hate and Offensive (HOF) - Tweets with this label include hate, offensive, and profane content.
Along with this, each tweet is labeled with a HASOC ID provided by the organizers. Table 1 shows the statistics of the dataset used after concatenation of data from previous years.

Table 1
Dataset Statistics
Language    Train Data    Test Data
English     14556         1281
Hindi       13540         1532
Marathi     1874          625

5. Experimental Work

5.1. Machine Learning Algorithms using TF-IDF Representations
First, the given tweets are preprocessed before feature extraction. For the English language, we convert the tweets to lowercase, remove extra spaces, URLs, Twitter mentions, and stop words, and tokenize them using the functions available in the 'NLTK' package. Preprocessing for the Hindi and Marathi languages is done similarly using the 'iNLTK' package [18].

Table 2
An overview of the hyperparameter settings used
Hyperparameter      Value
Optimizer           Adam
Learning Rate       2e-5
Number of Epochs    8
Batch Size          32

Then, n-gram TF-IDF features are extracted, where n ranges from one to three.
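The preprocessing and n-gram TF-IDF steps described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in our experiments, which relies on NLTK and scikit-learn: the tiny stop-word list, the regular-expression tokenizer, the function names, and the example tweets are all hypothetical, lemmatization is omitted, and the unsmoothed weighting tf × (ln(N/df) + 1) is only one common TF-IDF variant.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption; not NLTK's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "your",
              "of", "my", "and", "in", "to"}

def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, drop stop words, return tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)           # remove Twitter mentions
    tokens = re.findall(r"[a-z0-9]+", text)     # crude English-only tokenizer
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n_min=1, n_max=3):
    """All word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(corpus):
    """Return one {ngram: tf * idf} dictionary per document."""
    docs = [Counter(ngrams(preprocess(doc))) for doc in corpus]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]

# Hypothetical two-tweet corpus, for illustration only.
corpus = [
    "@user Get out of my city! https://t.co/example",
    "Get well soon, my friend",
]
features = tfidf(corpus)
```

In this toy corpus the unigram "get" occurs in both tweets and therefore receives the minimum weight, whereas n-grams unique to one tweet, such as "get out city", are weighted higher. A classifier would then be trained on these sparse feature dictionaries after mapping them to vectors.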
Here, we use three machine learning algorithms: Logistic Regression, Support Vector Classifier, and Random Forest Classifier, available in the 'scikit-learn' package [19] (footnote 2). While training, a 5-fold grid search is performed to find the best set of hyperparameters. Logistic Regression gave the best performance for the English language, and the Support Vector Classifier worked well for the other two languages.

5.2. Machine Learning Algorithms using Word Embedding Representations
Here, we have experimented with two types of word embedding representations. The first is the GloVe (100-dimensional) embeddings. GloVe [20] is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-to-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. The second is the Google News pre-trained Word2Vec model. Next, the mean of the individual embeddings of all words in a tweet is taken to obtain the numeric vector representation of the text. Then, we trained each model, using the machine learning algorithms discussed above, on these vector features. A 5-fold grid search is used to get the best set of hyperparameters. The Support Vector Classifier gave the best performance with both embedding types.

5.3. BERT and its Variations
We experimented with various pre-trained transformer-based models provided by the Hugging Face package [21]. For the English language, we experimented with the following models: bert-base-uncased, roberta-base, and distilbert-base-uncased. The following models were tested for Hindi and Marathi: bert-base-multilingual-cased, distilbert-base-multilingual-cased, and indic-bert. The tweets were tokenized using transformer-specific tokenizers. Then, a transformer-specific model was used for sequence classification. Hyperparameter tuning is done to get the best results.
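The tokenizer stage for the BERT-based models (add [CLS] and [SEP] tokens, map tokens to their IDs, and pad or truncate to a fixed length, as listed in Figure 1) can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions, not the Hugging Face tokenizer itself: WordPiece sub-word splitting is omitted, the function name `encode` is hypothetical, and the word IDs in the toy vocabulary are placeholders. Only the special-token IDs follow the bert-base-uncased vocabulary.

```python
# Special-token IDs follow the bert-base-uncased vocabulary; the word IDs
# are hypothetical placeholders, not real vocabulary entries (assumption).
VOCAB = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
         "get": 1000, "out": 1001, "of": 1002, "my": 1003, "city": 1004}

def encode(tokens, max_len):
    """Build fixed-length input IDs and an attention mask (max_len >= 2)."""
    body = tokens[:max_len - 2]                 # leave room for [CLS] and [SEP]
    pieces = ["[CLS]"] + body + ["[SEP]"]
    ids = [VOCAB.get(p, VOCAB["[UNK]"]) for p in pieces]  # unknown -> [UNK]
    mask = [1] * len(ids)                       # 1 marks a real token
    padding = max_len - len(ids)
    ids += [VOCAB["[PAD]"]] * padding           # pad up to the fixed length
    mask += [0] * padding                       # 0 marks padding
    return ids, mask

tokens = ["get", "out", "of", "my", "city"]
ids, mask = encode(tokens, max_len=8)                    # padded by one position
short_ids, short_mask = encode(["get", "lost"], max_len=6)  # "lost" is unknown
long_ids, long_mask = encode(tokens, max_len=5)          # truncated to three words
```

The attention mask lets the model ignore padded positions, so tweets of different lengths can share one batch.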
Table 2 lists all the hyperparameters used during model training.

Footnote 2: https://scikit-learn.org/stable/

Table 3
Evaluation metrics for English using n-gram TF-IDF based models
Model                        Macro-F1    Accuracy
Logistic Regression          0.7477      77.28%
Random Forest Classifier     0.7371      75.09%
Support Vector Classifier    0.7443      76.73%

Table 4
Evaluation metrics for English using word embedding representation models
Model                           Macro-F1    Accuracy
LR + GloVe Embeddings           0.6816      68.93%
SVC + GloVe Embeddings          0.6963      71.19%
LR + Google News Embeddings     0.7097      71.97%
SVC + Google News Embeddings    0.7323      74.39%

Table 5
Evaluation metrics for English by fine-tuning BERT-based models
Model              Macro-F1    Accuracy
RoBERTa-base       0.7874      81.10%
BERT-base          0.7993      81.34%
DistilBERT-base    0.8065      82.51%

Table 6
Evaluation metrics for Hindi subtask
Model                           Macro-F1    Accuracy
SVC + n-gram TF-IDF             0.7013      76.57%
BERT multilingual base          0.7505      79.76%
DistilBERT multilingual base    0.7612      80.35%

6. Results and Evaluations
All model performances are evaluated on the basis of Macro-F1 and Accuracy. In the first approach, using TF-IDF representations, Logistic Regression performed best for English (Table 3), while the Support Vector Classifier performed best for Hindi and Marathi. For the models using word embeddings, the Support Vector Classifier performed best. We could not use the word embedding models for Hindi and Marathi due to library limitations. For all the languages, the pre-trained BERT-based models performed better than the feature extraction approaches. However, all BERT-based variants gave similar performance. DistilBERT multilingual performed slightly better than BERT multilingual base for the Hindi language, while the opposite held for the Marathi language. The results are tabulated in Tables 3, 4, 5, 6 and 7.

Table 7
Evaluation metrics for Marathi subtask
Model                           Macro-F1    Accuracy
SVC + n-gram TF-IDF             0.7249      76.48%
IndicBERT                       0.7505      80.48%
DistilBERT multilingual base    0.8072      83.04%
BERT multilingual base          0.8305      85.12%

7. Conclusions and Future Work
The above results show that the pre-trained BERT models are better able to capture the context of a given sentence and thus provide a better representation for learning. Therefore, the transfer learning approach on pre-trained BERT models is better suited for identifying hate and offensive speech than the traditional feature extraction approaches. As future work, the performance for the Indian languages, namely Hindi and Marathi, could be improved by using better word tokenization with specific tokens for Indian language words. Performance on the Marathi language may be constrained by the limited data available compared to the other two languages. Thus, models can be trained on a larger corpus in the future. Moreover, deeper transformer architectures may be tried out in the future.

References
[1] P. Mehta, T. Mandl, P. Majumder, S. Gangopadhyay, Report on the FIRE 2020 evaluation initiative, in: ACM SIGIR Forum, volume 55, ACM New York, NY, USA, 2021, pp. 1–11.
[2] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017.
[3] A. Gaydhani, V. Doma, S. Kendre, L. Bhagwat, Detecting hate speech and offensive language on Twitter using machine learning: An n-gram and TFIDF based approach, arXiv preprint arXiv:1809.08651 (2018).
[4] R. Kshirsagar, T. Cukuvac, K. McKeown, S. McGregor, Predictive embeddings for hate speech detection on Twitter, arXiv preprint arXiv:1809.10644 (2018).
[5] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
[6] Z. Zhang, L. Luo, Hate speech detection: A solved problem? The challenging case of long tail on Twitter, Semantic Web 10 (2019) 925–945.
[7] G. K. Pitsilis, H. Ramampiaro, H.
Langseth, Effective hate-speech detection in Twitter data using recurrent neural networks, Applied Intelligence 48 (2018) 4730–4742.
[8] A. Bisht, A. Singh, H. Bhadauria, J. Virmani, et al., Detection of hate speech and offensive language in Twitter data using LSTM model, in: Recent Trends in Image and Signal Processing in Computer Vision, Springer, 2020, pp. 243–264.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[11] J. Bevendorff, B. Chulvi, G. L. D. L. Peña Sarracén, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl, M. Potthast, F. Rangel, P. Rosso, et al., Overview of PAN 2021: Authorship verification, profiling hate speech spreaders on Twitter, and style change detection, in: International Conference of the Cross-Language Evaluation Forum for European Languages, Springer, 2021, pp. 419–431.
[12] V. Pathak, M. Joshi, P. Joshi, M. Mundada, T. Joshi, Kbcnmujal@hasoc-dravidian-codemix-fire2020: Using machine learning for detection of hate speech and offensive code-mixed social media text, arXiv preprint arXiv:2102.09866 (2021).
[13] P. K. Roy, A. K. Tripathy, T. K. Das, X.-Z. Gao, A framework for hate speech detection using deep convolutional neural network, IEEE Access 8 (2020) 204951–204962.
[14] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[15] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[16] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranasinghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021. URL: http://ceur-ws.org/.
[17] S. Gaikwad, T. Ranasinghe, M. Zampieri, C. M. Homan, Cross-lingual offensive language identification for low resource languages: The case of Marathi, in: Proceedings of RANLP, 2021.
[18] G. Arora, iNLTK: Natural language toolkit for Indic languages, arXiv preprint arXiv:2009.12534 (2020).
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[20] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[21] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., HuggingFace's Transformers: State-of-the-art natural language processing, arXiv preprint arXiv:1910.03771 (2019).

A. Online Resources
Implementations of the different pre-trained BERT models are available at:
• Huggingface