HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT Models for Hate Speech Detection

Suman Dowlagar, Radhika Mamidi
International Institute of Information Technology - Hyderabad (IIIT-Hyderabad), Gachibowli, Hyderabad, Telangana, India, 500032
suman.dowlagar@research.iiit.ac.in (S. Dowlagar); radhika.mamidi@iiit.ac.in (R. Mamidi)

FIRE '20, Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India.

Abstract
Hateful and toxic content has become a significant concern in today's world due to the exponential rise of social media. The increase in hate speech and harmful content has motivated researchers to dedicate substantial effort to the challenging problem of hateful content identification. In this task, we propose an approach to automatically classify hate speech and offensive content. We use the datasets from the FIRE 2019 and 2020 shared tasks and perform experiments that take advantage of transfer learning models. We observed that the pre-trained BERT model and the multilingual BERT model gave the best results. The code is made publicly available at https://github.com/suman101112/hasoc-fire-2020.

Keywords: Hate speech, offensive content, label classification, transfer learning, BERT

1. Introduction

Nowadays, people frequently use social media platforms to communicate their opinions and share information. Although communication among users can lead to constructive conversations, users have been increasingly exposed to hateful and offensive content because of these platforms' anonymity features, and this has become a significant issue. The threat of abuse and harassment has made many people stop expressing themselves. According to the Cambridge Dictionary, hate speech and offensive content are defined as:

• To harass and cause lasting pain by attacking something uniquely dear to the target.
• To use words that are considered insulting by most people.

The main obstacle with hate speech is that it is difficult to classify from a single sentence: most hate speech has context attached to it and can morph into many different shapes depending on that context. Another obstacle is that humans cannot always agree on what should be classified as hate speech, which makes it hard to create a universal machine learning algorithm to detect it. Moreover, the datasets used to train models tend to "reflect the majority view of the people who collected or labeled the data".

To address these challenges and to encourage research on hate speech and offensive content, the NLP community has organized several tasks and workshops, such as SemEval-2020 Task 12 (OffensEval 2): Multilingual Offensive Language Identification in Social Media (https://sites.google.com/site/offensevalsharedtask/) and the OSACT4 shared task on offensive content detection (http://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/). Similarly, the FIRE 2020 HASOC shared task was devoted to Hate Speech and Offensive Content Identification in Indo-European Languages. The task is to classify the given annotated tweets. This paper presents state-of-the-art BERT transfer learning models for the automated detection of hate speech and offensive content.

The paper is organized as follows. Section 2 provides related work on hate speech and offensive content detection.
Section 3 describes the methodology used for this task. Section 4 presents the experimental setup and the performance of the models. Section 5 concludes our work.

2. Related Work

Machine learning and natural language processing approaches have made a breakthrough in detecting hate speech on web platforms. Many scientific studies have been dedicated to using Machine Learning (ML) [1, 2] and Deep Learning (DL) [3, 4] methods for automated hate speech and offensive content detection. The features used in traditional machine learning approaches include word-level and character-level n-grams, among others. Although supervised machine learning approaches have used a variety of text-mining features, such as surface features, sentiment analysis, lexical resources, linguistic features, knowledge-based features, and user-based or platform-based metadata, they necessitate a well-defined feature extraction approach. Nowadays, neural network models apply learned text representations and deep learning architectures such as Convolutional Neural Networks (CNNs) [5], bi-directional Long Short-Term Memory networks (LSTMs) [6], and BERT [7] to improve the performance of hate speech and offensive content detection models.

3. Methodology

We use the pre-trained BERT transformer model for hate speech and offensive content detection. Figure 1 depicts an abstract view of the BERT model used for hate speech detection and offensive language identification.

Figure 1: BERT model for sequence classification on hate speech data.

Bidirectional Encoder Representations from Transformers (BERT) is a stack of transformer encoders trained on a large English corpus. It comes in two sizes, BERT-base and BERT-large. BERT-base has 12 transformer layers, feed-forward networks with 768 hidden units, and 12 attention heads; BERT-large has 24 layers, 1024 hidden units, and 16 attention heads. Like the vanilla transformer model [8], BERT takes a sequence of words as input. Each layer applies self-attention, passes its result through a feed-forward network, and hands it off to the next encoder. The BERT-base configuration takes a sequence of up to 512 words/tokens and produces an encoded representation of dimensionality 768. The pre-trained BERT models provide strong word representations, as they are trained on a large Wikipedia and book corpus.

Because the pre-trained BERT model is trained on generic corpora, it must be fine-tuned for downstream tasks. During fine-tuning, the pre-trained BERT parameters are updated by training on the labeled hate speech and offensive content dataset. When fine-tuning on the downstream sentence classification task, very few changes are applied to the BERT-base configuration. In this architecture, only the [CLS] (classification) token output provided by BERT is used. The [CLS] output is the output of the 12th transformer encoder, with a dimensionality of 768. It is fed to a fully connected neural network, and the softmax activation function is applied to classify the given sentence. Thus, BERT learns to predict whether a tweet should be classified as hate speech or offensive content.
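The sketch below makes this classification head concrete. It is a minimal illustration rather than the authors' exact code: it assumes a recent version of the HuggingFace transformers library, and the class name, label mapping, and example input are ours.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HateSpeechClassifier(nn.Module):
    """BERT-base with a softmax classification layer over the [CLS] output."""
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(768, num_labels)  # [CLS] dim is 768 for BERT-base

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Take the [CLS] token output of the final (12th) encoder layer.
        cls_repr = self.dropout(outputs.last_hidden_state[:, 0])
        logits = self.classifier(cls_repr)
        return torch.softmax(logits, dim=-1)  # probabilities over {NOT, HOF}

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = HateSpeechClassifier(num_labels=2)
batch = tokenizer(["an example tweet"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    probs = model(batch["input_ids"], batch["attention_mask"])
```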
Apart from the BERT-base model, we used the pre-trained multilingual BERT-base model, as our data includes German and Hindi. The multilingual BERT and vanilla BERT architectures are the same, but the pre-trained multilingual BERT model is trained on multilingual Wikipedia sources.

4. Experiment

We first introduce the datasets used and the task description, and then review the BERT model's performance on hate speech and offensive content detection. We also include our implementation details and error analysis in the subsequent sections.

4.1. Dataset

We used the datasets provided by the organizers of HASOC FIRE-2020 [9] and FIRE-2019 [10]. The HASOC dataset was sampled from Twitter and partially from Facebook for the English, German, and Hindi languages. The tweets were acquired using hashtags and keywords that contained offensive content. The statistics of the FIRE 2020 and 2019 datasets are given in Table 1.

Table 1: Data statistics

Language               Train Sentences   Test Sentences
English (HASOC 2019)   5852              1153
German (HASOC 2019)    3819              850
Hindi (HASOC 2019)     4665              1318
English (HASOC 2020)   3708              814
German (HASOC 2020)    2373              526
Hindi (HASOC 2020)     2963              663

4.2. Task description

HASOC 2020 comprises the following tasks.

Sub-task A focuses on coarse-grained hate speech detection in all three languages. The goal is to classify tweets into two classes:

• (NOT) Non Hate-Offensive - the post does not contain any hate speech, profane, or offensive content.
• (HOF) Hate and Offensive - the post contains hate, offensive, or profane content.

Sub-task B represents a fine-grained classification. Hate speech and offensive posts from sub-task A are further classified into three categories:

• (HATE) Hate speech - the post contains hate speech content.
• (OFFN) Offensive - the post contains offensive content such as insulting, degrading, dehumanizing, or threatening language.
• (PRFN) Profane - the post contains profane words. This typically concerns the usage of swearwords and cursing.

4.3. Implementation

For the implementation, we used the transformers library provided by HuggingFace [11]. The HuggingFace transformers package is a Python library providing pre-trained and configurable transformer models useful for a variety of NLP tasks, including the pre-trained BERT and multilingual BERT models suitable for downstream tasks. As the implementation environment, we used the PyTorch library, which supports GPU processing. The BERT models were run on an NVIDIA RTX 2070 graphics card with 8 GB of memory. We trained our classifier with a batch size of 64 for 5 to 10 epochs, based on our experiments. The dropout is set to 0.1, and the Adam optimizer is used with a learning rate of 2e-5. We used the HuggingFace pre-trained BERT tokenizer for tokenization, and the BertForSequenceClassification module provided by the HuggingFace library for fine-tuning and sequence classification.
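To illustrate this setup, the following is a condensed fine-tuning sketch under the hyperparameters listed above (batch size 64, learning rate 2e-5). It assumes a recent version of transformers; the placeholder data is ours, and we sketch the optimizer with AdamW, the variant commonly used for BERT fine-tuning, where the text above says Adam.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2).to(device)

# Placeholder data standing in for the HASOC training split.
train_texts = ["an innocuous example tweet", "an offensive example tweet"]
train_labels = [0, 1]  # 0 = NOT, 1 = HOF

enc = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(5):  # 5 to 10 epochs, per the setup above
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()  # cross-entropy loss computed internally
        optimizer.step()
```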
4.4. Baseline models

We compared the BERT models with the following machine learning baselines.

4.4.1. SVM with TF-IDF text representation

We chose Support Vector Machines (SVMs) for hate speech and offensive content detection. The tokenizer used is SentencePiece [12], a commonly used technique for segmenting words into subword units. In such subword schemes (e.g., BPE and WordPiece), the vocabulary is initialized with all the individual characters in the language, and then the most frequent or likely combinations of symbols are iteratively added to the vocabulary.

4.4.2. ELMO embeddings with SVM model

ELMO (Embeddings from Language Models) [13] provides contextual embeddings. Contextual word embeddings capture the meaning of a word in its context: instead of using a fixed embedding for each word, ELMO looks at the word's context, i.e., the entire sentence, before assigning it an embedding. It uses a bi-LSTM trained on a specific task to create those embeddings. We used the ELMO model available on TensorFlow Hub (https://tfhub.dev/google/elmo/2) to obtain ELMO embeddings on the hate speech data for all the languages. After obtaining the embeddings, we take their mean and apply an SVM classifier to classify the given sentence as hate speech or offensive content. We again used the SentencePiece tokenizer.
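For reference, here is a minimal sketch of the TF-IDF + SVM baseline, assuming scikit-learn. The data shown is a placeholder, and for brevity the sketch uses the vectorizer's default word tokenizer rather than the SentencePiece tokenizer described above (a trained SentencePiece model could be plugged in via the tokenizer argument of TfidfVectorizer).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder data standing in for the HASOC training split.
texts = ["an innocuous example tweet", "an offensive example tweet"]
labels = [0, 1]  # 0 = NOT, 1 = HOF

# TF-IDF features over word n-grams, fed to a linear SVM.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
baseline.fit(texts, labels)
print(baseline.predict(["a new tweet to classify"]))
```

The ELMO + SVM variant replaces the TF-IDF features with mean-pooled ELMO embeddings obtained from the TensorFlow Hub module above, keeping the SVM classifier unchanged.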
5. Results

The results are tabulated in Tables 2, 3, and 4. We evaluated the performance of each method using macro F1 and accuracy.

Table 2: Macro F1 and accuracy on the English sub-tasks A and B

                    Hate Speech Detection    Offensive Content Identification
Model               macro F1    Accuracy     macro F1    Accuracy
SVM                 81.56%      81.57%       47.49%      76.78%
ELMO + SVM          82.43%      83.78%       49.62%      79.54%
BERT                88.33%      88.33%       54.44%      81.57%

Table 3: Macro F1 and accuracy on the German sub-tasks A and B

                    Hate Speech Detection    Offensive Content Identification
Model               macro F1    Accuracy     macro F1    Accuracy
SVM                 73.29%      79.27%       45.54%      77.94%
ELMO + SVM          71.73%      80.42%       45.94%      78.21%
multilingual-BERT   77.91%      82.51%       47.78%      80.42%

Table 4: Macro F1 and accuracy on the Hindi sub-tasks A and B

                    Hate Speech Detection    Offensive Content Identification
Model               macro F1    Accuracy     macro F1    Accuracy
SVM                 59.73%      70.13%       36.78%      72.39%
ELMO + SVM          60.91%      71.47%       39.89%      72.76%
multilingual-BERT   63.54%      74.96%       49.71%      73.15%

The BERT model performed well compared to the SVM with TF-IDF and ELMO text representations. Across all languages and both sub-tasks A and B, we observed an increase of 1-2% in classification metrics for the ELMO embeddings + SVM classifier over the baseline SVM classifier. BERT, however, showed an increase of 5-7% in classification metrics compared to the ELMO and SVM models. This demonstrates the capability of the pre-trained BERT model, which learned better text representations from generic data. The state-of-the-art transformer architecture used in the BERT model helped it learn better parameter weights for hate speech and offensive content detection.

6. Error Analysis

Figure 2: Confusion matrices (a)-(f) on the given test data for the English, German, and Hindi languages for sub-task A (hate speech detection) and sub-task B (offensive content identification).

The confusion matrices of the BERT models for sub-tasks A and B on the English, German, and Hindi datasets are given in Figure 2. For binary classification, the best-performing model was that for English sub-task A. The binary classification for the Hindi model is not helpful: the model misclassified most of the hate speech labels, as can be seen in subfigure 2(e). For offensive content identification, the model performed better on English sub-task B. It correctly classified "NONE (not offensive)" and "PRFN (profane)" but was unable to classify "HATE (hate speech)" and "OFFN (offensive)", misclassifying most of them as "PRFN". The multilingual-BERT model misclassified most of the hate speech and offensive content labels for the German and Hindi languages as "NONE" and did not perform well on those datasets.

7. Conclusion and Future Work

We used pre-trained Bidirectional Encoder Representations from Transformers (BERT) and multilingual BERT for hate speech and offensive content detection in the English, German, and Hindi languages. We compared BERT with other machine learning and neural network classification methods. Our analysis showed that fine-tuning the pre-trained BERT and multilingual BERT models on the downstream hate speech text classification tasks increased the macro F1 score and accuracy compared to traditional word-based machine learning approaches.

The given data has both hate speech and offensive content labels for the same sentences, which implies that the two tasks are related. In such a scenario, joint learning models could be used to capture the strong relationship between the two tasks, which in turn would help a deep joint classification model understand the given datasets better.

References

[1] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, arXiv preprint arXiv:1703.04009 (2017).
[2] A. Gaydhani, V. Doma, S. Kendre, L. Bhagwat, Detecting hate speech and offensive language on Twitter using machine learning: An n-gram and TFIDF based approach, arXiv preprint arXiv:1809.08651 (2018).
[3] B. Gambäck, U. K. Sikdar, Using convolutional neural networks to classify hate-speech, in: Proceedings of the First Workshop on Abusive Language Online, 2017, pp. 85-90.
[4] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759-760.
[5] Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882 (2014).
[6] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735-1780.
[7] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998-6008.
[9] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer, Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in Indo-European languages, in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, CEUR, 2020.
[10] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, 2019, pp. 14-17.
[11] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., HuggingFace's transformers: State-of-the-art natural language processing, arXiv preprint arXiv:1910.03771 (2019).
[12] T. Kudo, J. Richardson, SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, arXiv preprint arXiv:1808.06226 (2018).
[13] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, arXiv preprint arXiv:1802.05365 (2018).