TheNorth at HASOC 2019: Hate Speech Detection in Social Media Data

Pedro Alonso, Rajkumar Saini, and György Kovács
Luleå University of Technology, Sweden
{pedro.alonso, rajkumar.saini, gyorgy.kovacs}@ltu.se

Abstract. The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate speech can be detrimental to maintaining peace and harmony in society, particularly when hate speech is spread with the intention to defame people, or to spoil the image of a person, a community, or a nation. Social media is a major ground for spreading hate speech, which significantly contributes to the difficulty of the task: social media posts not only include paralinguistic tools (e.g. emoticons and hashtags), their linguistic content also contains plenty of poorly written text that does not adhere to grammar rules. With recent developments in Natural Language Processing (NLP), particularly with deep architectures, it is now possible to analyze such unstructured, composite natural language text. For this reason, we propose a deep NLP model for the automatic detection of hate speech in social media data. We applied our model to the HASOC2019 hate speech corpus, and attained a macro F1 score of 0.63 in the detection of hate speech.

1 Introduction

In the course of our lifetime, we have experienced an increase in social media usage [3]. Social media, when used with care, can be beneficial for its users, but it can also be a hotbed for bullying, online harassment, and the spread of hate speech. All these factors can severely impact both individual users and society in a negative way. For this reason, it is becoming more and more important to provide automatic hate speech detection tools, which can help curb its appearance on social media (Twitter in this case). It is therefore of the utmost importance to be able to monitor the offensive content being published and let the moderators take the steps they deem necessary. This is especially important when trying to protect vulnerable groups of people such as immigrants [2], women, members of the LGBTQ community, or members of any other group that is the target of hate. While several attempts have been made to detect hate speech in comments [4,5,6], the resulting models could still use some fine-tuning, so our model can be considered an addition to the pool of existing ones, aimed at increasing the accuracy of hate speech detection in the wild.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). FIRE 2019, 12-15 December 2019, Kolkata, India.

Table 1: Some samples from the English language training data along with their ground truth labels.

  Post                                                          Ground truth
  New logo for world Cup designed by ICC #ShameOnICC
  https://t.co/AtFL15Gt9B                                       NOT/NONE/NONE
  @TheRealOJ32 The world will rejoice when you die.             HOF/OFFN/TIN
  #DoctorsFightBack we want justice https://t.co/ONUdOhagX3     HOF/HATE/UNT
  Just watched this guy spray the crap from his curb to the
  curb of his next door neighbor. #DickHead                     HOF/PRFN/TIN

2 Hate speech detection on Twitter

The task undertaken here is the "Hate Speech and Offensive Content Identification in Indo-European Languages" challenge, a task inspired by similar prior challenges [9,10]. As the task is detailed in the accompanying overview paper [8], we only discuss it briefly here.
The HASOC2019 data consists of social media posts from Twitter and Facebook in a tab-separated format. The dataset is available in three different languages, namely English, German, and code-mixed Hindi. Here, we exclusively process the English language posts. The training data for English consists of 6358 instances. Some examples from the training set (along with their ground truth labels) are shown in Table 1.

The HASOC2019 task description [7,8] defines three sub-tasks in the hate speech detection challenge. These sub-tasks are as follows:

Sub-task A: This is the task we tackle in our experiments. It is a general binary classification of social media posts into two categories, namely the "Hate and Offensive" category (HOF) and the "Non-Hate and Offensive" category (NOT).
– NOT: Posts that contain no sentences considered to be hate speech or offensive in content.
– HOF: Posts that are considered to contain hateful, offensive, or profane language.

Sub-task B: This task is concerned with a more detailed classification of the posts in the HOF category of the previous task, this time into three categories.
– Hate speech (HATE): Posts in this class contain hate speech sentences. These include descriptions of negative attributes, or ascriptions of deficiencies, aimed at individuals because they belong to a certain group (e.g. "poor people are dumb"). The class can also comprise hateful comments directed at groups of people based on their race, political opinion, sexual orientation, gender, social status, or health condition.
– Offensive (OFFN): Posts in this class contain offensive content, that is, posts that degrade, dehumanize, or insult an individual. Posts that threaten individuals with violent acts are also categorized into this class.
– Profane (PRFN): Posts in this class contain profane words or otherwise unacceptable language, but without directed insults or abuse. This class typically concerns the use of swearwords (e.g. shit, fuck) and cursing.

Sub-task C: A fine-grained classification of the social media posts in the HOF category from a different perspective. Here, hateful posts are differentiated on the grounds of whether the post contains directed hate, or hate/offensive language in general (e.g. "Who the fuck voted for a no deal?").
– Targeted insult (TIN): Posts deemed insulting or threatening towards an individual, group, or others.
– Un-targeted (UNT): Posts deemed not to be targeted towards a specific individual or group, but that still contain unacceptable language.

Table 2: The name of the Indian prime minister used in pop-cultural references.

  Post                                                              Ground truth
  Modi Ji will never give you up Modi ji will never give you down   NOT
  Modi Ji knows Coca Cola's secret ingredient                       NOT
  Modi Ji knows why is Gamora                                       HOF
  Modi Ji knows who let the dogs out                                HOF

2.1 Difficulties

One degree of difficulty emerges from the nature of social media posts: textual content shared on social media is rarely well-formed, and often contains paralinguistic elements such as URLs, emoticons, and other special characters. Another degree of difficulty is due to the inherently unbalanced nature of hate speech detection, as the majority of social media posts contain no hate speech or profanity. Lastly, a third degree of difficulty emerges from the subjective nature of hate speech labeling.

For example, the training set of HASOC2019 contains a series of pop-cultural references involving the prime minister of India, Narendra Modi (Modi Ji). While on the surface these tweets (see Table 2) are all innocuous, some are classified as hateful while others are classified as non-hateful, without any clear logic. Another example of the subjective nature of decisions about hate speech and offensive content is how differently people react to the use of the word "fuck" when it is used as part of a hashtag, as opposed to when it is used without one. In the training set of HASOC2019, the number of tweets that contain the word fuck with and without a hashtag is 1159 and 215, respectively. After eliminating the tweets that contain both forms, these numbers decrease to 1072 and 128. These two categories are very much different: when the word is used without a hashtag alone in a tweet, more than 97% of the tweets are considered hateful, while for the hashtagged version this number is only approximately 41% (and for tweets that contain neither, approximately 38%). This indicates that while the use of the word fuck in and of itself greatly increases the probability of a tweet being deemed hateful, tweets with fuck in a hashtag are only slightly more likely to be treated the same way.
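To illustrate how such corpus statistics can be gathered, the sketch below counts the two usage patterns in a few lines of Python. The file name and the column labels ('text', 'task_1') are our assumptions about the released TSV, and the paper does not specify its exact matching rules, so the printed numbers need not reproduce the figures above exactly.

```python
# Hypothetical sketch: counting "fuck" inside vs. outside hashtags in the
# HASOC2019 English training file. File name and column names are assumptions.
import pandas as pd

df = pd.read_csv('english_dataset.tsv', sep='\t')
text = df['text'].fillna('')

has_tagged = text.str.contains(r'#\w*fuck', case=False)     # e.g. "#fuckicc"
has_plain = text.str.contains(r'(?<!#)\bfuck', case=False)  # outside a hashtag

# Tweets containing only one of the two forms (both-form tweets eliminated)
only_tagged = has_tagged & ~has_plain
only_plain = has_plain & ~has_tagged
print('hashtag form only:', only_tagged.sum())
print('plain form only:  ', only_plain.sum())

# Share of each group labeled HOF in Sub-task A
print('HOF rate, plain only:  ', (df.loc[only_plain, 'task_1'] == 'HOF').mean())
print('HOF rate, hashtag only:', (df.loc[only_tagged, 'task_1'] == 'HOF').mean())
```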
3 Experimental setup

In this section, we describe the model we applied to the task, and also briefly describe our method for training said model.

4 Hate speech detection model

We now present the architectural details of the proposed system; the architecture is shown in Fig. 1. Our approach is similar to [1] and [11], which showed that with convolutional layers at the beginning of the network, the layers on top can be varied while still obtaining accurate results. Our model therefore follows the same principle of stacking a few convolutions after the embedding and varying the intermediate layers; in our case we chose Bi-LSTMs, to contrast with the LSTM and GRU used in those papers. We started with an input layer of batch size times text length (in our case fifty), followed by a self-trained embedding layer. The next stage is made up of convolutions with kernel sizes of sixteen, eight, and four, to reduce the size of the input as much as possible without losing too much information, with a max-pooling layer of size four at the end. In the next stage we use three bi-directional LSTM layers with one thousand six hundred outputs each for the final classification part. Then we use a combination of dense and dropout layers, where the dropout probability is set to 0.5. Lastly, we use a softmax layer with two neurons corresponding to the two classes (NOT, HOF).

Fig. 1: Architecture of the proposed hate speech detection system. [The figure shows the layer sequence with its shapes: Input (batch, 50) → Embedding (6000, 50) → Conv1D (35, 16, 64) → Conv1D (28, 8, 32) → Conv1D (25, 4, 16) → MaxPool (6, 4, 16) → BiDirectionalGRU (6, 1600) → BiDirectionalGRU (6, 1600) → BiDirectionalGRU (1600) → Dense (1600, 200) → Dropout (0.5) → Dense (200, 200) → Dropout (0.5) → Dense (200, 200) → Dropout (0.5) → Dense output (2-3) → SoftMax classification.]
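To make this pipeline concrete, below is a minimal Keras sketch of the architecture, reconstructed by us from the layer shapes annotated in Fig. 1; it is not the authors' released code. Note that the prose above speaks of Bi-LSTM layers while Fig. 1 labels the recurrent layers BiDirectionalGRU; the LSTM cells below follow the prose, and the exact depth of the final dense/dropout stack is approximated.

```python
# Minimal sketch of the Fig. 1 architecture, assuming a vocabulary of 6000
# tokens and posts padded/truncated to 50 tokens. LSTM vs. GRU cells and the
# tail of the dense/dropout stack are our assumptions (see the text).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(6000, 50, input_length=50),        # self-trained embeddings
    layers.Conv1D(64, 16, activation='relu'),           # 50 -> 35 time steps
    layers.Conv1D(32, 8, activation='relu'),            # 35 -> 28
    layers.Conv1D(16, 4, activation='relu'),            # 28 -> 25
    layers.MaxPooling1D(4),                             # 25 -> 6
    layers.Bidirectional(layers.LSTM(800, return_sequences=True)),  # (6, 1600)
    layers.Bidirectional(layers.LSTM(800, return_sequences=True)),  # (6, 1600)
    layers.Bidirectional(layers.LSTM(800)),             # (1600,)
    layers.Dense(200, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(200, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(2, activation='softmax'),              # NOT vs. HOF
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

With these shapes, the convolution stack reduces the 50-token input to 6 time steps before the recurrent layers, matching the dimensions annotated in Fig. 1.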
4.1 Model training

For training our models, we first partitioned the available labeled data into two sets containing 10% and 90% of the instances. The former we used for model evaluation (and will reference in this paper as the evaluation set), while the latter we partitioned again in the same ratio. The resulting bigger set we used for training our models (the training set), while the smaller set we used for early stopping (the validation set). We then trained our model for at most one hundred epochs using the samples from the training set. After each epoch, we kept the changes only if there was an improvement in the macro F1 score attained on the validation set; otherwise, we reset the weights to the result of the last successful epoch and continued the training process. If the macro F1 score on the validation set did not improve for three consecutive epochs, we stopped the process and saved the final model.
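Since this keep-or-revert scheme differs from the stock Keras EarlyStopping callback (which does not track macro F1 or reset weights after each failed epoch), the sketch below shows one way the described procedure could be implemented. It is our illustration under the assumption of integer class labels, not code released by the authors.

```python
# Sketch of the early-stopping scheme described above: keep an epoch's weights
# only if validation macro F1 improves, otherwise revert to the last successful
# epoch, and stop after three epochs without improvement.
import numpy as np
from sklearn.metrics import f1_score
from tensorflow.keras.callbacks import Callback

class MacroF1EarlyStopping(Callback):
    def __init__(self, x_val, y_val, patience=3):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val
        self.patience = patience
        self.best_f1 = -1.0
        self.best_weights = None
        self.wait = 0

    def on_epoch_end(self, epoch, logs=None):
        preds = np.argmax(self.model.predict(self.x_val), axis=-1)
        f1 = f1_score(self.y_val, preds, average='macro')
        if f1 > self.best_f1:
            # successful epoch: keep the changes
            self.best_f1, self.wait = f1, 0
            self.best_weights = self.model.get_weights()
        else:
            # revert to the last successful epoch and count the failure
            self.model.set_weights(self.best_weights)
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True

# Usage (x_val, y_val being the validation set described above):
# model.fit(x_train, y_train, epochs=100,
#           callbacks=[MacroF1EarlyStopping(x_val, y_val)])
```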
5 Results and discussion

For this paper, we carried out all our experiments on Sub-task A using only the English language posts. These experiments were conducted in two runs. Table 3 shows the results on Sub-task A for both of these runs (run1 and run2). The macro average F1 and weighted average F1 scores recorded on Sub-task A in run1 (run2) were 0.6279 (0.6094) and 0.6963 (0.6779), respectively. As we see in Table 3, precision and recall are higher for the NOT class than for the offensive one. In Figure 2, we present the results as a confusion matrix. In this figure we can again see that one weak point of our model is its sensitivity to HOF, which also shows that classifying offensive language is still a difficult task for the algorithm.

Table 3: Results of Sub-task A (run1/run2).

                     precision    recall       F1-score     support
  HOF                0.41/0.38    0.62/0.61    0.49/0.47     288
  NOT                0.85/0.84    0.70/0.67    0.76/0.75     865
  accuracy           –            –            0.68/0.66    1153
  macro average      0.63/0.61    0.66/0.64    0.63/0.61    1153
  weighted average   0.74/0.73    0.68/0.66    0.70/0.68    1153

Fig. 2: Confusion matrix of results on the test set, produced by our first model trained for Sub-task A.

6 Conclusion

The detection of hate speech requires more attention in the age of the Internet, as hate speech can now spread faster and can cause severe social and moral damage to our society. In this paper, we investigated the HASOC2019 hate speech detection dataset. Although the dataset contains three languages (English, German, and code-mixed Hindi), we worked only with the English data. The proposed system works relatively well on Sub-task A: weighted average F1-scores of 0.6963 and 0.6779 were recorded on Sub-task A in run 1 and run 2, respectively. In the future, we plan to try different architectures with varying degrees of complexity to obtain better performance on the task described here. We shall also try to gather more data to ensure that our model has a sufficient number of samples to work efficiently.

References

1. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion. pp. 759–760. WWW '17 Companion (2017)
2. Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Rangel Pardo, F.M., Rosso, P., Sanguinetti, M.: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 54–63. Association for Computational Linguistics, Minneapolis, Minnesota, USA (Jun 2019), https://www.aclweb.org/anthology/S19-2007
3. Chou, W.y.S., Hunt, Y.M., Beckjord, E.B., Moser, R.P., Hesse, B.W.: Social media use in the United States: Implications for health communication. J Med Internet Res 11(4) (Nov 2009)
4. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Eleventh International AAAI Conference on Web and Social Media (2017)
5. Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., Bhamidipati, N.: Hate speech detection with comment embeddings. In: Proceedings of the 24th International Conference on World Wide Web. pp. 29–30. WWW '15 Companion, ACM, New York, NY, USA (2015), http://doi.acm.org/10.1145/2740908.2742760
6. Gambäck, B., Sikdar, U.K.: Using convolutional neural networks to classify hate-speech. In: Proceedings of the First Workshop on Abusive Language Online. pp. 85–90. Association for Computational Linguistics, Vancouver, BC, Canada (Aug 2017), https://www.aclweb.org/anthology/W17-3013
7. Mandl, T., Modha, S., Mandlia, C., Patel, D., Patel, A., Dave, M.: HASOC - Hate Speech and Offensive Content Identification in Indo-European Languages. https://hasoc2019.github.io/call_for_participation.html, accessed: 2019-09-20
8. Modha, S., Mandl, T., Majumder, P., Patel, D.: Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages. In: Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation (2019)
9. Wiegand, M., Siegel, M., Ruppenhofer, J.: Overview of the GermEval 2018 shared task on the identification of offensive language (2018)
10. Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R.: SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In: Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 75–86 (2019)
11. Zhang, Z., Robinson, D., Tepper, J.: Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In: Gangemi, A., Navigli, R., Vidal, M.E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) The Semantic Web. pp. 745–760. Springer International Publishing, Cham (2018)