
FIRE

Team FalsePostive at HASOC 2019: Transfer-Learning for Detection and Classification of Hate Speech

Indian Institute of Information Technology, Guwahati Bongora, Assam, India

2019


This paper presents the results obtained by using a convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network on the three different datasets provided in HASOC 2019. The neural networks were first trained to classify whether a document in the dataset is hate speech or not. The networks were then fine-tuned for the subsequent subtasks of fine-grained classification of hate speech and identification of the type of offense.

Keywords: Transfer learning, Neural network, Hate speech

Introduction

Social media platforms such as Facebook and Twitter have grown in popularity over recent decades and continue to do so. While people use such platforms for constructive purposes, they have also become a convenient medium for disseminating hate speech. There is therefore a pressing need to limit the spread of hate speech on online social media platforms [2].

According to Davidson et al. [6], hate speech is language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group. They conclude that classifying hate speech is a difficult task, as we tend to classify it based on our own subjective biases.

The HASOC 2019 [10] event presents the shared task of classifying hate speech. The shared task is divided into three subtasks: A, B and C. Subtask A consists of classifying a document into two categories, Hate and Offensive, and Non-Hate and Offensive. For subtask B, the items classified as hateful and offensive in subtask A are further classified into three fine-grained classes, namely Hate speech, Offensive and Profane. In subtask C, the hateful and offensive items of subtask A are further classified into two types of offense, Targeted Insult and Untargeted. For these tasks, the organizers provided datasets created from Facebook and Twitter posts in three languages: English, German and code-mixed Hindi.

The organizers define the labels as follows. Not Hateful or Offensive (NOT) denotes posts that do not contain any hate speech or offensive content. Hate and Offensive (HOF) denotes posts that contain hate, offensive and profane content. Hate speech (HATE) denotes posts that contain hate speech. Profane (PRFN) denotes posts that contain profane words. Posts that contain an insult or threat to an individual, a group or others are labelled Targeted Insult (TIN). Posts that contain general profanity that is not targeted but uses non-acceptable language are labelled Untargeted (UNT).

Related Work

Baruah et al. [1] trained BiLSTM models with and without attention to detect hate speech against immigrants and women on Twitter. This task was part of the SemEval 2019 workshop [2]. They found that the BiLSTM without attention performed better than the one with attention. Indurthi et al. [8] evaluated the performance of various sentence-level embeddings for the same task. They trained various simple machine learning models on these embeddings and found that Google's Universal Sentence Encoder [4] coupled with an SVM (with RBF kernel) outperformed all other models for the task. For the same task, Ding et al. [7] used a capsule network on top of a stacked BiGRU network. They used the word-level embeddings provided by FastText [9] to first convert the words into vector representations. Nobata et al. [11] used a regression model and studied the performance of different features, such as word2vec and word n-grams, for detecting hate speech. They also developed an abusive language corpus from annotated user comments. Davidson et al. [6] worked on identifying the challenges in hate speech detection, such as detecting hate speech when hate words might not be used in the text.

Data

This section gives more details about the datasets mentioned in Section 1. The number of posts provided per label for English, German and code-mixed Hindi is given in Table 1. The posts marked as HOF are further categorized as given in Table 2 and Table 3. As is apparent from these tables, the datasets are imbalanced.

To balance each dataset, it was shuffled once and then balanced using a simple interweaving algorithm. The interweaving algorithm for two labels is given in Algorithm 1. The dataset was not shuffled again. The intuition was that, when data is fed to the models in batches, the number of samples of each category would be balanced in each epoch, so that the model learns from each class evenly.

Before training any models, preprocessing was applied to the dataset. The preprocessing is similar to that applied by Davidson et al. [6]. The preprocessing steps are as follows.

1. Hashtags were segmented. For example, #buildthewall is segmented into build the wall.
2. URLs were removed.
3. Redundant symbols were removed.
4. Redundant whitespaces were removed.
5. The @ symbol from user handles was removed.
6. RT prefixes were removed.
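The preprocessing steps listed above can be sketched in Python as follows. This is an illustrative sketch and not the pipeline used in the experiments: the function names, the toy vocabulary VOCAB used for hashtag segmentation, and the reading of "redundant symbols" as repeated punctuation are all assumptions.

```python
import re

# Toy vocabulary (assumption); a real system would need a large word list.
VOCAB = {"build", "the", "wall"}


def segment_hashtag(tag, vocab=VOCAB):
    """Split a hashtag body into known words; return it unchanged if no split exists."""
    n = len(tag)
    # best[i] holds a list of words covering tag[:i], or None if none was found.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is not None and tag[start:end].lower() in vocab:
                best[end] = best[start] + [tag[start:end]]
                break
    return " ".join(best[n]) if best[n] else tag


def preprocess(post):
    """Apply the six preprocessing steps (hypothetical implementation)."""
    post = re.sub(r"^RT\s+", "", post)          # step 6: drop the RT prefix
    post = re.sub(r"https?://\S+", "", post)    # step 2: remove URLs
    # step 1: replace each hashtag by its segmented words
    post = re.sub(r"#(\w+)", lambda m: segment_hashtag(m.group(1)), post)
    post = post.replace("@", "")                # step 5: strip @ from user handles
    post = re.sub(r"([!?.,])\1+", r"\1", post)  # step 3: collapse repeated punctuation (assumed meaning)
    post = re.sub(r"\s+", " ", post).strip()    # step 4: collapse redundant whitespace
    return post
```

Hashtags whose body cannot be covered by vocabulary words are left unsegmented, with only the # sign removed.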

Algorithm 1: Interweaving
input : M, list of majority samples; m, list of minority samples
output: B, interweaved balanced dataset

Init B ← empty list;
for i ← 1 to |M| do
    Append M[i] to B;
    if i >= |m| then
        Init r ← random integer between 1 and |m|;
        Append m[r] to B;
    else
        Append m[i] to B;
return B
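Algorithm 1 can be written in Python roughly as follows. The function name interweave and the seed parameter are hypothetical, and the sketch assumes a non-empty minority list. It follows the printed condition (i greater than or equal to the size of the minority list, with 1-based indexing).

```python
import random


def interweave(majority, minority, seed=None):
    """Alternate majority and minority samples, oversampling the minority at random."""
    rng = random.Random(seed)  # seed is only for reproducibility (assumption)
    balanced = []
    for i, sample in enumerate(majority, start=1):  # 1-based index, as in Algorithm 1
        balanced.append(sample)
        if i >= len(minority):
            # Minority samples exhausted: draw one at random instead.
            balanced.append(rng.choice(minority))
        else:
            balanced.append(minority[i - 1])
    return balanced
```

The output has twice as many samples as the majority list, alternating one majority sample with one minority sample, so any batch drawn in order contains both classes in equal proportion.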

Experimental Settings

Embeddings. The preprocessed posts mentioned in Section 3 were then converted into vector representations using pre-trained embeddings via an embedding layer. If a word in a post was not found in the embeddings, a zero vector of the appropriate dimension was used. For tokenizing the sentences, NLTK's [3] TweetTokenizer was used. The embeddings used for the three languages are given in Table 4.

In this section, the neural network models used are discussed. All the models were implemented using Keras [5]. For other tasks, such as evaluating the performance of the models and generating train-validation-test splits, scikit-learn was used.
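The zero-vector fallback for words missing from the pre-trained embeddings can be illustrated with a minimal sketch. The function name embed_tokens and the toy three-dimensional embedding table are hypothetical; the tokens are assumed to have been produced already by a tokenizer such as TweetTokenizer.

```python
def embed_tokens(tokens, embeddings, dim):
    """Map each token to its pre-trained vector, or to a zero vector if it is unknown."""
    zero = [0.0] * dim
    # list(...) copies the vector so that callers cannot mutate the shared table.
    return [list(embeddings.get(tok, zero)) for tok in tokens]


# Toy embedding table; the values are made up for illustration.
toy_embeddings = {
    "build": [0.1, 0.2, 0.3],
    "wall": [0.4, 0.5, 0.6],
}

# "teh" is not in the table, so it is mapped to a zero vector.
vectors = embed_tokens(["build", "teh", "wall"], toy_embeddings, dim=3)
```

In a Keras model the same effect is usually obtained by filling the corresponding rows of the embedding layer's weight matrix with zeros, but the exact mechanism used in the experiments is not stated.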

Stacked BiLSTM. This model was used for all the subtasks of the English dataset. It was first trained for subtask A and then fine-tuned for subtask B and subtask C. The model architecture is given in Table 5. The bidirectional outputs were merged by multiplying. Fine-tuning was done by resetting the weights of the 9th and 10th layers and replacing the final output layer with a new dense layer with the appropriate number of units. All other layers were frozen. The model was trained using the Adam optimizer coupled with a cross-entropy loss function for 20 epochs. The batch size was 32.
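Merging the bidirectional outputs by multiplying can be illustrated on toy values. The sketch below assumes that the merge is an element-wise product of the forward and backward outputs at each time step, which is what the merge_mode="mul" option of the Keras Bidirectional wrapper computes; whether that option was the one used here is an assumption. The function name merge_mul is hypothetical.

```python
def merge_mul(forward, backward):
    """Element-wise product of forward and backward outputs at each time step."""
    return [
        [f * b for f, b in zip(f_t, b_t)]
        for f_t, b_t in zip(forward, backward)
    ]


# Two time steps with hidden size 3 (toy values).
forward = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
backward = [[2.0, 0.5, 1.0], [4.0, 3.0, 2.0]]

merged = merge_mul(forward, backward)
```

Unlike concatenation, this merge keeps the output dimension equal to the hidden size of a single direction.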

CNN. This model was used for all subtasks of the German and code-mixed Hindi datasets. Similar to the stacked BiLSTM, this model was first trained for subtask A and then fine-tuned for subtasks B and C. This time, fine-tuning was done by replacing the final layer with a dense layer with the appropriate number of units, i.e., the number of categories to classify. All the layers were then retrained using a small learning rate (0.0005) and decay (0.000005). The optimizer was RMSprop, with the same cross-entropy loss function. The batch size was 32, and the model was trained for 20 epochs.
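To give a sense of how small this decay is, the following sketch computes the effective learning rate under time-based decay, lr / (1 + decay * iterations), where iterations counts batch updates. This is the rule applied by the decay argument of Keras optimizers of that period; that the reported decay value refers to this rule is an assumption. The function name decayed_lr is hypothetical.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: the learning rate shrinks with the number of batch updates."""
    return initial_lr / (1.0 + decay * iterations)


# Learning rate at the start of fine-tuning and after 10,000 batch updates.
lr_start = decayed_lr(0.0005, 0.000005, 0)
lr_later = decayed_lr(0.0005, 0.000005, 10_000)
```

Under this rule the learning rate after 10,000 updates is the initial rate divided by 1.05, so the schedule changes the step size only slightly over a short fine-tuning run.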

Results and Discussion

The primary and secondary evaluation metrics for HASOC are weighted and macro F1. The official results on the test set, as published by the organisers for each dataset, are given in Table 12, Table 13 and Table 14.

For the English dataset, the BiLSTM model performed quite poorly, achieving a macro average F1 of only 0.61. The model became biased towards posts that were not hate speech, as is apparent from the confusion matrix shown in Table 7. On further fine-tuning this model for subtasks B and C, its performance degraded even further, with macro average F1 scores of 0.28 and 0.36 respectively.

The CNN model used for the Hindi subtasks achieved a macro average F1 of 0.76, which is around 0.05 short of the best-performing model. However, this model also lost a lot of performance on fine-tuning for subtask B and subtask C, achieving macro average F1 scores of 0.26 and 0.58 respectively. Both models suffered from false positives and false negatives. NLTK's TweetTokenizer is not built for the Hindi language, hence it seems to perform character-level tokenization on the Hindi words.

In the subtasks for the German language, the CNN model achieved a macro average F1 of 0.52, while the best score achieved is 0.61. The model was able to classify only 20 samples as HOF. This could be due to the large skew in the training set, as shown in Table 1. Oversampling by interweaving did not prove to be an effective method for balancing the dataset, especially when the imbalance is large. In subtask B, the models performed poorly in classifying the OFFEN and PRFN labels. On the German dataset, the model completely failed to classify the PRFN class and performed poorly on the OFFEN class. The same performance degradation can be observed in subtask B and subtask C.

Conclusion

The problem of hate speech has become increasingly prevalent. People who post hateful tweets keep finding new ways to skirt around detection systems. With the velocity at which content is generated on social media, it is not feasible to manually flag every post for toxicity. As such, it is of utmost importance to develop automated systems that detect and purge hate speech and other toxic content. It is hoped that the models developed in this study can shed some light on the task of detecting hate speech and its finer modalities, although they leave much to be desired.
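For reference, the macro and weighted F1 scores reported in the Results and Discussion section can be computed from true and predicted labels as sketched below. The function name f1_scores and the toy labels are hypothetical; the definitions are the standard ones, with macro F1 as the unweighted mean of per-class F1 and weighted F1 as the mean weighted by the number of true samples per class.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return per-class F1, macro F1 and weighted F1 for two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return per_class, macro, weighted


# Skewed toy example: eight NOT posts, two HOF posts, one HOF post missed.
y_true = ["NOT"] * 8 + ["HOF"] * 2
y_pred = ["NOT"] * 8 + ["NOT", "HOF"]
per_class, macro, weighted = f1_scores(y_true, y_pred)
```

On this skewed example the macro score is lower than the weighted score, because the poorly recognised minority class counts as much as the majority class in the macro average.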

References

1. Baruah, A., Barbhuiya, F.A., Dey, K.: Abaruah at SemEval-2019 task 5: Bidirectional LSTM for hate speech detection. In: SemEval@NAACL-HLT (2019)

2. Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F.M.R., Rosso, P., Sanguinetti, M.: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In: SemEval@NAACL-HLT (2019)

3. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media (2009)

4. Cer, D., Yang, Y., yi Kong, S., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Sung, Y.H., Strope, B., Kurzweil, R.: Universal sentence encoder. ArXiv abs/1803.11175 (2018)

5. Chollet, F., et al.: Keras. https://keras.io (2015)

6. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media. pp. 512-515. ICWSM '17 (2017)

7. Ding, Y., Zhou, X., Zhang, X.: Ynudyx at SemEval-2019 task 5: A stacked BiGRU model based on capsule network in detection of hate. In: SemEval@NAACL-HLT (2019)

8. Indurthi, V., Syed, B., Shrivastava, M., Chakravartula, N., Gupta, M., Varma, V.K.: Fermi at SemEval-2019 task 5: Using sentence embeddings to identify hate speech against immigrants and women in Twitter. In: SemEval@NAACL-HLT (2019)

9. Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pretraining distributed word representations. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)

10. Modha, S., Mandl, T., Majumder, P., Patel, D.: Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages. In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December 2019)

11. Nobata, C., Tetreault, J.R., Thomas, A.O., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: WWW (2016)

12. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532-1543 (2014), http://www.aclweb.org/anthology/D14-1162