1. Introduction

Detecting Hate Speech Spreaders on Twitter using LSTM and BERT in English and Spanish

Moshe Uzan

Yaakov HaCohen-Kerner

1 0 Computer Science Department, Bar Ilan University , Ramat-Gan 5290002 , Israel 1 Computer Science Department, Jerusalem College of Technology (Lev Academic Center) , Jerusalem 9116001 , Israel

2021

In this paper, we describe our submissions for PAN at CLEF 2021 contest. We tackled the subtask “Proifling Hate Speech Spreaders on Twitter”. We developed diferent models for English and Spanish languages, using classic machine learning methods like Support Vector Classifier, Multi-Layer Perceptron, Logistic Regression, Random Forest, Ada-Boost Classifier and K-Neighbors Classifier to more recent deep learning methods like BERT and Bidirectional LSTM.

eol>Author Profiling Hate Speech Twitter Spanish English BERT LSTM Logistic Regression SVM MLP Random Forest Ada-Boost Classifier

1. Introduction 2. Background

Early works [ 4, 5 ] referred to hate speech as abusive and hostile messages or flames . Recent authors [ 6, 7, 8 ] preferred to employ the term cyberbullying. However, more terms related to hate speech are often used in the NLP community, such as: discrimination, flaming, abusive language, profanity, toxic language or comment [ 9 ]. But, in defining this phenomenon, the words hate speech tends to be used the most [ 10 ].

Identifying if a text contains hateful language is not an easy task, even not for humans. However, there is not one formal definition of hate speech, a common definition is given by [ 11 ] as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic [ 9, 10, 12, 13, 14, 15 ]. Some examples are given by Biere et al. [ 10 ] and de Gilbert et al. [ 13 ]: 1. God bless them all, to hell with the black. 2. Wipe out the Jews. 3. Women are like grass, they need to be beaten/cut regularly.

Fortuna and Nunes [16] noted in their survey paper that for hate speech detection the most used approach is the supervised one with a focus on support vector machines (SVM) ([17], [18], [19]) followed by Random Forests [20], and Decision Trees [21]. Schmidt and Wiegand [ 9 ] found that recurrent neural networks (RNN) are also very common [22].

Badjatiya et al. [23] proposed a deep learning approach and obtained very good results using word embeddings. Zampieri et al. [24] showed that n-grams can perform well for hate speech detection using SVMs with diferent surface-level features, such as surface n-grams, word skip-grams, and word representation n-grams induced with Brown clustering. They also noticed that these features reached their limits for more complex tasks, e.g., distinguishing profanity and hate speech. In such tasks, more in-depth linguistic characteristics may be required. But with the recent arrival of attention mechanism [25] and Transfomers [26] in NLP and especially with the development of language representation like BERT [27].

Schmidt and Wiegand [ 9 ] noted that in addition to the absence of conventional terminology issue, mentioned above, the lack of common datasets, to conduct research on it, is a challenging obstacle to progress in this area. Indeed making judgements about the general efectiveness or non-efectiveness of research conducted on various datasets can be inconsistent. For better consistency and comparability of diferent features and developed methods, they argue for a benchmark datasets for hate speech detection. This is the approach suggested by competition such as PAN at CLEF 2021 [ 2 ] which provide the same dataset to all the participants and publish the method and the results of each participant method of detection according to this benchmark dataset.

3. Experimental Results and Submitted Models 3.1. Task dataset

PAN at CLEF 2021 with the subtask “Profiling Hate Speech Spreaders on Twitter” proposed an original task by asking a model that classify a user to hate speach spreader instead of predicting if a post is hateful. For each user we were given 200 tweets and we need to classify it as hate speach spreader or not. The complexity of the task follows from the fact that only 200 users tweets was given as training set meaning we 200 cluster of 200 tweets and a label for each cluster. This task must be performed on 2 languages English and Spanish increasing the dificulty since models giving good results in one language will give less good ones in the other.

3.2. Basic models

First, we split the tweets written 200 users to train and validation set with 20 percent of the given data what give us 160 labeled users for the train and 40 ones for validation (with 200 tweets for each user). Like in our precedent work [28], we began with basic model like Support Vector Classifier (SVC), Multi-Layer Perceptron (MLP) or Logistic Regression but also more sophisticated one like Random Forest (RF), Ada-Boost Classifier (ABF) and K-Neighbors Classifier (KN) using classical feature like char ngram features and word ngram features. Some model gave us very good accuracy but given that the dataset is relatively small this was not representative. So we retry this experiments using 10 cross-fold validation. We get less good result but it seems more representative.

3.3. Deep learning models

We realized that using basic model can lead in a significantly lower accuracy on the test set compared to its cross-validation results so we try going beyond and experiment more deep approaches using Bert as language representation of tweets. We used pretrained Bert model and in English we used [27] and for spanish we used [29]. For each tweet we get the corresponding BERT representation. From there, we tried diferent method.

Spanish Table 1: Accuracy results of our first models.

First one was by feding into two successive relu-activated dense layers first with 256 outfeatures and second with 64 out-features the 200 representated tweets. After that we obtain one 64 vector using a mean operation on 200 vectors. Finally we have a relu-activated dense layers that classify this to hateful or not.

The second model we developped take the 200 Bert representation vectors and fed them into a Bi-LSTM with 2 x 32 features in hidden layer. Finally we have a relu-activated dense layers that classify the 64 feature output to hateful or not. We use Adamw[30] variant of Adam [31] algorithm as optimizer for each model 2. After our first submission we noticed that there was a rather big gap in English between the result obtained on our development set and the final result on the test set. So we decided to increase dropout rates and use a BERT model that had been trained on tweets [32].

3.4. Experimental Results

Firstly, we submit two models the averaging one for English and the one using LSTM for Spanish. For the English we get an accuracy of 0.70 in our splitted set and 81 for the Spanish one. We get an accuracy of 0.62 in English and 0.70 in Spanish giving an overall accuracy of 0.66. We then submitted second time two model for the English one we keep the same but for Spanish we switch to the averaging model with diferent training parameter. Surprisingly, the final results 2For more precise details about dropout or batch used we publish the code in github https://github.com/ machouz/pan_transformers 3. showed that, contrary to our observations, traditional methods give very good results (see Table 2). The best result was obtened by SiinoDiNuovo getting an accuracy of 0.73 in English and 85 in Spanish. We tied for 43rd with this result.

Model SiinoDiNuovo

char nGrams+Logistic AveragingBERT MBERT-LSTM Bi-LSTM-BERT TFIDF-LSTM

4. Conclusions and Future Work

In this paper, we described the submitted models for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021. Originally, we looked at a number of machine learning models using basic features. However, we finally turned to more deep learning models. These deep learning models generally do well in the tasks to which they are submitted and this is what we observed through our research. Our final model consist of using Bert as language representation, and Average or LSTM to make the classification. The dificulty here was to deal with the limited amount of given data. Our overall accuracy in our first submission was 69. Classifying a tweet post still remain a dificult task considering Twitter-style informal written genres.

Many tweets contain acronyms that can be presented in diferent forms. These acronyms can lead to ambiguity. Future research may look for other ways to lessen this ambiguity. Acronym disambiguation [33], will extend and enrich the tweet’s text and might enable better classification. We also suggest examining the usefulness of skip character n-grams because they serve as generalized ngrams that allow us to overcome problems such as noise and sparse data [34]. Other ideas that may lead to better classification are to use stylistic feature sets [ 35 ], key phrases [ 36 ], and summaries [ 37 ].

Final result shows that more traditional methods may turn out more relevant. These methods can be combined with k-fold cross-validation (see [ 38 ]), especially when, like in this contest, available data is limited [28].

Acknowledgments

We thank the Bar-Ilan Data Science Institute for kindly providing server for training our models. Without their support, this research would not have been possible. We are also grateful to the organizers and reviewers who gave us the opportunity to do this research.

3To see the whole table of results https://pan.webis.de/clef21/pan21-web/author-profiling.html#results tail on twitter, Semantic Web 10 (2019) 925–945. [16] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM

Computing Surveys (CSUR) 51 (2018) 1–30. [17] S. Malmasi, M. Zampieri, Detecting hate speech in social media, arXiv preprint arXiv:1712.06427 (2017). [18] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of ofensive language, in: Eleventh international aaai conference on web and social media, 2017. [19] D. Robinson, Z. Zhang, J. Tepper, Hate speech detection on twitter: feature engineering vs feature selection, in: European Semantic Web Conference, Springer, 2018, pp. 46–49. [20] P. Burnap, M. L. Williams, Us and them: identifying cyber hate on twitter across multiple protected characteristics, EPJ Data science 5 (2016) 11. [21] P. Burnap, M. L. Williams, Hate speech, machine classification and statistical modelling of information flows on twitter: Interpretation and communication for policy decision making (2014). [22] J. Pavlopoulos, P. Malakasiotis, I. Androutsopoulos, Deep learning for user comment moderation, arXiv preprint arXiv:1705.09993 (2017). [23] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760. [24] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of ofensive posts in social media, arXiv preprint arXiv:1902.09666 (2019). [25] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014. arXiv:1409.0473. [26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762. [27] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018. arXiv:1810.04805. [28] M. Uzan, Y. HaCohen-Kerner, Jct at semeval-2020 task 12: Ofensive language detection in tweets using preprocessing methods, character and word n-grams, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 2017–2022. [29] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert model and evaluation data, in: PML4DC at ICLR 2020, 2020. [30] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101 (2017). [31] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014). [32] D. Q. Nguyen, T. Vu, A. T. Nguyen, BERTweet: A pre-trained language model for English Tweets, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020. [33] Y. HaCohen-Kerner, H. Beck, E. Yehudai, D. Mughaz, Stylistic feature sets as classifiers of documents according to their historical period and ethnic origin, Applied Artificial Intelligence 24 (2010) 847–862. [34] Y. HaCohen-Kerner, Z. Ido, R. Ya’akobov, Stance classification of tweets using skip char

A. Online Resources The sources for this work are available via

• GitHub,

[1]

Bevendorf ,

Chulvi ,

G. L. D. L. P.

Sarracén ,

Kestemont ,

Manjavacas , I. Markov,

Mayerl ,

Potthast ,

Rangel ,

Rosso ,

Stamatatos ,

Stein ,

Wiegmann ,

Wolska , , E. Zangerle, Overview of PAN 2021: Authorship Verification,Profiling Hate Speech Spreaders on Twitter,and Style Change Detection , in: 12th International Conference of the CLEF Association (CLEF 2021 ), Springer, 2021 .

[2]

Rangel ,

G. L. D. L. P.

Sarracén ,

Chulvi , E. Fersini,

Rosso , Profiling Hate Speech Spreaders on Twitter Task at PAN 2021 , in: CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS .org, 2021 .

[3]

Potthast ,

Gollub ,

Wiegmann ,

Stein , TIRA Integrated Research Architecture, in: N. Ferro , C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The Information Retrieval Series , Springer, Berlin Heidelberg New York, 2019 . doi: 10 .1007/ 978-3- 030 -22948-1\_5.

[4]

Spertus , Smokey: Automatic recognition of hostile messages , in: Aaai/iaai, 1997 , pp. 1058 - 1065 .

[5]

Kaufer , Flaming: A white paper , Department of English, Carnegie Mellon University, Retrieved July 20 ( 2000 ) 2012 .

[6] J.-M. Xu , K.-S.

Jun , X.

Zhu , A.

Bellmore , Learning from bullying traces in social media, in: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies , 2012 , pp. 656 - 666 .

[7]

Hosseinmardi ,

S. A.

Mattson , R. I. Rafiq , R. Han,

Lv ,

Mishra , Detection of cyberbullying incidents on the instagram social network , arXiv preprint arXiv:1503.03909 ( 2015 ).

[8]

Zhong ,

Li ,

A. C.

Squicciarini ,

S. M.

Rajtmajer ,

Grifin ,

D. J.

Miller ,

Caragea , Content-driven detection of cyberbullying on the instagram social network ., in: IJCAI , 2016 , pp. 3952 - 3958 .

[9]

Schmidt ,

Wiegand , A survey on hate speech detection using natural language processing , in: Proceedings of the Fifth International workshop on natural language processing for social media , 2017 , pp. 1 - 10 .

[10]

Biere ,

Bhulai ,

M. B.

Analytics , Hate speech detection using natural language processing techniques , Master Business AnalyticsDepartment of Mathematics Faculty of Science ( 2018 ).

[11]

J. T.

Nockleby , Hate speech, Encyclopedia of the American constitution 3 ( 2000 ) 1277 - 1279 .

[12]

Zhang ,

Robinson ,

Tepper , Detecting hate speech on twitter using a convolution-gru based deep neural network , in: European semantic web conference , Springer, 2018 , pp. 745 - 760 .

[13] O. de Gibert , N.

Perez , A.

García-Pablos , M.

Cuadros , Hate speech dataset from a white supremacy forum , arXiv preprint arXiv: 1809 . 04444 ( 2018 ).

[14]

Basile ,

Bosco ,

Fersini ,

Debora ,

Patti ,

F. M. R.

Pardo ,

Rosso ,

Sanguinetti , et al., Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter , in: 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics , 2019 , pp. 54 - 63 .

[15]

Zhang , L. Luo, Hate speech detection: A solved problem? the challenging case of long ngrams , in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases , Springer, 2017 , pp. 266 - 278 .

[35]

HaCohen-Kerner ,

Kass ,

Peretz , Haads: A hebrew aramaic abbreviation disambiguation system , Journal of the American Society for Information Science and Technology 61 ( 2010 ) 1923 - 1932 .

[36]

HaCohen-Kerner ,

Stern ,

Korkus , E. Fredj, Automatic machine learning of keyphrase extraction from short html documents written in hebrew , Cybernetics and Systems: An International Journal 38 ( 2007 ) 1 - 21 .

[37]

HaCohen-Kerner ,

Malin , I. Chasson , Summarization of jewish law articles in hebrew ., in: CAINE , 2003 , pp. 172 - 177 .

[38]

Bengio ,

Grandvalet , No unbiased estimator of the variance of k-fold cross-validation , Journal of machine learning research 5 ( 2004 ) 1089 - 1105 .