=Paper=
{{Paper
|id=Vol-2936/paper-194
|storemode=property
|title=Detecting Hate Speech Spreaders on Twitter using LSTM and BERT in English and Spanish
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-194.pdf
|volume=Vol-2936
|authors=Moshe Uzan,Yaakov Hacohen-Kerner
|dblpUrl=https://dblp.org/rec/conf/clef/UzanH21
}}
==Detecting Hate Speech Spreaders on Twitter using LSTM and BERT in English and Spanish==
Notebook for PAN at CLEF 2021

Moshe Uzan¹, Yaakov HaCohen-Kerner²
¹ Computer Science Department, Bar-Ilan University, Ramat-Gan 5290002, Israel
² Computer Science Department, Jerusalem College of Technology (Lev Academic Center), Jerusalem 9116001, Israel
uzanmacho@gmail.com (M. Uzan); kerner@jct.ac.il (Y. HaCohen-Kerner)

Abstract
In this paper, we describe our submissions to the PAN at CLEF 2021 contest. We tackled the subtask "Profiling Hate Speech Spreaders on Twitter". We developed different models for the English and Spanish languages, ranging from classic machine learning methods such as Support Vector Classifier, Multi-Layer Perceptron, Logistic Regression, Random Forest, Ada-Boost Classifier, and K-Neighbors Classifier to more recent deep learning methods such as BERT and bidirectional LSTM.

Keywords
Author Profiling · Hate Speech · Twitter · Spanish · English · BERT · LSTM · Logistic Regression · SVM · MLP · Random Forest · Ada-Boost Classifier

1. Introduction

In recent years, with the increasing use of social media, we have seen an increase in the spread of hateful content. Indeed, the anonymity afforded by these platforms allows any user to post whatever he or she wants without having to fear the consequences. Bullying, trolling, and harassment can be very serious and in several cases have led to the suicide of the victim (see https://en.wikipedia.org/wiki/List_of_suicides_that_have_been_attributed_to_bullying). Under growing pressure, the companies concerned are looking for ever more efficient solutions to this problem. Considering the huge quantity of text posted every day, an automatic and scalable detection system has become a priority. Machine learning (ML) and natural language processing (NLP) solutions for finding such offensive content have proven remarkably useful. Still, detecting offensive language in social media is not an easy research problem, owing to the different levels of ambiguity present in natural language and to the noisy nature of social media language. In addition, social media subscribers come from linguistically diverse communities.

The PAN at CLEF 2021 task "Profiling Hate Speech Spreaders on Twitter" [1, 2] deals with the detection of hate speech spreaders in two languages, English and Spanish, meaning that classification needs to be done at the user level rather than at the post level. Submission was done using TIRA, which automates software submission [3].

2. Background

Early works [4, 5] referred to hate speech as abusive and hostile messages, or flames. More recent authors [6, 7, 8] prefer the term cyberbullying. Many other terms related to hate speech are also used in the NLP community, such as discrimination, flaming, abusive language, profanity, and toxic language or comments [9]. Still, in defining this phenomenon, the words "hate speech" tend to be used the most [10]. Identifying whether a text contains hateful language is not an easy task, not even for humans.
There is no single formal definition of hate speech, but a common one is given by [11]: any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, or religion [9, 10, 12, 13, 14, 15]. Some examples are given by Biere et al. [10] and de Gibert et al. [13]:

1. God bless them all, to hell with the black.
2. Wipe out the Jews.
3. Women are like grass, they need to be beaten/cut regularly.

Fortuna and Nunes [16] noted in their survey that the most common approach to hate speech detection is supervised learning, with a focus on support vector machines (SVM) [17, 18, 19], followed by Random Forests [20] and Decision Trees [21]. Schmidt and Wiegand [9] found that recurrent neural networks (RNN) are also very common [22]. Badjatiya et al. [23] proposed a deep learning approach and obtained very good results using word embeddings. Zampieri et al. [24] showed that n-grams can perform well for hate speech detection using SVMs with different surface-level features, such as surface n-grams, word skip-grams, and word-representation n-grams induced with Brown clustering. They also noticed that these features reach their limits on more complex tasks, e.g., distinguishing profanity from hate speech, where more in-depth linguistic characteristics may be required. The recent arrival of the attention mechanism [25] and Transformers [26] in NLP, and especially the development of language representations like BERT [27], has made such characteristics far easier to capture.

Schmidt and Wiegand [9] also noted that, in addition to the absence of conventional terminology mentioned above, the lack of common datasets on which to conduct research is a challenging obstacle to progress in this area. Indeed, judgements about the general effectiveness or ineffectiveness of methods evaluated on different datasets can be inconsistent. For better consistency and comparability of features and methods, they argue for benchmark datasets for hate speech detection. This is the approach taken by competitions such as PAN at CLEF 2021 [2], which provide the same dataset to all participants and publish each participant's detection method together with its results on this benchmark dataset.

3. Experimental Results and Submitted Models

3.1. Task dataset

The PAN at CLEF 2021 subtask "Profiling Hate Speech Spreaders on Twitter" proposed an original task: rather than predicting whether a single post is hateful, the model must classify a user as a hate speech spreader or not. For each user we were given 200 tweets, and we needed to classify the user. The complexity of the task follows from the fact that only 200 users were given as a training set, i.e., 200 clusters of 200 tweets with one label per cluster. The task must be performed in two languages, English and Spanish, which increases the difficulty, since models giving good results in one language may give worse results in the other.

Figure 1: PAN at CLEF 2021: "Profiling Hate Speech Spreaders on Twitter" dataset.
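The paper does not include loading code; as a point of reference, the sketch below shows one way to read such a corpus into the 200-users-by-200-tweets structure described above. It assumes the one-XML-file-per-author layout and the truth.txt format that PAN author-profiling corpora have typically used; the paths, element names, and label encoding here are assumptions, not details taken from the paper.

```python
# Minimal loader sketch for a PAN-style author-profiling corpus (assumed layout:
# one XML file per author with <documents><document>...</document></documents>,
# plus a truth.txt with "author-id:::label" lines; 1 = spreader, 0 = not).
import glob
import os
import xml.etree.ElementTree as ET

def load_pan_corpus(corpus_dir: str):
    # Map each author id to its binary label.
    truth = {}
    with open(os.path.join(corpus_dir, "truth.txt"), encoding="utf-8") as f:
        for line in f:
            author_id, label = line.strip().split(":::")
            truth[author_id] = int(label)

    users_tweets, labels = [], []
    for path in glob.glob(os.path.join(corpus_dir, "*.xml")):
        author_id = os.path.splitext(os.path.basename(path))[0]
        root = ET.parse(path).getroot()
        # Each author file holds that user's 200 tweets as <document> elements.
        tweets = [doc.text or "" for doc in root.iter("document")]
        users_tweets.append(tweets)
        labels.append(truth[author_id])
    return users_tweets, labels
```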
3.2. Basic models

First, we split the tweets of the 200 users into a training set and a validation set, holding out 20 percent of the data; this gives 160 labeled users for training and 40 for validation (with 200 tweets per user). As in our previous work [28], we began with basic models such as Support Vector Classifier (SVC), Multi-Layer Perceptron (MLP), and Logistic Regression (LR), along with more sophisticated ones such as Random Forest (RF), Ada-Boost Classifier (ABF), and K-Neighbors Classifier (KN), using classical features such as character n-grams and word n-grams. Some models gave very good accuracy, but given that the dataset is relatively small, these results were not representative. We therefore reran the experiments using 10-fold cross-validation; the results were lower but appear more representative. Table 1 lists the accuracies of our first models, and a sketch of such a baseline pipeline is given after the table.

Table 1: Accuracy results of our first models.

Language | ML Method         | Features                                | Result
---------|-------------------|-----------------------------------------|-------
English  | RF                | 2,000 char 5-grams                      | 0.665
English  | RF                | 1,000 char 4-grams                      | 0.655
English  | ABF               | 20,000 char 4-grams                     | 0.64
English  | Majority Baseline |                                         | 0.50
Spanish  | RF                | 17,000 word 1-grams                     | 0.81
Spanish  | LR                | 1,500 char 4-grams + 2,500 word 1-grams | 0.78
Spanish  | ABF               | 3,500 word 1-grams                      | 0.775
Spanish  | Majority Baseline |                                         | 0.50
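The paper does not name the exact vectorizer or hyper-parameters, so the following sketch, assuming scikit-learn, only mirrors the flavor of the best English row of Table 1 (a Random Forest over 2,000 char 5-grams), evaluated with the 10-fold cross-validation mentioned above. Since classification is at the user level, each user's 200 tweets are joined into a single document.

```python
# Baseline sketch: char n-gram features + Random Forest, scored by 10-fold CV.
# Feature settings follow Table 1's best English row; everything else (TF-IDF
# weighting, n_estimators, etc.) is an assumption for illustration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def user_level_baseline(users_tweets, labels):
    """users_tweets: one list of 200 tweets per user; labels: 0/1 per user."""
    # One document per user, since the label applies to the user, not the tweet.
    docs = [" ".join(tweets) for tweets in users_tweets]
    pipeline = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(5, 5), max_features=2000),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    # 10-fold cross-validation, which proved more representative than a single
    # 80/20 split on such a small dataset.
    scores = cross_val_score(pipeline, docs, labels, cv=10, scoring="accuracy")
    return scores.mean()
```

With the loader sketched in Section 3.1, this would be called as `user_level_baseline(*load_pan_corpus("en/"))`.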
3.3. Deep learning models

We realized that a basic model can yield significantly lower accuracy on the test set than its cross-validation results suggest, so we went further and experimented with deeper approaches, using BERT as the language representation of tweets. We used pretrained BERT models: [27] for English and [29] for Spanish. For each tweet we obtain the corresponding BERT representation, and from there we tried different methods.

In the first model, the 200 tweet representations are fed into two successive ReLU-activated dense layers, the first with 256 output features and the second with 64. We then obtain a single 64-dimensional vector by taking the mean of the 200 vectors. Finally, a dense layer classifies this vector as hateful or not.

The second model we developed takes the 200 BERT representation vectors and feeds them into a Bi-LSTM with 2 x 32 features in the hidden layer. A dense layer then classifies the resulting 64-feature output as hateful or not.

Figure 2: Description of our models.

We used AdamW [30], a variant of the Adam algorithm [31], as the optimizer for each model. (For precise details such as the dropout rates and batch sizes used, our code is published on GitHub: https://github.com/machouz/pan_transformers.)

After our first submission, we noticed a rather large gap in English between the result obtained on our development set and the final result on the test set. We therefore decided to increase the dropout rates and to use a BERT model that had been pretrained on tweets [32]. A sketch of the two classification heads follows.
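Below is a minimal PyTorch sketch of these two heads, assuming 768-dimensional per-tweet vectors (the hidden size of BERT-base models such as those cited above, e.g., their pooled or [CLS] outputs). The layer sizes follow the paper; the dropout rate, the plain linear output layer, and the learning rate are assumptions.

```python
# Sketch of the two classification heads from Section 3.3 (not the authors'
# exact code; see their GitHub repository for the real implementation).
import torch
import torch.nn as nn

class AveragingHead(nn.Module):
    """Dense(256) -> Dense(64) per tweet, mean over 200 tweets, then classify."""
    def __init__(self, bert_dim: int = 768, dropout: float = 0.3):  # rate assumed
        super().__init__()
        self.per_tweet = nn.Sequential(
            nn.Linear(bert_dim, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 64), nn.ReLU(),
        )
        self.classifier = nn.Linear(64, 2)      # spreader / not a spreader

    def forward(self, tweet_vecs):              # (batch, 200, bert_dim)
        h = self.per_tweet(tweet_vecs)          # (batch, 200, 64)
        user_vec = h.mean(dim=1)                # one 64-dim vector per user
        return self.classifier(user_vec)

class BiLSTMHead(nn.Module):
    """Bi-LSTM with 2 x 32 hidden features over the 200 tweet vectors."""
    def __init__(self, bert_dim: int = 768):
        super().__init__()
        self.lstm = nn.LSTM(bert_dim, 32, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(64, 2)      # 2 x 32 = 64 features

    def forward(self, tweet_vecs):              # (batch, 200, bert_dim)
        _, (h_n, _) = self.lstm(tweet_vecs)     # h_n: (2, batch, 32)
        user_vec = torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 64)
        return self.classifier(user_vec)

# Both heads are trained with AdamW, as in the paper (learning rate assumed):
model = AveragingHead()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```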
3.4. Experimental Results

We first submitted two models: the averaging model for English and the LSTM model for Spanish. On our held-out split we obtained an accuracy of 0.70 for English and 0.81 for Spanish. On the test set we obtained 0.62 for English and 0.70 for Spanish, giving an overall accuracy of 0.66. We then made a second submission of two models: for English we kept the same model, while for Spanish we switched to the averaging model with different training parameters. Surprisingly, the final results showed that, contrary to our observations, traditional methods give very good results (see Table 2). The best result was obtained by SiinoDiNuovo, with an accuracy of 0.73 in English and 0.85 in Spanish. We tied for 43rd place with our result. (The full table of results is available at https://pan.webis.de/clef21/pan21-web/author-profiling.html#results.)

Table 2: Accuracy results of baselines and submitted models.

Model                | English Dev | English Test | Spanish Dev | Spanish Test | Average Dev | Average Test
SiinoDiNuovo         |      –      |     0.73     |      –      |     0.85     |      –      |     0.79
char nGrams+Logistic |      –      |     0.69     |      –      |     0.83     |      –      |     0.76
AveragingBERT        |     0.72    |     0.62     |     0.87    |     0.76     |    0.795    |     0.69
MBERT-LSTM           |      –      |     0.59     |      –      |     0.75     |      –      |     0.67
Bi-LSTM-BERT         |     0.62    |     0.44     |     0.81    |     0.74     |    0.715    |     0.59
TFIDF-LSTM           |      –      |     0.61     |      –      |     0.51     |      –      |     0.56

4. Conclusions and Future Work

In this paper, we described the models submitted for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021. Originally, we examined a number of machine learning models using basic features; however, we eventually turned to deep learning models. Such models generally do well on the tasks to which they are applied, and this is what we observed in our research. Our final models use BERT as the language representation, with averaging or an LSTM performing the classification. The main difficulty was dealing with the limited amount of given data. The overall accuracy of our first submission was 0.69.

Classifying tweets remains a difficult task, given Twitter's informal written genre. Many tweets contain acronyms that can appear in different forms, and these acronyms can lead to ambiguity. Future research may look for ways to lessen this ambiguity: acronym disambiguation [33] would extend and enrich the tweet's text and might enable better classification. We also suggest examining the usefulness of skip character n-grams (sketched below), because they serve as generalized n-grams that help overcome problems such as noise and sparse data [34]. Other ideas that may lead to better classification are stylistic feature sets [35], key phrases [36], and summaries [37]. The final results show that more traditional methods may turn out to be more relevant. These methods can be combined with k-fold cross-validation (see [38]), especially when, as in this contest, the available data is limited [28].
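As a hedged illustration of the skip character n-grams mentioned above: a skip char n-gram keeps n characters in their original order but allows gaps inside the window, yielding more general patterns than contiguous n-grams. The definition below is one simple variant, written for illustration; the precise definition in [34] may differ.

```python
# One possible definition of skip character n-grams: take every window of
# n + skip characters, anchor the first character, and keep any n ordered
# characters from the window (so up to `skip` positions are skipped).
from itertools import combinations

def skip_char_ngrams(text: str, n: int = 3, skip: int = 1) -> set:
    grams = set()
    window = n + skip
    for start in range(len(text) - window + 1):
        chunk = text[start:start + window]
        # Choose which n - 1 of the remaining characters to keep, in order.
        for idx in combinations(range(1, window), n - 1):
            grams.add(chunk[0] + "".join(chunk[i] for i in idx))
    return grams

# Example: skip_char_ngrams("hate", n=3, skip=1) == {"hat", "hae", "hte"},
# i.e., the contiguous 3-gram plus its one-skip generalizations.
```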
Acknowledgments

We thank the Bar-Ilan Data Science Institute for kindly providing a server for training our models. Without their support, this research would not have been possible. We are also grateful to the organizers and reviewers who gave us the opportunity to do this research.

References

[1] J. Bevendorff, B. Chulvi, G. L. D. L. P. Sarracén, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska, E. Zangerle, Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection, in: 12th International Conference of the CLEF Association (CLEF 2021), Springer, 2021.
[2] F. Rangel, G. L. D. L. P. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling Hate Speech Spreaders on Twitter Task at PAN 2021, in: CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021.
[3] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture, in: N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019. doi:10.1007/978-3-030-22948-1_5.
[4] E. Spertus, Smokey: Automatic recognition of hostile messages, in: AAAI/IAAI, 1997, pp. 1058–1065.
[5] D. Kaufer, Flaming: A white paper, Department of English, Carnegie Mellon University, Retrieved July 20 (2000) 2012.
[6] J.-M. Xu, K.-S. Jun, X. Zhu, A. Bellmore, Learning from bullying traces in social media, in: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2012, pp. 656–666.
[7] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, S. Mishra, Detection of cyberbullying incidents on the Instagram social network, arXiv preprint arXiv:1503.03909 (2015).
[8] H. Zhong, H. Li, A. C. Squicciarini, S. M. Rajtmajer, C. Griffin, D. J. Miller, C. Caragea, Content-driven detection of cyberbullying on the Instagram social network, in: IJCAI, 2016, pp. 3952–3958.
[9] A. Schmidt, M. Wiegand, A survey on hate speech detection using natural language processing, in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[10] S. Biere, S. Bhulai, Hate speech detection using natural language processing techniques, Master's thesis, Business Analytics, Department of Mathematics, Faculty of Science (2018).
[11] J. T. Nockleby, Hate speech, Encyclopedia of the American Constitution 3 (2000) 1277–1279.
[12] Z. Zhang, D. Robinson, J. Tepper, Detecting hate speech on Twitter using a convolution-GRU based deep neural network, in: European Semantic Web Conference, Springer, 2018, pp. 745–760.
[13] O. de Gibert, N. Perez, A. García-Pablos, M. Cuadros, Hate speech dataset from a white supremacy forum, arXiv preprint arXiv:1809.04444 (2018).
[14] V. Basile, C. Bosco, E. Fersini, N. Debora, V. Patti, F. M. R. Pardo, P. Rosso, M. Sanguinetti, et al., SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, 2019, pp. 54–63.
[15] Z. Zhang, L. Luo, Hate speech detection: A solved problem? The challenging case of long tail on Twitter, Semantic Web 10 (2019) 925–945.
[16] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM Computing Surveys (CSUR) 51 (2018) 1–30.
[17] S. Malmasi, M. Zampieri, Detecting hate speech in social media, arXiv preprint arXiv:1712.06427 (2017).
[18] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Eleventh International AAAI Conference on Web and Social Media, 2017.
[19] D. Robinson, Z. Zhang, J. Tepper, Hate speech detection on Twitter: feature engineering vs feature selection, in: European Semantic Web Conference, Springer, 2018, pp. 46–49.
[20] P. Burnap, M. L. Williams, Us and them: identifying cyber hate on Twitter across multiple protected characteristics, EPJ Data Science 5 (2016) 11.
[21] P. Burnap, M. L. Williams, Hate speech, machine classification and statistical modelling of information flows on Twitter: Interpretation and communication for policy decision making (2014).
[22] J. Pavlopoulos, P. Malakasiotis, I. Androutsopoulos, Deep learning for user comment moderation, arXiv preprint arXiv:1705.09993 (2017).
[23] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
[24] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, arXiv preprint arXiv:1902.09666 (2019).
[25] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014. arXiv:1409.0473.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.
[27] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018. arXiv:1810.04805.
[28] M. Uzan, Y. HaCohen-Kerner, JCT at SemEval-2020 task 12: Offensive language detection in tweets using preprocessing methods, character and word n-grams, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 2017–2022.
[29] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.
[30] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101 (2017).
[31] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[32] D. Q. Nguyen, T. Vu, A. T. Nguyen, BERTweet: A pre-trained language model for English Tweets, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020.
[33] Y. HaCohen-Kerner, A. Kass, A. Peretz, HAADS: A Hebrew Aramaic abbreviation disambiguation system, Journal of the American Society for Information Science and Technology 61 (2010) 1923–1932.
[34] Y. HaCohen-Kerner, Z. Ido, R. Ya'akobov, Stance classification of tweets using skip char ngrams, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2017, pp. 266–278.
[35] Y. HaCohen-Kerner, H. Beck, E. Yehudai, D. Mughaz, Stylistic feature sets as classifiers of documents according to their historical period and ethnic origin, Applied Artificial Intelligence 24 (2010) 847–862.
[36] Y. HaCohen-Kerner, I. Stern, D. Korkus, E. Fredj, Automatic machine learning of keyphrase extraction from short HTML documents written in Hebrew, Cybernetics and Systems: An International Journal 38 (2007) 1–21.
[37] Y. HaCohen-Kerner, E. Malin, I. Chasson, Summarization of Jewish law articles in Hebrew, in: CAINE, 2003, pp. 172–177.
[38] Y. Bengio, Y. Grandvalet, No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research 5 (2004) 1089–1105.

A. Online Resources

The sources for this work are available via
• GitHub, https://github.com/machouz/pan_transformers