Hate speech spreader detection using contextualized
word embeddings
Notebook for PAN at CLEF 2021

Evgeny Finogeev1 , Mariam Kaprielova1,2 , Artem Chashchin1,3 ,
Kirill Grashchenkov1,3,4 , George Gorbachev1 and Oleg Bakhteev1,2,5
1 Antiplagiat, Moscow, Russia
2 Moscow Institute of Physics and Technology (MIPT), Moscow, Russia
3 Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
4 Institute of Oceanology (IO RAS), Moscow, Russia
5 Dorodnicyn Computing Center RAS, Moscow, Russia


Abstract
The paper presents a method for recognizing hate speech spreaders, developed for the “Profiling Hate Speech Spreaders on Twitter” task at PAN@CLEF 2021. Hate speech is a growing problem nowadays: due to the spread of the internet and the rise of social media, people with hostile views towards certain groups post hateful messages on social media resources. In this paper, we present a model that detects hate speech spreaders based on their Twitter posts. We aggregate contextualized embeddings of single tweets to form a vector representation for every user and employ classification methods to find users spreading hate speech. We analyze different embedding models based on the BERT architecture. The submitted model achieves 67% accuracy on the English part of the dataset and 83% on the Spanish part.

Keywords
BERT, contextualized word embeddings, deep learning




1. Introduction
Hate speech detection in social networks is a significant social issue nowadays. A large number of works are devoted to hate speech detection in the social media domain [1, 2, 3]. There is a variety of hate speech detection methods, most of which consider the problem as a supervised document classification task [1]. The approaches can be divided into two categories: those based on manual feature engineering [2, 3] combined with classic machine learning methods, and deep learning-based models which employ neural networks to automatically learn abstract features from raw data [4, 5].
   The Hate Speech Spreaders profiling task considers the problem of hate speech detection in social networks [6, 7]: given a set of tweets written by a user, one should establish whether the user can potentially spread hate speech or not. The tweets are written in two languages: English and Spanish.

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
   finogeev@ap-team.ru (E. Finogeev); kaprielova@ap-team.ru (M. Kaprielova); chashchin@ap-team.ru
(A. Chashchin); grashchenkov@ap-team.ru (K. Grashchenkov); gorbachev@ap-team.ru (G. Gorbachev);
bakhteev@ap-team.ru (O. Bakhteev)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
The training dataset contains 200 sets of tweets for each language: 100 sets for users that can spread hate speech and 100 for ordinary users. The performance metric for this task is accuracy.
   The main challenge of the current task is that the analysed object is not a single tweet, but a collection of messages posted by a user. Even if the method detects no hate speech in some of a user’s posts, the same user can still be classified as a potential hate speech spreader. In this way, hate speech spreader detection is similar to the previous fake news spreader detection task [8]. The key idea of many approaches to such tasks is to aggregate the information contained in all the texts written by the author. We employ a neural network-based approach for such aggregation.
   Recent studies show that approaches based on embedding generation and subsequent classification can be quite effective in hate speech detection. For example, the authors of [9] present a solution involving hybrid embeddings based on TF-IDF for word-level feature extraction, an LSTM for sentence-level feature extraction, and naïve Bayes for extracting topics from tweets. Those embeddings are then used as input for an improved cuckoo search-based neural network. The authors of [10] proposed an approach in which an LSTM first produces word embeddings and a CNN then performs hate speech classification. One of the most impressive deep learning-based architectures for capturing the semantic information stored in short texts is BERT [11]. In this paper, we employ different BERT-based architectures to obtain an embedding for every tweet in the collection; the obtained embeddings are then used to classify whether a user can spread hate speech or not. We also analyze model performance after fine-tuning on an external corpus of tweets [12].


2. Methodology
The following section describes the proposed method of hate speech spreader detection.
  Similar to the previous year’s task [13], we consider the current task as a classification
problem, where the classified object is a collection of tweets.
We are given a dataset $D = \{(x_i, y_i)\}$:

$$x_i = \{x_i^1, \ldots, x_i^m\}, \quad x_i^j \in W^+, \quad j \in \{1, \ldots, m\}, \quad y_i \in \{0, 1\},$$

where $W^+$ corresponds to all possible strings written in the given language. The label $y_i = 1$ corresponds to users that are likely to spread hate speech, and $y_i = 0$ corresponds to ordinary users.
   Formally, the task is to find the binary classifier that minimizes the empirical risk on the dataset $D$:

$$f = \arg\min_{f \in \mathcal{F}} \sum_{(x_i, y_i) \in D} \left[ f(x_i) \neq y_i \right],$$

where $\mathcal{F}$ is the set of all considered classification models.
   To solve the problem we employ a deep learning-based approach: each tweet is vectorized using a deep learning model. After that, we aggregate the obtained vectors and use the resulting averaged vector as a feature set for the classifier.
   The proposed method is illustrated in Figure 1. The method consists of three main components:
Table 1
Overall results for the experiments.

English part of the dataset
 Model                                               Cross-validation accuracy   Test accuracy
 Character n-grams, 𝑑 = 20, 𝑛 = 3                    0.68                        -
 BERT-multilingual, SVM (RBF kernel)                 0.69                        0.58
 BERT-multilingual, logistic regression              0.64                        -
 LaBSE, SVM (RBF kernel)                             0.63                        -
 BERT-base, SVM (RBF kernel)                         0.70                        -
 BERT-base, logistic regression                      0.68                        -
 BERT-base with fine-tuning, SVM (RBF kernel)        0.72                        0.67
 BERT-base with fine-tuning, logistic regression     0.68                        -

Spanish part of the dataset
 Model                                               Cross-validation accuracy   Test accuracy
 Character n-grams, 𝑑 = 50, 𝑛 = 6                    0.79                        -
 BERT-multilingual, SVM (RBF kernel)                 0.80                        0.83
 BERT-multilingual, logistic regression              0.79                        -
 LaBSE, SVM (RBF kernel)                             0.80                        -


   1. A BERT model that extracts an embedding vector $e_i^j$ from every tweet $x_i^j$.
   2. An aggregation operation that averages the information across all the tweets of the author:

      $$e_i = \frac{1}{m} \sum_{j=1}^{m} e_i^j.$$

   3. The resulting classifier that uses the averaged tweet vector $e_i$ as an input feature set.
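Below is a minimal sketch of this three-component pipeline, assuming the Hugging Face transformers library and scikit-learn; the model name, the mean-pooling choice over token states, and the placeholder data are illustrative assumptions rather than our exact configuration.

```python
# A minimal sketch of the three-component pipeline (assumed libraries:
# transformers, scikit-learn; model name and pooling are illustrative).
import numpy as np
import torch
from sklearn.svm import SVC
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def embed_tweet(text):
    """Component 1: extract an embedding vector e_i^j for a single tweet."""
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        output = bert(**inputs)
    # Mean-pool the last hidden states over tokens (one possible choice).
    return output.last_hidden_state.mean(dim=1).squeeze(0).numpy()

def embed_user(tweets):
    """Component 2: average tweet embeddings into a single user vector e_i."""
    return np.mean([embed_tweet(t) for t in tweets], axis=0)

# Component 3: an SVM classifier over the averaged user vectors.
users = [["sample tweet one", "sample tweet two"],    # placeholder data
         ["sample tweet three", "sample tweet four"]]
labels = [0, 1]
features = np.stack([embed_user(u) for u in users])
classifier = SVC(kernel="rbf").fit(features, labels)
```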




Figure 1: The scheme of the proposed approach.




3. Experiment Details
To validate our models we conducted a computational experiment. Since we use a standard
BERT tokenizer, we did not use any special preprocessing for our method. For all the analyzed
models we used 𝑘-fold cross-validation with 𝑘 = 10. The results are presented in Table 1. The
test accuracy was evaluated using the TIRA environment [14].
   Inspired by the author profiling task from the previous year [8], we used a character 𝑛-gram based model as a baseline. We used principal component analysis [15] with 𝑑 components to reduce the feature set and make the 𝑛-gram feature space denser. The 𝑛-gram order 𝑛 and the component number 𝑑 were selected using cross-validation.
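A minimal sketch of this baseline follows, assuming scikit-learn; TruncatedSVD is used as a sparse-friendly stand-in for the PCA step (TF-IDF matrices are sparse), and the final logistic regression classifier is an illustrative choice not fixed by the text.

```python
# A sketch of the character n-gram baseline (assumptions: scikit-learn;
# TruncatedSVD stands in for PCA; the final classifier is illustrative).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

n, d = 3, 20  # n-gram order and component number reported for English
baseline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(n, n)),
    TruncatedSVD(n_components=d),
    LogisticRegression(max_iter=1000),
)
# docs: one concatenated string of tweets per user; labels: 0/1 per user.
# scores = cross_val_score(baseline, docs, labels, cv=10, scoring="accuracy")
```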
   Since the dataset contains texts in two languages, we initially decided to use a pretrained multilingual BERT model [11] for the vectorization. We analyzed its performance with two classifiers: a support vector machine [16] and logistic regression. For each classification algorithm, the optimal parameters were selected by cross-validation using grid search (a sketch of such a search is given after the list below). The results in Table 1 show that this approach works well for Spanish but performs poorly on English data. We therefore decided to use multilingual BERT for processing Spanish and a different model for English. In order to improve our results on the English part of the dataset, we analyzed the following models:

    • LaBSE [17], a multilingual model with the BERT architecture. The main feature of this model is that all the languages supported by LaBSE share a common hidden space, which can potentially improve model generalization.
    • BERT-base, a pretrained model trained only on English texts.
    • BERT-base fine-tuned on an external corpus of tweets. The main idea of this approach is to adapt the lexical and semantic properties of the model on a tweet corpus in order to make the model more receptive to the specific Twitter lexicon.
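A minimal sketch of the grid search mentioned above, assuming scikit-learn; the parameter grids are illustrative assumptions, since the exact grids are not specified here.

```python
# A sketch of hyperparameter selection via grid search with 10-fold
# cross-validation (assumption: scikit-learn; the grids are illustrative).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

candidates = [
    (SVC(kernel="rbf"), {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}),
    (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
]
# features, labels: averaged user vectors and classes (see earlier sketch).
# for estimator, grid in candidates:
#     search = GridSearchCV(estimator, grid, cv=10, scoring="accuracy")
#     search.fit(features, labels)
#     print(type(estimator).__name__, search.best_params_, search.best_score_)
```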

   Note that since the training dataset contains rather little data for BERT model tuning, we did not consider fine-tuning directly on the current training dataset. Instead, we tuned the model on the external tweet collection in the same unsupervised manner in which it was originally trained: some tokens in the sentences were masked, and the model tried to predict them; the predicted token representations were fed into a softmax layer to obtain the output words. Such a procedure does not require additional labeling of external data and allows BERT to focus more on Twitter posts than on English and Spanish texts in general.
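As an illustration, here is a minimal sketch of such masked-language-model tuning, assuming the Hugging Face transformers and datasets libraries; the file name tweets.txt and the training hyperparameters are hypothetical.

```python
# A sketch of masked-language-model fine-tuning on an external tweet corpus
# (assumed libraries: transformers, datasets; "tweets.txt" and the training
# hyperparameters are hypothetical).
from datasets import load_dataset
from transformers import (AutoTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# One preprocessed tweet per line in a plain-text file.
corpus = load_dataset("text", data_files={"train": "tweets.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# The collator randomly masks a fraction of tokens; the model learns to
# predict them through its output softmax, as in BERT pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True,
                                           mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-tweets", num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()
```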

LaBSE vectorization. Although the LaBSE model shares a common hidden space for all the languages, including Spanish and English, we did not find any quality improvement in comparison to the basic architecture and the other methods.

BERT-base. The pretrained BERT-base model was trained only on English texts, so we expected that using it as a vectorizer for English tweets might improve accuracy. Our assumption that the single-language BERT-base model outperforms the multilingual model on an English-only task is confirmed by the cross-validation results. When using this model, we did not make any changes to the architecture.

BERT-based model fine-tuned on the external tweet corpus. Given that the BERT-base model was trained on a large collection of English texts from Wikipedia, we assumed that it needs fine-tuning on data related to our problem, since tweets often use specific slang, syntax, emoji, etc. In order to improve the quality of the vector representations, we took a large corpus of tweets [12], which contains 1,600,000 tweets in English, and fine-tuned the model on this dataset in the masked language modeling regime. For all 1,600,000 tweets from the additional dataset, we replaced links, hashtags and user mentions with tokens from the training dataset (#URL#, #HASHTAG#, #USER#, respectively). We also added these tokens to the BERT tokenizer dictionary. This additional training improved the accuracy.
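A minimal sketch of this normalization and tokenizer extension follows; the regular expressions are our illustrative approximation of how links, hashtags and mentions can be matched.

```python
# A sketch of the tweet normalization and tokenizer extension (the regular
# expressions are approximations; a single-pass substitution avoids
# re-matching the inserted placeholder tokens).
import re
from transformers import AutoTokenizer

PATTERN = re.compile(r"https?://\S+|#\w+|@\w+")

def normalize_tweet(text):
    def replace(match):
        token = match.group(0)
        if token.startswith("http"):
            return "#URL#"
        if token.startswith("#"):
            return "#HASHTAG#"
        return "#USER#"
    return PATTERN.sub(replace, text)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Register the placeholders so they are kept as single tokens; the model's
# embedding matrix must then be resized with
# model.resize_token_embeddings(len(tokenizer)).
tokenizer.add_tokens(["#URL#", "#HASHTAG#", "#USER#"])

print(normalize_tweet("look at this https://t.co/x #news @someone"))
# -> "look at this #URL# #HASHTAG# #USER#"
```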
   As we can see, the resulting model shows competitive performance in comparison to the baseline model. The cross-validation accuracy increased from 0.68 to 0.72 on the English part of the dataset and from 0.79 to 0.80 on its Spanish part. We obtained a rather significant improvement on English data by tuning the model in an unsupervised manner on the external corpus of tweets, and we also observe a slight increase in performance on Spanish tweets.
   The submitted English model achieves 67% accuracy on the test set, while the Spanish model shows an accuracy of 83%.
   For further analysis, we applied a t-SNE transformation [18] to project each averaged user vector $e_i$ into two-dimensional space and visualized the projections according to their class. The results are shown in Figures 2 and 3 for English and Spanish, respectively.
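A minimal sketch of this visualization, assuming scikit-learn and matplotlib; the random vectors stand in for the real averaged user embeddings.

```python
# A sketch of the t-SNE visualization of averaged user vectors (assumptions:
# scikit-learn and matplotlib; random vectors stand in for the real e_i).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

user_vectors = np.random.rand(200, 768)    # placeholder for the 200 users
classes = np.array([0] * 100 + [1] * 100)  # 100 spreaders, 100 ordinary

points = TSNE(n_components=2, random_state=0).fit_transform(user_vectors)
plt.scatter(points[:, 0], points[:, 1], c=classes, cmap="coolwarm", s=10)
plt.title("t-SNE projection of averaged user vectors")
plt.show()
```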




Figure 2: The t-SNE projection for the English part of the dataset.


   As we can see, the obtained projection separates the averaged user vectors poorly, which indicates that the method could potentially be improved.
   We also built an LDA topic model [19] with 10 topics for the English part of the dataset in order to analyze the topic distribution across the tweets. The resulting t-SNE projection is shown in Figure 4. We analyzed the words most strongly associated with each topic. Although the obtained topics were not fully interpretable, we noticed that some of them correspond to US politics (“Biden”, “Trump”, “President”, “Wall”, “American” among the top words), US elections (“Biden”, “Trump”, “Election”, “Voters”, “Democrats”, “Republicans”, “Court” among the top words), and COVID-19 (“Covid”, “19”, “Virus” among the top words). This makes us believe that clustering tweets with a topic modeling approach and using information about the topic distribution across a user’s tweets could improve the decision about hate speech spreading.
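A minimal sketch of such a topic model, using scikit-learn's LatentDirichletAllocation as a stand-in for the BigARTM library [19] that we used; the tweet list is a placeholder.

```python
# A sketch of a 10-topic LDA model over the English tweets (assumption:
# scikit-learn's LatentDirichletAllocation stands in for BigARTM [19]).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = ["placeholder tweet about the election",
          "another placeholder tweet about covid"]  # real corpus goes here

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(counts)

# Inspect the top words of every topic, as in the analysis above.
vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top_words)}")
```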
Figure 3: The t-SNE projection for the Spanish part of the dataset.




Figure 4: The t-SNE projection for topics obtained from the English part of the dataset. The light-green
points correspond to the tweets about US politics, the black points correspond to the tweets about the
US presidential elections, the orange points correspond to the tweets about COVID-19.
4. Conclusion
The paper describes an approach to the problem of hate speech spreader detection. We proposed a model that detects hate speech spreaders using contextualized embeddings of single tweets: we aggregate the obtained vectors and use their average as a feature set for further classification. We also provide an analysis of different vectorization models based on the BERT architecture. The resulting model shows an accuracy of about 67% on the English test dataset and 83% on the Spanish test dataset. Future work includes a detailed analysis of the topic distribution across the tweets and the use of this information in the final decision rule.


Acknowledgments
This research is funded by RFBR, grant 19-29-14100.


References
 [1] Z. Zhang, L. Luo, Hate speech detection: A solved problem? the challenging case of long
     tail on twitter, Semantic Web 10 (2019) 925–945.
 [2] P. Burnap, M. L. Williams, Cyber hate speech on twitter: An application of machine
     classification and statistical modeling for policy and decision making, Policy & internet 7
     (2015) 223–242.
 [3] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the
     problem of offensive language, in: Proceedings of the International AAAI Conference on
     Web and Social Media, volume 11, 2017.
 [4] B. Gambäck, U. K. Sikdar, Using convolutional neural networks to classify hate-speech, in:
     Proceedings of the first workshop on abusive language online, 2017, pp. 85–90.
 [5] K. Miok, B. Škrlj, D. Zaharie, M. Robnik-Šikonja, To ban or not to ban: Bayesian attention
     networks for reliable hate speech detection, Cognitive Computation (2021) 1–19.
 [6] J. Bevendorff, B. Chulvi, G. L. D. L. P. Sarracén, M. Kestemont, E. Manjavacas, I. Markov,
     M. Mayerl, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska,
     E. Zangerle, Overview of PAN 2021: Authorship Verification, Profiling Hate Speech
     Spreaders on Twitter, and Style Change Detection, in: 12th International Conference of
     the CLEF Association (CLEF 2021), Springer, 2021.
 [7] F. Rangel, G. L. D. L. P. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling Hate Speech
     Spreaders on Twitter Task at PAN 2021, in: CLEF 2021 Labs and Workshops, Notebook
     Papers, CEUR-WS.org, 2021.
 [8] F. Rangel, A. Giachanou, B. Ghanem, P. Rosso, Overview of the 8th author profiling task
     at pan 2020: Profiling fake news spreaders on twitter, in: CLEF, 2020.
 [9] F. E. Ayo, O. Folorunso, F. T. Ibharalu, I. A. Osinuga, Hate speech detection in twitter using
     hybrid embeddings and improved cuckoo search-based neural networks, International
     Journal of Intelligent Computing and Cybernetics (2020).
[10] H. Faris, I. Aljarah, M. Habib, P. Castillo, Hate speech detection using word embedding
     and deep learning in the arabic language context, in: Proceedings of the 9th International
     Conference on Pattern Recognition Applications and Methods - ICPRAM, INSTICC,
     SciTePress, 2020, pp. 453–460. doi:10.5220/0008954004530460.
[11] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[12] A. Go, R. Bhayani, L. Huang, Twitter sentiment classification using distant supervision,
     CS224N project report, Stanford 1 (2009) 2009.
[13] O. Bakhteev, A. Ogaltsov, P. Ostroukhov, Fake news spreader detection using neural tweet
     aggregation, in: CLEF, 2020.
[14] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture,
     in: N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The
     Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019. doi:10.1007/
     978-3-030-22948-1\_5.
[15] C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.
[16] C. Cortes, V. Vapnik, Support vector machine, Machine learning 20 (1995) 273–297.
[17] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang, Language-agnostic bert sentence
     embedding, arXiv preprint arXiv:2007.01852 (2020).
[18] L. Van der Maaten, G. Hinton, Visualizing data using t-sne., Journal of machine learning
     research 9 (2008).
[19] K. Vorontsov, O. Frei, M. Apishev, P. Romov, M. Dudarenko, Bigartm: Open source
     library for regularized multimodal topic modeling of large collections, in: International
     Conference on Analysis of Images, Social Networks and Texts, Springer, 2015, pp. 370–381.