Profiling Hate Speech Spreaders on Twitter: SVM vs.
Bi-LSTM
(Notebook for PAN at CLEF 2021)

Inna Vogel1 , Meghana Meghana1
1
    Fraunhofer Institute for Secure Information Technology SIT, Rheinstrasse 75, Darmstadt, 64295, Germany


Abstract
Hate speech is a crime that has been growing in recent years, especially in online communication. It can harm an individual or a group of people by targeting their conscious or unconscious intrinsic characteristics. Additionally, the psychological burden of manual moderation has made automated hate speech detection methods a necessity. In this notebook, we describe our profiling system for the PAN at CLEF 2021 lab “Profiling Hate Speech Spreaders on Twitter”. The aim of the task is to determine whether it is possible to identify hate speech spreaders on Twitter automatically. Our final submitted system uses character 𝑛-grams as features in combination with an SVM and achieves an overall average accuracy of 69.5% across the English and Spanish datasets. Additionally, we experimented with a Bi-LSTM model trained with Sentence-BERT embeddings, achieving slightly worse results. The experiments show that it is difficult to reliably detect hate speech spreaders on Twitter, as hate speech is not merely the use of profanity.

Keywords
Author Profiling, Hate Speech Spreaders, SVM, Bi-LSTM




1. Introduction
The Cambridge Dictionary defines hate speech as abusive or threatening speech or writing
that expresses hate or prejudice towards a person or a particular group1, especially on the basis
of ethnicity, religion, sex, or sexual orientation. That said, any characteristic of an individual can
become a target of hate, be it gender, nationality, or even educational background. The Internet
and the possibility of communicating anonymously have additionally provided an effective vehicle
for spreading hateful and offensive content at an unprecedented rate [1]. Moreover, studies
have highlighted a connection between the spread of hate speech and hate-related crimes [2].
This means that the spread of hate speech has the potential to damage our society and cause severe
harm to individuals or entire groups.
   Currently, social media companies such as Twitter and Facebook use human annotators to
manually detect hateful comments and posts2. Additionally, users are encouraged to report
offensive and potentially harmful content. Given the high volume of messages posted on
social media websites, these methods are time-consuming, expensive, and depend on human
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
" inna.vogel@sit.fraunhofer.de (I. Vogel); meghana.meghana@sit.fraunhofer.de (M. Meghana)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




1 https://dictionary.cambridge.org/de/worterbuch/englisch/hate-speech
2 https://www.cnbc.com/2021/02/27/content-moderation-on-social-media.html
judgment. The evident harm and volume of the uncontrolled spread of hate speech [3] and the
psychological burden of manual moderation3 have necessitated the development of automated
hate speech detection methods.
   This problem of detecting hate speech is addressed in this year’s author profiling shared
task of PAN at CLEF 2021 lab4 [4, 5]. Author profiling is the analysis of people’s writing in an
attempt to identify demographic aspects such as age, gender, language variety, or psychographic
aspects such as an author’s personality type [6, 7]. Given a Twitter feed, the final goal of this
year’s challenge is to identify possible hate speech spreaders on Twitter as a first step towards
preventing hate speech from being propagated among online users.
We conducted two different learning experiments. Our final submitted system uses TF-IDF
weighted character 𝑛-grams as features in combination with an SVM. As recurrent neural
networks (RNN) can preserve sequence information over time, and thereby integrate contextual
information better in classification tasks, we additionally experimented with a bidirectional
LSTM (Bi-LSTM) and trained it with Sentence-BERT (SBERT), a modification of the BERT
network. SBERT uses siamese and triplet network structures to derive semantically meaningful
sentence embeddings [8]. Both models were trained on the PAN 2021 corpus provided by
the organizers [9]. The corpus covers two languages: English (EN) and Spanish (ES). The
performance of the systems is ranked by accuracy. Both models achieved almost the same
classification results. The SVM model performed slightly better than the Bi-LSTM model,
achieving accuracies of 64% and 75% on the English and Spanish corpora, respectively
(average 69.5%). The Bi-LSTM model achieved an overall average accuracy of 69%. The results
show that it is not an easy task to reliably differentiate Twitter users who spread hate speech
from those who, for the most part, follow the platform’s policies and guidelines.
   In the following sections, we describe our approach for the author profiling task at PAN 2021.
After a brief review of related work in Section 2, Section 3 details the Twitter data provided
by the PAN 2021 organizers. Additionally, we show some key statistics observed in the tweets.
Section 4 details the preprocessing steps and features used to train our models. The methodology
and classification results are discussed in Section 5. Section 6 concludes our work.


2. Related Work
Mutanga et al. [10] investigated different transformer-based methods for hate speech detection
in Twitter texts. They used a publicly available multi-class hate speech corpus containing
24,783 tweets. The dataset is highly imbalanced, with 77.4% of the tweets labeled as “neutral”,
16.8% as “offensive”, and 5.8% as “hate”. DistilBERT, a distilled version of BERT, outperformed
all other evaluated methods, such as XLNet, RoBERTa, and attention-based LSTMs, achieving
an 𝐹 1-score of 75%.
   Kovács et al. [3] used a combination of Convolutional and Long Short-Term Memory (LSTM)
neural networks to detect hate speech in social media. The model was applied to the HASOC2019
corpus and attained a macro 𝐹 1-score of 63%. The authors also conducted experiments with

3 https://www.theguardian.com/technology/2019/sep/17/revealed-catastrophic-effects-working-facebook-moderator
4 PAN at CLEF 2021 “Profiling Hate Speech Spreaders on Twitter”: https://pan.webis.de/clef21/pan21-web/author-profiling.html
RoBERTa and FastText as feature extractors. As the training data was limited, different
methods for expanding resources, such as leveraging unlabeled data or similarly labeled corpora,
were explored. Their results show that classification performance could be significantly improved
by leveraging such additional data.
   A major challenge for the automatic detection of hate speech on social media is the separation
between hate speech and instances of offensive language. Davidson et al. [11] first collected
tweets using hate speech keywords. Crowdsourcing was used to label the tweets into the
following three categories: “hate speech”, “offensive language”, and “neither”. A multi-class
classifier was then trained to distinguish between the three categories. The best performing
model achieved an overall 𝐹 1-score of 90%. However, the confusion matrix revealed that almost
40% of the hate speech tweets were misclassified.


3. Dataset and Corpus Analysis
To train our system, we used the PAN 2021 author profiling corpus5 proposed by Rangel et al. [9].
The corpus consists of 200 English (EN) and 200 Spanish (ES) Twitter authors. The tweets are
stored in one XML file per author, each containing 200 tweets, with every tweet stored in a
separate XML tag. The dataset is balanced, i.e., the class instances are equally distributed.
Half of the documents per language folder belong to authors who have been identified as
sharing hate speech. The other half are texts from users who may share offensive tweets but
could not be identified as hate speech spreaders. Table 1 shows excerpts from the corpus6.
Every author received an alphanumeric author-ID which is stored in a separate text file together
with the corresponding class affiliation. For training and testing, we split the data in a ratio
of 70/30. The gold standard can only be accessed through the TIRA [12] evaluation platform
provided by the PAN organizers. The results are hidden from the participants and can only be
unblinded by the organizers.
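   For illustration, the following minimal sketch shows how such a corpus could be loaded and split. The <document> tag, the truth.txt file with author-id:::label lines, and the folder name are assumptions based on the usual PAN data layout, not details taken from the task description.

```python
# Minimal loading sketch under assumed PAN conventions (one XML file per author
# with <document> tags; truth.txt with "author-id:::label" lines).
import glob
import os
import xml.etree.ElementTree as ET

from sklearn.model_selection import train_test_split

def load_pan_corpus(folder):
    labels = {}
    with open(os.path.join(folder, "truth.txt"), encoding="utf-8") as handle:
        for line in handle:
            author_id, label = line.strip().split(":::")
            labels[author_id] = int(label)

    texts, y = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        author_id = os.path.splitext(os.path.basename(path))[0]
        tweets = [node.text or "" for node in ET.parse(path).iter("document")]
        texts.append(" ".join(tweets))        # concatenate the 200 tweets per author
        y.append(labels[author_id])
    return texts, y

texts, y = load_pan_corpus("pan21/en")        # hypothetical folder name
X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.3, random_state=0, stratify=y)   # 70/30 split
```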
   It is important to note that the meaning of the class labels 0 and 1 is not explicitly defined
by the organizers. We assume that class 0 refers to hate speech spreaders, but since this is not
documented, we have kept the class labels as originally provided. As can be seen in
Table 1, the Twitter-specific tokens such as hashtags, URLs, and user mentions were replaced by
the providers with the following placeholders: #HASHTAG#, #URL# and #USER#. The examples
provided in Table 1 were chosen carefully to show that insults and profanities are used by
hate speech spreaders as well as by other users. Additionally, Twitter-specific text significantly
contributes to the difficulty of automatic hate speech detection, as the posts contain plenty of
poorly written text and paralinguistic signals such as emoticons, @-mentions, and hashtags.
Prior to feature engineering (described in Section 4), we analysed the distribution of different
tokens. Table 2 shows some key insights for both languages.
   We observed the distribution of specific tokens to see whether we could use these for the
feature engineering process. Unfortunately, we could not spot any significant differences
between the classes. Therefore, we did not use the features listed in Table 2 to train our models.

5 https://zenodo.org/record/4603578#.YKZKqKgzZaQ
6 The selected tweets are used for demonstration and research purposes only and do not reflect the opinion of the authors.
Table 1
English (EN) and Spanish (ES) excerpts from the PAN 2021 “Hate Speech Spreaders on Twitter” data.
               Class 0 Tweets (EN & ES)                     Class 1 Tweets (EN & ES)
       “#USER# #USER# Trump, that mother- “Kappa They gon be beating my fodder
       fucker is guilty of cowardice while being ninjas asses weak ass punks and i wont
       Commander-in-Chief #HASHTAG#.”                 even be laughing on the outside :-)”
       “RT #USER#: If a nigga taking care of me “Shut your fucking mouth i have no ill will
       i’m fasho taking care of him. it’s really that towards Kaep but he’s not even close lmao
       simple.”                                       #URL#”
       “RT #USER#: Celebrities are so useless and “#USER# All the people shit talkin this are
       corny B*tch what the fuck does this even trippin, i’d pipe tf out if an old lady if she
       mean?”                                         was payin for all my shit”
       “#USER# #USER# Mordes la mano de “#USER# Pos pa tu tierra sucnormal hi-
       quien de da. De comer eres un cancer para jadeputa”
       nuestro pais #URL#”
       “Los varones opinando sobre el feminismo “#USER# Ostia tio que palo metió el jo-
       #HASHTAG#. Nos sorprende? No nos sor- dido”
       prende”
       “Que pinches perras ganas de estar co- “RT #USER#: Qué horror. Condenado a 15
       giendo con Ale”                                años de prisión por dejar embarazada a su
                                                      hija tras un año de violaciones #URL#”

Table 2
Feature distribution of the PAN 2021 “Hate Speech Spreaders” dataset
   Features                    EN Class 0    EN Class 1    ES Class 0    ES Class 1
   Unique Tokens               20,280        19,298        28,806        28,761
   Emojis Total                8,465         7,201         7,942         7,949
   Emojis Unique               531           540           546           449
   Uppercased Tokens Total     44,316        42,135        34,172        41,950
   Uppercased Phrases Total    1,026         1,243         1,792         1,871
   #URL# Token                 8,556         6,759         5,865         6,897
   #HASHTAG# Token             3,644         3,290         1,864         1,658
   #USER# Token                17,250        17,585        16,014        22,088
   Retweets (RT)               7,731         6,159         6,824         7,084
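
The counts in Table 2 can be reproduced with straightforward token matching. The sketch below illustrates the kind of counting we performed; the emoji library calls and the simple heuristics (e.g. treating any fully uppercased token as an uppercased token) are illustrative assumptions rather than the exact script used.

```python
# Illustrative sketch of how the Table 2 statistics can be derived (not the exact script used).
import emoji

def feature_distribution(author_tweets):
    """author_tweets: list of tweet lists, one list of raw tweets per author (one class)."""
    tokens = [t for tweets in author_tweets for tweet in tweets for t in tweet.split()]
    text = " ".join(tokens)
    return {
        "Unique Tokens": len(set(tokens)),
        "Emojis Total": emoji.emoji_count(text),
        "Emojis Unique": len({e["emoji"] for e in emoji.emoji_list(text)}),
        "Uppercased Tokens Total": sum(t.isupper() for t in tokens),
        "#URL# Token": tokens.count("#URL#"),
        "#HASHTAG# Token": tokens.count("#HASHTAG#"),
        "#USER# Token": tokens.count("#USER#"),
        "Retweets (RT)": sum(tweet.startswith("RT")
                             for tweets in author_tweets for tweet in tweets),
    }
```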


4. Preprocessing and Feature Extraction
The preprocessing pipeline to clean and structure the data was performed for both languages
(EN and ES) and models as follows (a Python sketch of these steps is given after the list):

    • The text from the original XML document was extracted and all 200 tweets per author
      were concatenated to one text.
    • The white-space between the tokens has been reduced to a single space.
    • The placeholders #USER#, #URL#, #HASHTAG#, and RT were removed.
    • HTML character references (e.g. “&gt;”, “&lt;”, “&amp;”) were converted to the corresponding
      Unicode characters (“>”, “<”, “&”).
    • Emojis were converted to text format by using Python’s emoji library.
    • The text was lowercased.
    • Irrelevant signs, e.g. “+”, “*”, “/”, were deleted.
    • Alphanumeric tokens were separated (e.g. “Berlin2018” to “Berlin 2018”).
    • Sequences of repeated characters with a length greater than three were normalized to a
      maximum of two letters (e.g. “LOOOOOOOOL” to “LOOL”).
    • Words with fewer than three characters were ignored (except for the Bi-LSTM model for
      the English language).
    • Stopwords were deleted (except for the Bi-LSTM model for the English language).
    • As the last step, we lemmatized the English tweets for the TF-IDF character 𝑛-gram SVM
      model using WordNetLemmatizer.
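
The following Python sketch approximates these steps. The specific regular expressions, the NLTK stopword lists and WordNetLemmatizer, and the emoji package calls are our assumptions about the implementation, and the exact order of operations may differ from the original pipeline.

```python
# Sketch of the preprocessing steps (assumes NLTK data downloaded via nltk.download
# and the Python emoji package; regular expressions are approximations).
import html
import re

import emoji
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

LEMMATIZER = WordNetLemmatizer()
STOPWORDS = {"en": set(stopwords.words("english")),
             "es": set(stopwords.words("spanish"))}

def preprocess(text, lang="en", model="svm"):
    text = re.sub(r"\s+", " ", text)                            # collapse white-space
    text = re.sub(r"#USER#|#URL#|#HASHTAG#|\bRT\b", " ", text)  # drop placeholders and RT
    text = html.unescape(text)                                  # "&gt;" -> ">", "&amp;" -> "&"
    text = emoji.demojize(text)                                 # emojis -> textual aliases
    text = text.lower()
    text = re.sub(r"[+*/]", " ", text)                          # delete irrelevant signs
    text = re.sub(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", " ", text)  # "berlin2018" -> "berlin 2018"
    text = re.sub(r"(.)\1{3,}", r"\1\1", text)                  # "looooool" -> "lool"
    tokens = text.split()
    if not (model == "bilstm" and lang == "en"):                # exceptions noted in the list
        tokens = [t for t in tokens
                  if len(t) >= 3 and t not in STOPWORDS[lang]]
    if model == "svm" and lang == "en":
        tokens = [LEMMATIZER.lemmatize(t) for t in tokens]
    return " ".join(tokens)
```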

Besides the different preprocessing steps, we also experimented with different vectorization
techniques and hyperparameter tuning by employing scikit-learn’s grid search function. The
hyperparameters were tuned separately for English and Spanish. We experimented with emo-
tional signals and lists of hate words as handcrafted features as well as with automatically
learned features. The best results were achieved by using Scikit-learn’s term frequency-inverse
document frequency (TF-IDF) vectorizer and Sentence-BERT (SBERT), a BERT model modifi-
cation that uses siamese and triplet network structures to generate semantically meaningful
sentence embeddings [8]. For the English language, we used the sentence transformer model
stsb-distilbert-base7 and for Spanish distiluse-base-multilingual-cased-v1,
a multilingual knowledge-distilled version of the multilingual Universal Sentence Encoder [13]. The
models were trained with a maximum of 200 sentences per author, corresponding to the 200 tweets
contained in each author's file.
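   A minimal sketch of this feature extraction step is given below, assuming the sentence-transformers package, one embedding per tweet, and zero-padding to 200 vectors per author; these choices are our assumptions rather than documented details.

```python
# Sketch of the SBERT feature extraction per author (assumptions noted above).
import numpy as np
from sentence_transformers import SentenceTransformer

SBERT = {"en": SentenceTransformer("stsb-distilbert-base"),
         "es": SentenceTransformer("distiluse-base-multilingual-cased-v1")}

def author_embeddings(tweets, lang="en", max_tweets=200):
    vectors = SBERT[lang].encode(tweets[:max_tweets])    # one embedding per tweet
    padded = np.zeros((max_tweets, vectors.shape[1]), dtype=np.float32)
    padded[:len(vectors)] = vectors                      # zero-pad shorter feeds
    return padded                                        # shape: (200, embedding dim)
```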
   For the SVM model, we employed TF-IDF weighted character 𝑛-grams. In English, the best
results were achieved using a maximum of 1,250 features (min_df=5) and character 𝑛-grams
with range [3;7]. For Spanish, we used the top 2,350 features (min_df=5) and character 𝑛-grams
with range [2;7].
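
A sketch of the resulting pipeline is shown below. The vectorizer settings follow this section, the kernels and C values are those reported in Section 5, and the choice of analyzer="char" (rather than "char_wb") is an assumption.

```python
# Sketch of the TF-IDF character n-gram + SVM pipeline (assumptions noted above).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_svm_pipeline(lang="en"):
    if lang == "en":
        vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 7),
                                     max_features=1250, min_df=5)
        classifier = SVC(kernel="linear", C=10)
    else:  # Spanish
        vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 7),
                                     max_features=2350, min_df=5)
        classifier = SVC(kernel="rbf", C=5)
    return make_pipeline(vectorizer, classifier)

# Typical usage on the 70/30 split:
# pipeline = build_svm_pipeline("en").fit(X_train, y_train)
# accuracy = pipeline.score(X_test, y_test)
```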


5. Methodology
We defined this year’s PAN author profiling task “Hate Speech Spreaders on Twitter” as a binary
classification problem. For each language (EN and ES) we trained two different models. We
tested different features and vectorization techniques with a Support Vector Machine (SVM).
Additionally, we experimented with bidirectional LSTM (Bi-LSTM) models, as recurrent neural
networks (RNNs) can preserve sequence information over time and thereby integrate contextual
information in classification tasks.
   For the final SVM model, we trained a linear kernel and set the penalty parameter C=10 for
the English data. For the Spanish corpus, we trained the SVM with the radial basis function
kernel (RBF) and C=5.

7 https://huggingface.co/sentence-transformers/stsb-distilbert-base
Table 3
Hyperparameters for the Bi-LSTM model
                        Bi-LSTM                        English               Spanish
          Bi-LSTM layer memory units                      32                     7
          Dropout                                        0.2                     0
          First dense layer memory units                  32                     7
          Activation function                           ReLU                   ReLU
          Second dense layer memory units                 24                     5
          Activation function                           ReLU                   ReLU
          Activation function in output layer         Sigmoid                Sigmoid
          Loss Function                          Binary crossentropy    Binary crossentropy
          Optimizer                                     Adam                   Adam

Table 4
Accuracy (Acc.) scores of the final systems on the official PAN 2021 test dataset on Tira
   Model      Features                                                   Language     Acc.    Av. Acc.
  SVM         TF-IDF char 𝑛-grams [3;7], 1,250 features                     EN        64%
                                                                                              69.5%
  SVM         TF-IDF char 𝑛-grams [2;7], 2,350 features                     ES        75%
  Bi-LSTM     SBERT stsb-distilbert-base                                    EN        59%
                                                                                               69%
  Bi-LSTM     SBERT distiluse-base-multilingual-cased-v1                    ES        79%


The performance of the systems was ranked by accuracy. Table 4 shows the scores of our final
systems on the official PAN 2021 test set on the TIRA platform [12]. Accuracy scores are
calculated individually for each language by discriminating between the two classes. Each model
was trained on 70% of the training data provided by the organizers; the hyperparameters were
tuned on the remaining 30% split. The highest accuracy on the test set using the SVM with
TF-IDF weighted character 𝑛-grams was 64% for the English dataset and 75% for the Spanish
dataset. The accuracy dropped to 59% for the English dataset using the Bi-LSTM in combination
with SBERT, while it increased by 4 percentage points to 79% on the Spanish dataset. Therefore,
we submitted the SVM model as our final hate speech detection system, as it achieved an overall
average accuracy of 69.5%, performing slightly better than the Bi-LSTM model, which achieved
an average accuracy of 69% across both languages.
of both systems are listed in Table 4. To make our Bi-LSTM model reproducible, we have listed
all hyperparameters used to train the Bi-LSTM model in Table 3.
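
For illustration, the following Keras sketch instantiates the architecture from Table 3. The deep-learning framework, the input shape of 200 SBERT vectors per author, and the default embedding dimensionality are our assumptions and may differ from the original implementation.

```python
# Keras sketch of the Bi-LSTM architecture from Table 3 (assumptions noted above).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout, Input

HYPERPARAMS = {          # (Bi-LSTM units, dropout, first dense, second dense), see Table 3
    "en": (32, 0.2, 32, 24),
    "es": (7, 0.0, 7, 5),
}

def build_bilstm(lang="en", seq_len=200, emb_dim=768):
    lstm_units, dropout, dense1, dense2 = HYPERPARAMS[lang]
    model = Sequential([
        Input(shape=(seq_len, emb_dim)),      # 200 SBERT vectors per author (assumed)
        Bidirectional(LSTM(lstm_units)),
        Dropout(dropout),
        Dense(dense1, activation="relu"),
        Dense(dense2, activation="relu"),
        Dense(1, activation="sigmoid"),       # hate speech spreader vs. not
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```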


6. Discussion and Conclusion
In this paper, we described our participation in the PAN 2021 author profiling task. The
goal was to develop a system that can detect Twitter users who spread hate speech on a
regular basis. First, we observed the distribution of specific tokens in the tweets, such as the usage
of emojis or user mentions, to see whether we could use these for the feature engineering
process. Unfortunately, we could not spot any significant differences between the two classes.
Furthermore, we experimented with emotional signals and dictionaries of hate words as
handcrafted features in addition to automatically learned features. Here, too, we could not
detect any difference in emotions between the two classes, and we have shown that insults and
profanities are not discriminative features separating hate speech spreaders from other users.
   Our final submitted system uses an SVM with TF-IDF weighted character 𝑛-grams. This model
performed best for the English language. To detect hate speech spreaders in Spanish tweets, a
bidirectional LSTM (Bi-LSTM) trained with Sentence-BERT achieved better classification results.
The SVM model achieved an average accuracy of 69.5% across both languages, which is slightly
better than the Bi-LSTM model (69%).
   The experiments show that detecting hate speech spreaders on Twitter is challenging in
several ways. First, we have shown that insults and profanities are not
only used by hate speech spreaders, but also by users who do not offend other individuals or
groups. Additionally, Twitter posts contain plenty of poorly written text (spelling mistakes,
abbreviations, etc.) and paralinguistic signals such as emoticons, @-mentions, and hashtags. In
the future, we want to make the classification results interpretable to analyse how hate words
and the context in which they are expressed contribute to the classification.


Acknowledgements
This work was supported by the German Federal Ministry of Education and Research and the
Hessen State Ministry for Higher Education, Research and the Arts within their joint support of
the National Research Center for Applied Cybersecurity ATHENE and under grant agreement
"Lernlabor Cybersicherheit" (LLCS) for cyber security research and training.


References
 [1] M. Mohiyaddeen, S. Siddiqui, Automatic hate speech detection: A literature review,
     International Journal of Engineering and Management Research 11 (2021) 116–121. URL:
     https://www.ijemr.net/ojs/index.php/ojs/article/view/766.
     doi:10.31033/ijemr.11.2.17.
 [2] S. Agarwal, A. Sureka, Using knn and svm based one-class classifier for detecting online
     radicalization on twitter, in: R. Natarajan, G. Barua, M. R. Patra (Eds.), Distributed
     Computing and Internet Technology, Springer International Publishing, Cham, 2015, pp.
     431–442.
 [3] G. Kovács, P. Alonso, R. Saini, Challenges of hate speech detection in social media, SN
     Computer Science 2 (2021). doi:10.1007/s42979-021-00457-3.
 [4] J. Bevendorff, B. Chulvi, G. L. Sarracén, M. Kestemont, E. Manjavacas, I. Markov,
     M. Mayerl, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann,
     M. Wolska, E. Zangerle, Overview of pan 2021: Authorship verification, profiling hate
     speech spreaders on twitter, and style change detection, in: 12th International Conference
     of the CLEF Association (CLEF 2021), Springer, 2021.
 [5] F. Rangel, G. L. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling hate speech spreaders on
     twitter task at pan 2021, in: A. J. M. M. F. P. Guglielmo Faggioli, Nicola Ferro (Ed.), CLEF
     2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021.
 [6] P. Rosso, F. Rangel Pardo, Author profiling tracks at fire, SN Computer Science 1 (2020).
     doi:10.1007/s42979-020-0073-1.
 [7] C. A. Russell, B. H. Miller, Profile of a terrorist, Studies in conflict & terrorism 1 (1977)
     17–34. URL: https://doi.org/10.1080/10576107708435394.
 [8] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese
     BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural
     Language Processing and the 9th International Joint Conference on Natural Language
     Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong,
     China, 2019, pp. 3982–3992. URL: https://www.aclweb.org/anthology/D19-1410.
     doi:10.18653/v1/D19-1410.
 [9] F. Rangel, G. L. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling hate speech spreaders on
     twitter, 2021. URL: https://doi.org/10.5281/zenodo.4603578.
     doi:10.5281/zenodo.4603578.
[10] R. Mutanga, N. Naicker, O. O. Olugbara, Hate speech detection in twitter using
     transformer methods, International Journal of Advanced Computer Science and
     Applications 11 (2020). doi:10.14569/IJACSA.2020.0110972.
[11] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the
     problem of offensive language, in: Proceedings of the International AAAI Conference on
     Web and Social Media, volume 11, 2017.
[12] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, Tira integrated research architecture, in:
     N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The
     Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019.
     doi:10.1007/978-3-030-22948-1\_5.
[13] V. Sanh, L. Debut, J. Chaumond, T. Wolf, Distilbert, a distilled version of bert: smaller,
     faster, cheaper and lighter, ArXiv abs/1910.01108 (2019).