                             Policycorpus XL:
     An Italian Corpus for the Detection of Hate Speech Against Politics
          Fabio Celli1 , Mirko Lai2 , Armend Duzha1 , Cristina Bosco2 , Viviana Patti2
                       1. Research & Development, Gruppo Maggioli, Italy
                         2. Dept. of Informatics, University of Turin, Italy
                               fabio.celli@maggioli.it, mirko.lai@unito.it,
                  armend.duzha@maggioli.it, bosco@di.unito.it, patti@di.unito.it



                        Abstract

In this paper we describe the largest corpus annotated with hate speech in the political domain in Italian. Policycorpus XL contains 7000 manually annotated tweets, with a share of hate labels above 40%, while in other corpora of the same type it is usually below 30%. We describe the collection of the data and test some baselines with simple classification algorithms, obtaining promising results. We suggest that the high proportion of hate labels boosts the performance of classifiers, and we plan to release the dataset in a future evaluation campaign.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction and Background

In recent years, computer mediated communication on social media and microblogging websites has become more and more aggressive (Watanabe et al., 2018). It is well known that people use social media like Twitter for a variety of purposes, such as keeping in touch with friends, raising the visibility of their interests, gathering useful information, seeking help and releasing stress (Zhao and Rosson, 2009), but the spread of fake news (Shu et al., 2019; Alam et al., 2016) has exacerbated a cultural clash between social classes that has emerged at least since the debate about Brexit (Celli et al., 2016) and more recently during the pandemic (Oliver et al., 2020). Despite the fact that online behavior differs from offline behavior (Celli and Polonio, 2015), we observe more and more hate speech in social media, to the point where it has become a serious problem for free speech and social cohesion.

Hate speech is defined as any expression that is abusive, insulting, intimidating, harassing, and/or incites, supports and facilitates violence, hatred, or discrimination. It is directed against people (individuals or groups) on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth (Erjavec and Kovačič, 2012). In response to the growing number of hate messages, the Natural Language Processing (NLP) community has focused on the classification of hate speech (Badjatiya et al., 2017) and the analysis of online debates (Celli et al., 2014). In particular, many worked on systems to detect offensive language against specific vulnerable groups, e.g., immigrants and LGBTQ communities among others (Poletto et al., 2017; Poletto et al., 2021), as well as aggressive language against women (Saha et al., 2018). An under-researched yet important area of investigation is anti-politics hate: hate speech against politicians, policy makers and laws at any level (national, regional and local). While anti-policy hate speech has been addressed in Arabic (Guellil et al., 2020) and German (Jaki and De Smedt, 2019), most European languages have been under-researched. The bottleneck in this field of research is the availability of data to train good hate speech detection models. In recent years, scientific research has contributed to the automatic detection of hate speech from text with datasets annotated with hate labels, aggressiveness, offensiveness, and other related dimensions (Sanguinetti et al., 2018). Scholars have presented systems for the detection of hate speech in social media focused on specific targets, such as immigrants (Del Vigna et al., 2017), and language domains, such as racism (Kwok and Wang, 2013), misogyny (Basile et al., 2019) or cyberbullying (Menini et al., 2019). Each type of hate speech has its own vocabulary and its own dynamics, thus the selection of a specific domain is crucial to obtain clean data and to restrict the scope of experiments and learning tasks.
In this paper we present a new corpus, called Policycorpus XL, for hate speech detection from Twitter in Italian. This corpus is an extension of the Policycorpus (Duzha et al., 2021). We selected Twitter as the source of data and Italian as the target language because Italy has had, at least since the 2018 elections, a large audience that pays attention to hyper-partisan sources on Twitter and is prone to produce and retweet messages of hate against policy making (Giglietto et al., 2019).

The paper is structured as follows: after a literature review (Section 2), we describe how we collected and annotated the data (Section 3), we evaluate some baselines (Section 4), and we pave the way for future work (Section 5).

2   Related Work

Hate speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works (Poletto et al., 2021). High-quality annotated corpora and benchmarks are key resources for hate speech detection and hater profiling in general (Jain et al., 2021), considering the vast number of supervised approaches that have been proposed (MacAvaney et al., 2019).

Early datasets on hate speech, especially in English, were produced both outside any evaluation campaign (Waseem and Hovy, 2016; Founta et al., 2018) and inside such competitions. These include SemEval 2019, where a multilingual hate speech corpus against immigrants and women in English and Spanish (Basile et al., 2019) was released, and PAN 2021, which provided a dataset for the detection of hate spreaders in English and Spanish (Rangel et al., 2021). Most Italian datasets in the field of hate speech have been released during competitions and evaluation campaigns. There are:

   • the Italian HS corpus (Poletto et al., 2017),

   • HaSpeeDe-tw2018 and HaSpeeDe-tw2020, the datasets released during the EVALITA campaigns (Sanguinetti et al., 2020),

   • the Policycorpus (Duzha et al., 2021), the only dataset in Italian that is annotated with hate speech in the political domain.

The Italian HS corpus is a collection of more than 5700 tweets manually annotated with hate speech, aggressiveness, irony and other forms of potentially harassing communication. The HaSpeeDe-tw corpora are two collections of 4000 and 8100 tweets respectively, manually annotated with hate speech labels and containing mainly anti-immigration hate (Bosco et al., 2018). The Policycorpus is a collection of 1260 tweets manually annotated with hate speech labels against politics and politicians. We decided to expand it and produce a new dataset.

Hate speech is hard to annotate and hard to model, with the risk of creating biased data and models prone to overfitting. In addition, the literature also reports cases of annotators' insensitivity to differences in dialect that can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. This is the case of African American English (Sap et al., 2019), but it potentially applies to Italian as well, as it is a language full of dialects and regional offenses.

Hate speech is intrinsically associated with relationships between groups, and it also relies on language nuances. There are many definitions of hate speech from different sources, such as the European Union Commission, international minorities associations (ILGA) and social media policies (Fortuna and Nunes, 2018). In most definitions, hate speech has specific targets based on specific characteristics of groups; it incites violence, usually towards a minority, and it aims to attack or diminish. Additionally, humour has a specific status in hate speech, making it more difficult to understand the boundaries between what is hate and what is not.

In the political domain we find all of these aspects, especially messages against a minority (politicians) that attack or diminish. We think that more resources are needed for the classification of hate speech in Italian in the political domain, hence we decided to collect and annotate more data for this task.

In the next section, we describe how we created the dataset and annotated it with hate speech labels.

3   Data Collection and Annotation

Starting from the Policycorpus, we expanded it from 1260 to 7000 tweets in Italian, collected using snowball sampling from the Twitter APIs.
As initial seeds, we used the same set of hashtags used for the Policycorpus, for instance: #dpcm (decree of the president of the council of ministers), #legge (law) and #leggedibilancio (budget law). We removed duplicates, retweets and tweets containing only hashtags and urls. At the end of the sampling process, the list of seeds included about 6000 hashtags that co-occurred with the initial ones. We grouped the hashtags into the following categories:

   • Laws, such as #decretorilancio (#relaunchdecree), #leggelettorale (#electorallaw), #decretosicurezza (#securitydecree)

   • Politicians and policy makers, such as #Salvini, #decretoSalvini (#Salvinidecree), #Renzi, #Meloni, #DraghiPremier

   • Political parties, such as #lega (#league), #pd (#Democratic Party)

   • Political tv shows, such as #ottoemezzo, #nonelarena, #noneladurso, #Piazzapulita

   • Topics of the public debate, such as #COVID, #precari (#precariousworkers), #sicurezza (#security), #giustizia (#justice), #ItalExit

   • Hyper-partisan slogans, such as #vergognaConte (#shameonConte), #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini)

Examples of collected hashtags are reported in Figure 1.

Figure 1: Wordclouds of the hashtags collected with frequency higher than 2.
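As an illustration of the snowball expansion described above, the following minimal Python sketch grows a set of seed hashtags with co-occurring ones. It is a hypothetical reconstruction: the search_tweets wrapper, the number of rounds and the co-occurrence threshold are assumptions, not the procedure actually used to build the corpus.

    import re
    from collections import Counter
    from typing import Callable, Iterable, Set

    HASHTAG_RE = re.compile(r"#\w+")

    def expand_seeds(seeds: Set[str],
                     search_tweets: Callable[[str], Iterable[str]],
                     rounds: int = 2,
                     min_cooccurrence: int = 2) -> Set[str]:
        """Snowball expansion: collect hashtags co-occurring with the seeds.

        search_tweets(hashtag) is a hypothetical wrapper around the Twitter
        search API that yields the texts of tweets containing that hashtag.
        """
        collected = set(seeds)
        frontier = set(seeds)
        for _ in range(rounds):
            cooccurrences = Counter()
            for tag in frontier:
                for text in search_tweets(tag):
                    for found in HASHTAG_RE.findall(text.lower()):
                        if found not in collected:
                            cooccurrences[found] += 1
            # keep only hashtags seen together with the current seeds often enough
            frontier = {t for t, n in cooccurrences.items() if n >= min_cooccurrence}
            collected |= frontier
        return collected

    # usage with the initial seeds mentioned in the paper:
    # tags = expand_seeds({"#dpcm", "#legge", "#leggedibilancio"}, search_tweets)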
Recent shared tasks (Agerri et al., 2021; Cignarella et al., 2020; Aker et al., 2016) promoted the use of contextual information about the tweet and its author (including his/her social media network) for improving the performance of stance detection. Here, with the aim of stimulating the exploration of data augmentation for hate speech detection, we share additional contextual information about each post, such as: the number of retweets and the number of favours the tweet received (the favours count field, i.e. how many times users have marked the tweet as a favorite), the device used for posting it (e.g. iOS or Android), the posting date and location, and an attribute that states whether the post is a tweet, a retweet, a reply, or a quote. Furthermore, we collected contextual information related to the authors of these posts, such as: the number of tweets ever posted, the user's description and location, the number of her/his followers and friends, the number of public lists this user is a member of, and the date her/his account has been created.

All this contextual information belongs to the "root-level" attributes of the Tweet and User objects that Twitter returns in JSON format through its APIs. Additionally, we planned to explore the interests of the authors by collecting their following lists (the users they follow) through the following API endpoint. Moreover, to explore the authors' social interactions, we used the Academic Full Search API to recover the list of users that they retweeted and replied to in the last two years.
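As a minimal sketch of how such contextual attributes can be read from the shared data, the function below pulls the fields listed above out of a Tweet object following the classic v1.1 JSON schema; the exact set of fields distributed with the corpus may differ.

    def extract_context(tweet: dict) -> dict:
        """Collect root-level contextual attributes from a Tweet JSON object."""
        user = tweet.get("user", {})
        return {
            "retweet_count": tweet.get("retweet_count"),
            "favorite_count": tweet.get("favorite_count"),
            "source": tweet.get("source"),          # posting device/client
            "created_at": tweet.get("created_at"),  # posting date
            "is_retweet": "retweeted_status" in tweet,
            "is_reply": tweet.get("in_reply_to_status_id") is not None,
            "is_quote": tweet.get("is_quote_status", False),
            # author-level attributes
            "user_statuses_count": user.get("statuses_count"),
            "user_description": user.get("description"),
            "user_location": user.get("location"),
            "user_followers_count": user.get("followers_count"),
            "user_friends_count": user.get("friends_count"),
            "user_listed_count": user.get("listed_count"),
            "user_created_at": user.get("created_at"),
        }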
The enhanced Policycorpus has finally been anonymised by mapping each tweet id, user id, and mention onto a randomly generated ID. To produce gold standard labels, we asked two Italian native speakers, experts in communication, to manually label the tweets in the corpus, distinguishing between hate and normal tweets according to the following guidelines: by definition, hate speech is any expression that is abusive, insulting, intimidating, harassing, and/or incites to violence, hatred, or discrimination, and it is directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth (Erjavec and Kovačič, 2012). Below we provide some examples with English translations:

   1. "Un chiaro #NO all #Olanda che ci vorrebbe sì utilizzatori delle risorse economiche del #MES ma in cambio della rinuncia dell Italia alla propria autonomia di bilancio. All Olanda diciamo: grazie e arrivederci NON CI INTERESSA!!" (English: a clear #NO to the #Netherlands, which would like us to use the #MES economic resources but in exchange for Italy's renunciation of its budgetary autonomy. To the Netherlands we say: thank you and goodbye, WE ARE NOT INTERESTED!!)

The first example is normal because it does not contain hate, insults, intimidation, violence or discrimination.

   2. "...Sta settimanale passerella dello #sciacallo #no #proprioNo! Ascoltare un #pagliaccio padano dopo un vero PATRIOTA un medico di #Bergamo non si può reggere ne vedere ne ascoltare. Giletti dovrebbe smetterla di invitare certi CAZZARIPADANI! #COVID-19 #NonelArena" (English: ... This weekly catwalk of the #jackal, #no #notAtAll! Listening to a Padanian #clown after a true PATRIOT, a doctor from #Bergamo, cannot be endured, watched or listened to. Giletti should stop inviting certain SLACKERS FROM THE PO VALLEY! #COVID-19 #NonelArena)

The second example contains hate speech, including insults like #clown and #jackal.

   3. "Dico la mia... #Draghi è un grande economista ma a noi non serve un economista stile #Monti... A noi non serve un altro #governo tecnico per ubbidire alla lobby delle banche! A noi serve un leader politico! A noi serve un #ItalExit! A noi serve la #Lira! #No a #DraghiPremier" (English: I have my say... #Draghi is a great economist but we do not need a #Monti-style economist... We do not need another technical #government to obey the banking lobby! We need a political leader! We need #ItalExit! We need the #Lira! #No to #DraghiPremier)

The last example is a normal case, despite the strong negative sentiment. It might be controversial for the presence of the term lobby, often used in abusive contexts, but in this case it is not directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation or political conviction.

The Inter-Annotator Agreement is k=0.53. Although this score is not high, it is in line with the score reported in the literature for hate speech against immigrants (k=0.54) (Poletto et al., 2017) and indicates that the detection of hate speech is a hard task for humans.

All the examples in disagreement were discussed and an agreement was reached between the annotators, with the help of a third supervisor. The cases of disagreement occurred more often when the sentiment of the tweet was negative, mainly due to:

   • The use of vulgar expressions not explicitly directed against specific people but generically against political choices.

   • The negative interpretation of hyper-partisan hashtags, such as #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini), in tweets without explicit insults or abusive language.

   • The substitution of explicit insults with derogatory words, such as the word "circus" instead of "clowns".
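The agreement score reported above is an inter-annotator kappa between the two annotators; assuming it is a Cohen's kappa, it can be computed with scikit-learn as in the minimal example below (the label lists are hypothetical, with 1 = hate and 0 = normal).

    from sklearn.metrics import cohen_kappa_score

    # hypothetical labels produced by the two annotators on the same tweets
    annotator_a = [1, 0, 0, 1, 1, 0, 0, 1]
    annotator_b = [1, 0, 1, 1, 0, 0, 0, 1]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa: {kappa:.2f}")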
The amount of hate labels in the original Policycorpus was 11% (1124 normal and 140 hate tweets), strongly unbalanced like the Italian HS corpus (17% of hate tweets), because it reflects the raw distribution of hate tweets on Twitter. The HaSpeeDe-tw corpus (32% of hate tweets) instead has a distribution that oversamples hate tweets and is better suited for training hate speech models. Following the HaSpeeDe-tw example, in Policycorpus XL we collected more hate tweets, randomly discarding normal tweets to reach at least 40% of hate tweets in the corpus. In the end we have 40.6% hate labels and 59.4% normal labels, distributed between training and test set as shown in Figure 2.

Figure 2: Distribution of classes in Policycorpus-XL training and test sets.
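A minimal sketch of this rebalancing step is given below; the DataFrame layout (a binary hate column) and the exact sampling procedure are assumptions, not the authors' actual pipeline.

    import pandas as pd

    def rebalance(df: pd.DataFrame, target_hate_ratio: float = 0.40,
                  seed: int = 42) -> pd.DataFrame:
        """Randomly discard normal tweets until hate tweets make up at least
        target_hate_ratio of the corpus (df has a binary 'hate' column)."""
        hate = df[df["hate"] == 1]
        normal = df[df["hate"] == 0]
        # largest number of normal tweets compatible with the target ratio
        max_normal = int(len(hate) * (1 - target_hate_ratio) / target_hate_ratio)
        normal = normal.sample(n=min(len(normal), max_normal), random_state=seed)
        return pd.concat([hate, normal]).sample(frac=1, random_state=seed)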
We note in the style of these tweets that there is a substantial overlap among the top unigrams in the two classes, as shown in Figure 3. We suggest that weak signals, like less frequent words, are key features for the classification task.

Figure 3: Wordclouds of the unigrams most associated with the normal and hate classes respectively. It shows a substantial overlap among the top unigrams in the two classes.
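The word clouds of Figure 3 cannot be reproduced here; as a minimal sketch, the per-class top unigrams behind them can be extracted as follows. The association measure used for the figure is not specified, so the sketch simply uses raw per-class frequency; texts and labels are hypothetical variables holding the tweets and their gold labels.

    from collections import Counter

    def top_unigrams(texts, labels, target, k=20):
        """Return the k most frequent unigrams in the tweets of a given class."""
        counts = Counter()
        for text, label in zip(texts, labels):
            if label == target:
                counts.update(text.lower().split())
        return counts.most_common(k)

    # top_hate = top_unigrams(texts, labels, target=1)
    # top_normal = top_unigrams(texts, labels, target=0)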


In the next section, we report and discuss the results of classification experiments.

4   Baselines

In order to set the baselines for the hate speech classification task on Policycorpus-XL, we tested different classification algorithms. We used a 70/30 train-test percentage split: the training set has 4900 instances and 300 features, while the test set has 2100 instances and 300 features. The 300 features are the normalized frequencies of the 300 most frequent words extracted from the tweets without removing stopwords. Table 1 reports the results of classification.

    algorithm             balanced acc        macro F1
    majority baseline     0.500               0.37
    naive bayes           0.783               0.78
    decision trees        0.763               0.76
    SVMs                  0.788               0.79

Table 1: Results of classification with different algorithms.

We used Scikit-Learn to compute a majority baseline with a dummy classifier, which assigns all the instances to the most frequent class (normal tweets), a Naive Bayes classifier, a decision tree and Support Vector Machines (SVMs). The best performance for the classification of hate speech has been achieved with the SVM classifier, which has a very high precision (0.94) and poor recall (0.60). All the algorithms outperform the majority baseline. The results are in line with the scores obtained by the systems on the HaSpeeDe-tw 2020 dataset at EVALITA, and we believe that there is still great room for improvement on Policycorpus-XL, as we exploited very simple and limited features.
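A minimal sketch of this baseline setup is reported below; the corpus loader, the vectorizer and the SVM variant are assumptions, since the paper does not specify them.

    from sklearn.dummy import DummyClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import balanced_accuracy_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.preprocessing import Normalizer
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    texts, labels = load_policycorpus_xl()  # hypothetical loader: list[str], list[int]

    # normalized frequencies of the 300 most frequent words, stopwords kept
    vectorizer = CountVectorizer(max_features=300)
    X = Normalizer(norm="l1").fit_transform(vectorizer.fit_transform(texts))

    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.30, stratify=labels, random_state=42)

    classifiers = {
        "majority baseline": DummyClassifier(strategy="most_frequent"),
        "naive bayes": MultinomialNB(),
        "decision trees": DecisionTreeClassifier(random_state=42),
        "SVMs": LinearSVC(),
    }
    for name, clf in classifiers.items():
        pred = clf.fit(X_train, y_train).predict(X_test)
        print(name,
              round(balanced_accuracy_score(y_test, pred), 3),
              round(f1_score(y_test, pred, average="macro"), 2))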
5   Conclusion and Future Work

We presented a large corpus of Twitter data in Italian, manually annotated with hate speech labels. The corpus is an extension of a previous one, the first corpus annotated with hate speech in the political domain in Italian.

Given the rising amount of hate messages online, not just against minorities but more and more against policies and policymakers, it is urgent to understand the phenomenon and to train classifiers that can prevent people from disseminating hate in the public debate. This is very important to keep democracies alive and to grant a freedom of speech that is respectful of other people's freedom.

We plan to distribute the corpus in the next edition of EVALITA for a specific HaSpeeDe-tw task.

Acknowledgments

The research leading to the results presented in this paper has received funding from the PolicyCLOUD project, supported by the European Union's Horizon 2020 research and innovation programme under Grant Agreement no 870675.
References

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernandez de Landa, and Alvaro Rodrigo. 2021. VaxxStance@IberLEF 2021: Going Beyond Text in Crosslingual Stance Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021). CEUR-WS.org.

Ahmet Aker, Fabio Celli, Adam Funk, Emina Kurtic, Mark Hepple, and Rob Gaizauskas. 2016. Sheffield-Trento system for sentiment and argument structure enhanced comment-to-article linking in the online news domain.

Firoj Alam, Fabio Celli, Evgeny Stepanov, Arindam Ghosh, and Giuseppe Riccardi. 2016. The social mood of news: self-reported annotations to design automatic mood detection systems. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 143–152.

Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 759–760.

Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, Manuela Sanguinetti, et al. 2019. SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In 13th International Workshop on Semantic Evaluation, pages 54–63. Association for Computational Linguistics.

Cristina Bosco, Felice Dell'Orletta, Fabio Poletto, Manuela Sanguinetti, and Maurizio Tesconi. 2018. Overview of the EVALITA 2018 hate speech detection task. In EVALITA 2018 - Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, volume 2263, pages 1–9. CEUR.

Fabio Celli and Luca Polonio. 2015. Facebook and the real world: Correlations between online and offline conversations. In CLiC-it, page 82.

Fabio Celli, Giuseppe Riccardi, and Arindam Ghosh. 2014. Corea: Italian news corpus with emotions and agreement. In Proceedings of CLiC-it 2014, pages 98–102.

Fabio Celli, Evgeny A. Stepanov, Massimo Poesio, and Giuseppe Riccardi. 2016. Predicting Brexit: Classifying agreement is better than sentiment and pollsters. In PEOPLES@COLING, pages 110–118.

Alessandra Teresa Cignarella, Mirko Lai, Cristina Bosco, Viviana Patti, and Paolo Rosso. 2020. SardiStance@EVALITA2020: Overview of the task on stance detection in Italian tweets. In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), volume 2765 of CEUR Workshop Proceedings, Aachen, Germany, December. CEUR-WS.org.

Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and Maurizio Tesconi. 2017. Hate me, hate me not: Hate speech detection on Facebook. In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), pages 86–95.

Armend Duzha, Cristiano Casadei, Michael Tosi, and Fabio Celli. 2021. Hate versus politics: detection of hate against policy makers in Italian tweets. SN Social Sciences, 1(9):1–15.

Karmen Erjavec and Melita Poler Kovačič. 2012. "You don't understand, this is a new war!" Analysis of hate speech in news web sites' comments. Mass Communication and Society, 15(6):899–920.

Paula Fortuna and Sérgio Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):1–30.

Antigoni Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. 2018. Large scale crowdsourcing and characterization of Twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media.

Fabio Giglietto, Nicola Righetti, Giada Marino, and Luca Rossi. 2019. Multi-party media partisanship attention score. Estimating partisan attention of news media sources using Twitter data in the lead-up to the 2018 Italian election. Comunicazione politica, 20(1):85–108.

Imane Guellil, Ahsan Adeel, Faical Azouaou, Sara Chennoufi, Hanene Maafi, and Thinhinane Hamitouche. 2020. Detecting hate speech against politicians in Arabic community on social media. International Journal of Web Information Systems.

Rakshita Jain, Devanshi Goel, Prashant Sahu, Abhinav Kumar, and Jyoti Prakash Singh. 2021. Profiling hate speech spreaders on Twitter. In CLEF.

Sylvia Jaki and Tom De Smedt. 2019. Right-wing German hate speech on Twitter: Analysis and automatic detection. arXiv preprint arXiv:1910.07518.

Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pages 1621–1622.

Sean MacAvaney, Hao-Ren Yao, Eugene Yang, Katina Russell, Nazli Goharian, and Ophir Frieder. 2019. Hate speech detection: Challenges and solutions. PLoS ONE, 14(8):e0221152.

Stefano Menini, Giovanni Moretti, Michele Corazza, Elena Cabrio, Sara Tonelli, and Serena Villata. 2019. A system to monitor cyberbullying based on message classification and social network analysis. In Proceedings of the Third Workshop on Abusive Language Online, pages 105–110.

Nuria Oliver, Bruno Lepri, Harald Sterly, Renaud Lambiotte, Sébastien Deletaille, Marco De Nadai, Emmanuel Letouzé, Albert Ali Salah, Richard Benjamins, Ciro Cattuto, et al. 2020. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle.

Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and Cristina Bosco. 2017. Hate speech annotation: Analysis of an Italian Twitter corpus. In 4th Italian Conference on Computational Linguistics, CLiC-it 2017, volume 2006, pages 1–6. CEUR-WS.

Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, and Viviana Patti. 2021. Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources & Evaluation, 55:477–523.

Francisco Rangel, GLDLP Sarracén, BERTa Chulvi, Elisabetta Fersini, and Paolo Rosso. 2021. Profiling hate speech spreaders on Twitter task at PAN 2021. In CLEF.

Punyajoy Saha, Binny Mathew, Pawan Goyal, and Animesh Mukherjee. 2018. Hateminers: detecting hate speech against women. arXiv preprint arXiv:1812.06700.

Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. 2018. An Italian Twitter corpus of hate speech against immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Manuela Sanguinetti, Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and Irene Russo. 2020. Overview of the EVALITA 2020 second hate speech detection task (HaSpeeDe 2). In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org.

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678.

Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani, and Huan Liu. 2019. The role of user profiles for fake news detection. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 436–439.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93.

Hajime Watanabe, Mondher Bouazizi, and Tomoaki Ohtsuki. 2018. Hate speech on Twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection. IEEE Access, 6:13825–13835.

Dejin Zhao and Mary Beth Rosson. 2009. How and why people Twitter: the role that micro-blogging plays in informal communication at work. In Proceedings of the ACM 2009 International Conference on Supporting Group Work, pages 243–252. ACM.