<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HaSpeeDe3 at EVALITA 2023: Overview of the Political and Religious Hate Speech Detection task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mirko Lai</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Celli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Ramponi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Tonelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler (FBK)</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maggioli s.p.a., University of Trento</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università degli Studi di Torino</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The Hate Speech Detection (HaSpeeDe3) task is the third edition of a shared task on the detection of hateful content in Italian tweets. It differs from the previous editions while maintaining continuity in analysing and countering hate speech (HS) on social media. While HaSpeeDe and HaSpeeDe2 focused on HS against immigrants, Muslims and Roma, HaSpeeDe3 explores hate speech in strongly polarised debates, concerning in particular politics and religion. It is articulated in two different tasks: A) in-domain political hate speech detection, and B) cross-domain hate speech detection on political and religious tweets. Task A consists of two different subtasks, for which participants i) can only use the provided textual content of the tweet, or ii) can additionally employ contextual information about the tweet and its author. In Task B, which also consists of two subtasks, participants are allowed to use any kind of external data for detecting hate speech in tweets about i) politics and ii) religion. Six teams from both academia and industry participated in the evaluation, with a total of 13 submitted runs for Task A and 16 for Task B.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate speech detection</kwd>
        <kwd>social media analysis</kwd>
        <kwd>polarised debates</kwd>
        <kwd>political hate speech</kwd>
        <kwd>religious hate speech</kwd>
        <kwd>shared task</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Social media play an important role in public debates,
especially concerning politics. On the one hand, political
leaders use social media as a vehicle for political and electoral
propaganda. On the other hand, they provide news
to a significant part of the population that takes part
in the discussion, supporting or criticising political decisions [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Social media are also the place where debates
on sensitive topics, such as religious beliefs and practices,
are rather common and are sometimes intertwined with
public discussions on political matters.</p>
      <p>Unfortunately, such discussions often trigger verbal
aggressions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], especially after polarising events
in Europe and beyond such as Brexit [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the Covid-19
pandemic [5] and the Russo-Ukrainian conflict [6]. Aggressions
and online hate are exacerbated by the
ideological segregation present on social media, where social
homophily, as well as personalising and recommending
algorithms, facilitate the creation of echo chambers and
filter bubbles [7, 8]. The “others” are frequently targeted
because of characteristics such as gender, sexual
orientation, ethnicity, and religion [9, 10, 11].</p>
      <p>In the last years, to address the problems posed
by the widespread use of abusive language online, the
NLP community has focused on the detection of hate
speech [12] and the analysis of online debates [13, 14].
In particular, many researchers have worked on systems
to detect offensive language against specific vulnerable
groups, e.g., women, immigrants, and the LGBTQ+ community,
among others [11, 15, 16, 17]. An under-researched – yet
important – area of investigation is anti-politics hate, i.e.,
hate speech against politicians, policy makers and laws
at any level (national, regional and local). While
anti-policy hate speech has been addressed in Arabic [18] and
German [19], most European languages remain
under-researched. As regards religious hate, instead, annotated
corpora have been created for English, Arabic, Bengali,
French, Portuguese, and Italian, among others (for an
overview of works, see [15] and [20]). However, none
of them shares contextual information about the authors
of the tweets, nor about their social media network,
although religious self-identification may lead to hard
conflict with the members of other faiths.</p>
      <p>For this shared task, organised within EVALITA 2023
[21], we introduce a new corpus, called PolicyCorpusXL,
containing Italian tweets related to political topics, where
hateful messages have been manually annotated.</p>
      <p>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Sep 7 – 8, Parma, IT.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Dataset and Format</title>
      <sec id="sec-2-1">
        <p>In this section, we describe the dataset creation process (Section 3.1), including data collection, annotation, enrichment, and label distribution. Then, we outline the format used for sharing data with participants (Section 3.2).</p>
        <sec id="sec-2-1-1">
          <title>3.1. Dataset Creation</title>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.1.1. Data Collection</title>
        <p>We collected data from Twitter, which we selected among
existing social media platforms where hateful content
may be present. There are two main reasons for this
choice. On the one hand, Twitter easily allows the
retrieval of a high volume of textual content through its APIs.
On the other hand, additional metadata about the tweets
themselves and their authors can be collected.
Furthermore, Twitter users can perform asynchronous actions
such as retweeting, replying, and following. This latter
aspect allows us to share with HaSpeeDe3 participants
not only the text of the tweets and their metadata, but
also contextual information about the network in which the
participants of the online debate are situated.</p>
        <p>PolicyCorpusXL The corpus, in which hateful messages
have been manually annotated, is an extension of PolicyCorpus [22]. We selected
Twitter as the source of data and Italian as the target
language because Italy has had, at least since the
2018 elections, a large audience that pays attention to
hyperpartisan sources on Twitter. These users are prone to
produce and retweet messages of hate against
policymaking [23]. We also provide the Italian portion of the
ReligiousHate dataset [20] as a test set, in which hateful
tweets concerning Christianity, Islam and Judaism have
been manually labeled. Our goal is to test the in-domain
performance of systems for political hate speech
detection, as well as the out-of-domain performance on a test
set about religion.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Definition of the Task</title>
      <p>HaSpeeDe3 focuses on detecting hate speech in strongly
polarised debates on social media, in particular debates
on Twitter about political and religious topics. With this
task, we invite participants to explore not only features
based on the textual content of the tweet, but also features
based on contextual information, such as metadata
describing both the tweet and the author, or information
about the social media community of the participants in
the debate.</p>
      <p>We propose two tasks, A and B, that in the rest of the
paper will also be referred to as the in-domain and cross-domain
tracks. Both tasks are binary classification
problems: participants’ systems have to
predict whether a tweet contains hatred or not. Each task
consists of two subtasks:</p>
      <p>• Task A – (In-domain) political hate speech
detection: a binary classification task aimed at
determining whether a message contains hate
speech or not. The task is based on the
PolicyCorpusXL dataset (Section 3) and comprises the
following subtasks:
– Textual: participants can only use the
provided textual content of the tweet;
– Contextual: participants can additionally
employ contextual information about the
tweet and its author.
• Task B – Cross-domain hate speech detection:
a binary classification task on tweets about
political and religious topics, comprising the
following subtasks:
– XPoliticalHate: the test set consists of
tweets from PolicyCorpusXL (as in both
the in-domain subtasks above);
– XReligiousHate: the test set consists of
tweets from the ReligiousHate corpus
(Section 3), for which no development data is
provided to participants.
Moreover, in Task B participants are allowed to use
any kind of external data (e.g., datasets for other
hate domains) as well as the textual and contextual
PolicyCorpusXL development data.</p>
      <sec id="sec-3-1">
        <p>ReligiousHate We use the Italian portion of the religious
hate speech corpus introduced in [20]. The dataset
is composed of 3,000 tweets collected between December
2020 and August 2021 with keywords that refer to the
three main monotheistic religions, namely Christianity,
Islam and Judaism.</p>
      </sec>
      <sec id="sec-3-2">
        <p>Due to the different nature of the political and religious
topics, the protocols used for data collection are not the
same; however, in both cases, offensive words have not
been used as query terms, in order to minimise biased dataset
composition and potential learning shortcuts [25, 26].</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Data Annotation</title>
        <p>We summarise the annotation procedure followed for
PolicyCorpusXL and ReligiousHate below.</p>
        <p>PolicyCorpusXL Two Italian experts in
communication annotated the entire dataset. In case of
disagreement, a third expert additionally annotated the
training set. 1,000 tweets were finally discarded
in order to artificially augment the portion of hateful tweets
and provide more information to the classifiers. With
this strategy, the proportion of tweets containing hate
increased from 11.8% (a typical percentage obtained with
random sampling) to 40.6%.</p>
      </sec>
      <sec id="sec-3-4">
        <p>ReligiousHate Three native speakers of Italian with
a background in linguistics and computer science
annotated the 3,000 tweets about religion collected
as described in Section 3. Annotation was performed
following a protocol for experts that foresaw in-person
discussion rounds and adjudication sessions.</p>
      </sec>
      <sec id="sec-3-5">
        <p>The inter-annotator agreement is similar for both
the PolicyCorpusXL (Fleiss’ κ = 0.53) and ReligiousHate
(Cohen’s κ = 0.57) datasets.</p>
      </sec>
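      <p>As an aside, agreement coefficients like the ones above can be computed directly from the raw annotations. The following sketch (the toy labels and function name are ours, not the task’s actual annotations) shows Cohen’s kappa for two annotators over binary hate/not-hate labels:</p>

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over binary labels:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    # chance agreement from each annotator's marginal label distribution
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)

# toy annotations: 8 tweets, two annotators, 1 = hate / 0 = not hate
ann1 = [1, 1, 0, 0, 1, 0, 1, 0]
ann2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(ann1, ann2)  # 6/8 observed agreement, 0.5 by chance
```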
      <sec id="sec-3-6">
        <title>3.1.3. Data Enrichment</title>
        <p>Using the Twitter stream APIs, we retrieve tweets but
miss their subsequent history on the micro-blogging platform.
Indeed, since tweets are retrieved at posting time, we
cannot know what happens to them afterwards.
In order to follow up on the impact of a tweet on the user
community after posting time, we therefore also use
Twitter’s APIs to retrieve information about each tweet
a posteriori. This makes it possible to check, for example,
the number of times a tweet has been retweeted
or liked in the weeks after its posting time. We also
collected a variety of additional information about the
authors, such as the list of friends and the users that each
author has retweeted and replied to since about 2018.</p>
      </sec>
      <sec id="sec-3-7">
        <p>Statistics of the two HaSpeeDe3 datasets are
summarised in Table 1. PolicyCorpusXL consists of 7,000
tweets about political debates (5,600 in the development
set and 1,400 in the test set), whereas ReligiousHate
comprises 3,000 tweets, all belonging to the test set.</p>
        <p>• anonymized_tweet_id: A pseudo-random integer
that identifies the specific tweet and replaces the
original tweet id.
• created_at: The posting date of the tweet.
• retweet_count: The number of times the tweet
has been retweeted.
• favorite_count: It indicates approximately how
many times this tweet has been liked by Twitter
users1.
• source: The source used for posting the tweet
(e.g., Android, iOS, web).
• is_reply: 1 if the tweet is a reply, 0 otherwise.
• is_retweet: 1 if the tweet is a retweet, 0
otherwise.
• is_quote: 1 if the tweet is a quote, 0 otherwise.
• anonymized_user_id: The original author id (if
known), replaced by a pseudo-random integer.
• user_created_at: The date when the author
created the account.
• statuses_count: The number of tweets posted by
the author.
• followers_count: The number of Twitter users
that follow the author.
• friends_count: The number of Twitter users that
the author follows.
• anonymized_description: The self-description
of the author of the tweet. We applied the
same anonymisation strategy used for the field
anonymized_text of the file train_textual.csv
described above.</p>
        <p>The value of some fields could be unavailable or set
to 0 if we were unable to recover the metadata of the
tweet in 2022 (many months after the posting date), for
example, because the tweet has been removed by Twitter,
deleted, or made unavailable by the author.
training|test_contextual_friends.csv
• source: A user, identified by anonymized_user_id,
that follows the target.
• target: A user, identified by anonymized_user_id,
that is followed by the source.
training|test_contextual_retweet|reply.csv
• source: A user, identified by anonymized_user_id,
that retweeted the target.
• target: A user, identified by anonymized_user_id,
that has been retweeted by the source.
• date: The day when the source retweeted the target.
• count: The number of times the source retweeted
the target that day.
1Twitter released a number that “indicates approximately
how many times th[e] Tweet has been liked by Twitter
users”: https://developer.twitter.com/en/docs/twitter-api/v1/
data-dictionary/object-model/tweet</p>
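      <p>To illustrate how these edge-list files can be turned into network features (degree is one of the features used by the contextual baseline in Section 4.1), the following sketch (function name and toy edges are ours; the column layout follows the description above) computes each user’s degree from a friends edge list:</p>

```python
import csv
import io
from collections import Counter

def author_degrees(csv_text):
    """Compute the degree of each user in a friends edge list, i.e.,
    the number of follow relations in which the user appears.
    Expects a header with 'source' and 'target' columns, as in the
    training|test_contextual_friends.csv layout described above."""
    degree = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        degree[row["source"]] += 1
        degree[row["target"]] += 1
    return degree

# toy edge list with three users and three follow relations
edges = "source,target\n1,2\n1,3\n2,3\n"
degree = author_degrees(edges)  # every user appears in two relations
```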
      </sec>
      <sec id="sec-3-8">
        <p>All sources are authors of at least one tweet in the training corpus, but some authors are missing in this file since it was not possible to recover their friend list.</p>
      </sec>
      <sec id="sec-3-9">
        <p>All files described above are available at the official
GitHub page of the task: https://github.com/mirkolai/EVALITA2023-HaSpeeDe3.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Measures</title>
      <sec id="sec-4-1">
        <p>We provide four separate official rankings, one for each
subtask. Participants can submit two runs for each
subtask. However, participants are not required to
participate in all subtasks or to submit two runs for each of them.</p>
        <p>Systems are evaluated using the F1-score computed over
the two binary classes, i.e., hate speech (HS) and
non-hate speech (¬HS). Submissions are ranked by the
F1-score averaged over the two classes, according to the
following equation:</p>
        <p>F1(sys) = (F1_HS + F1_¬HS) / 2</p>
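      <p>For concreteness, the ranking measure can be sketched as follows (a minimal implementation with toy labels; function names are ours), averaging the per-class F1-scores over the HS (1) and ¬HS (0) classes:</p>

```python
def f1(gold, pred, cls):
    """F1-score of a single class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_f1(gold, pred):
    """Ranking measure: F1 averaged over the HS (1) and not-HS (0) classes."""
    return (f1(gold, pred, 1) + f1(gold, pred, 0)) / 2

# toy gold and predicted labels for six tweets
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
score = macro_f1(gold, pred)  # both classes reach F1 = 2/3 here
```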
        <sec id="sec-4-1-1">
          <title>4.1. Baselines</title>
          <p>We computed baselines using a simple machine learning
model. For Task A - Textual, we employed a Support
Vector Classifier trained on a unigram representation of the
textual content of the tweet. For Task A - Contextual, we
devised a baseline using the same classifier as above, based
on a unigram representation of the textual content of the
tweet, plus the number of retweets and likes
received by the tweet (retweet_count and favorite_count,
see Section 3.2), the author degree computed from the
friends network, and the author eigenvector centrality
computed from the friends network. A final baseline, for
both cross-domain hate speech subtasks, employs a
Support Vector Classifier with a unigram representation
of the textual content of the tweet, trained on the
XPoliticalHate and HaSpeeDe2 training sets [27].</p>
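          <p>A dependency-free sketch of such a unigram baseline follows; to keep the example self-contained, a simple perceptron stands in for the Support Vector Classifier, and the toy tweets and labels are ours, not PolicyCorpusXL data:</p>

```python
from collections import defaultdict

def unigrams(text):
    # unigram (bag-of-words) representation of the tweet text
    return text.lower().split()

def train_perceptron(texts, labels, epochs=10):
    """Learn one weight per unigram; labels are 1 (hate) / 0 (not)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            score = sum(w[t] for t in unigrams(text))
            pred = 1 if score > 0 else 0
            if pred != y:  # mistake-driven update
                for t in unigrams(text):
                    w[t] += 1.0 if y == 1 else -1.0
    return w

def predict(w, text):
    return 1 if sum(w[t] for t in unigrams(text)) > 0 else 0

# toy stand-ins for PolicyCorpusXL tweets (1 = hate speech, 0 = not)
texts = ["odio i politici", "bella giornata a roma",
         "politici tutti da cacciare", "oggi piove ma va bene"]
labels = [1, 0, 1, 0]
w = train_perceptron(texts, labels)
```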
          <p>In Table 2 we present the results obtained by the
baselines on the four subtasks.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Task Overview: Participation and Results</title>
      <sec id="sec-5-1">
        <p>A total of six teams participated in the HaSpeeDe3 task.
We summarise their contributions below.</p>
      </sec>
      <sec id="sec-5-3">
        <p>BERTicelli [28] The team submitted results for all the
tasks and used all the provided sets of information. They
exploited two pre-trained cased LLMs for Italian, namely
UmBERTo and Italian BERT. In the pre-processing phase,
they turned hashtags into words to reduce noise; they
performed fine-tuning and used 5-fold cross-validation
for the Textual subtask, obtaining high scores. For the
Contextual subtask, the team adopted an ensemble
approach, wherein additional features were added to the
fine-tuned models through a GradientBoosterClassifier
algorithm. UmBERTo performed competitively in both
the Textual and Contextual subtasks, but the model did not
benefit from the addition of contextual features. Italian
BERT, on the other hand, performed above the baselines
but significantly lower than the task average. Overall,
the team performed above the average in the political
hate domain and below the average in the religious hate
domain.</p>
      </sec>
      </sec>
      <sec id="sec-5-4">
        <p>CHILab [29] The team participated only in Task
A - Textual, i.e., addressing only in-domain political hate
speech detection using the provided textual content of
the tweets from PolicyCorpusXL for development. They
submitted two runs that employ two different models
based on BiLSTMs. The first one generates
768-dimensional token embeddings from AlBERTo,
and the second one employs fastText to generate
300-dimensional token embeddings. Particular attention
was paid to pre-processing: the [URL] tag, mention
references, and retweet notes were removed, since
they were not considered relevant. Case sensitivity
was preserved, as were emojis, due to the fact that
they convey a specific meaning in social media
communication in terms of prosody and emotions.</p>
        <p>extremITA [30] The team addressed all the tasks,
using all the provided sets of information made available by
the organisers. They also made use of data from all the
EVALITA 2023 challenges to build monolithic
architectures to tackle all the tasks. Their approaches are based on
i) the IT5 encoder-decoder model, and ii) an
instruction-tuned large language model built upon LLaMA. To this
end, for both models, they devised natural language
instructions and output templates for each EVALITA task,
including HaSpeeDe3. Among their submissions, we
observe that the LLaMA-based model achieved better
results than the IT5 one on Task A - Contextual, whereas
the IT5 model achieved better results on the remaining
subtasks.</p>
      </sec>
      <sec id="sec-5-5">
        <p>INGEOTEC The team did not submit a system description
report; therefore, we are unable to discuss and
analyse their approach. They participated in Task A -
Textual and in Task B - XReligiousHate, considering
both the evaluation settings.</p>
      </sec>
      <sec id="sec-5-6">
        <p>LMU [31] The team participated only in Task B -
XReligiousHate, considering both the evaluation settings,
with multitask prompt-training systems. Their systems
consist of two steps in which models are i) pre-finetuned
on external datasets in Italian and English from various
domains, and ii) fine-tuned on the target domain (only
applicable to PolicyCorpusXL). As a backbone of their systems,
they experimented with both Italian and multilingual
pre-trained language models (PLMs). They showed that
Italian datasets are more beneficial than the combination
of Italian and English ones, and that systems based on
Italian and multilingual PLMs achieved similar
performance. Their best runs for the political and religious
domains are ensembles of prompt-training systems based
on Italian and multilingual PLMs.</p>
        <p>odang4 [32] The team participated in both Tasks A
and B, using only textual information in the former. They
based their approach on the assumption that a relation
between named entities and abusive language exists.
They submitted two different runs. The first one
employs enhanced-ALBERTo with triple verbalisation from
the Ontology of Dangerous Speech [33] and with prompting
of the Davinci model. The second one applies a majority
voting criterion among ALBERTo, the enhanced-ALBERTo
with triple verbalisation from the Ontology of
Dangerous Speech, and the enhanced-ALBERTo with prompting of
Davinci. As for Task B - XReligiousHate, the
multilingual expert-based hate speech/counter-narrative
pairs dataset on Islamophobia (CONAN) [34] has been
employed too.</p>
        <sec id="sec-5-6-1">
          <title>5.1. Final Ranking</title>
          <p>Table 3 shows the results obtained by the participants for
each of the four subtasks. The runs submitted by each
participant are highlighted in green. However, when a
team submits a run to Task A - Textual, the submission
also satisfies the Task A - Contextual and Task B -
XPoliticalHate requirements, and it is therefore included
in those final rankings. Likewise, when a team submits a
run to Task A - Contextual, the submission satisfies the
Task B - XPoliticalHate requirements too. The best results
in Task A - Textual, Task A - Contextual, and Task B -
XPoliticalHate are achieved by the odang4 team with
F1(sys) = 0.912, employing the same model without taking
advantage of contextual information nor using external
data sources. Only extremITA and LMU (the latter
exclusively participated in Task B - XPoliticalHate)
reached F1(sys) &gt; 0.9 with at least one of their runs.
extremITA and LMU are the only two teams that
reached F1(sys) &gt; 0.6 in Task B - XReligiousHate. In
particular, extremITA obtained F1(sys) = 0.6525, with a
remarkable improvement with respect to the other teams.</p>
          <p>All participating systems showed an improvement over
the baselines employed for the in-domain political hate
speech detection tasks, whereas only two teams
outperformed the baseline for Task B - XReligiousHate, proving
the complexity of the cross-domain task (Section 5.1).</p>
        </sec>
      </sec>
      <sec id="sec-5-7">
        <title>Acknowledgments</title>
        <p>This work has received financial support from the European Union’s Horizon Europe research and innovation program under grant agreement No 101070190 (AI4Trust).</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Conclusion</title>
      <sec id="sec-6-1">
        <p>Results show that run #1 submitted by the odang4
team achieves the best scores across all in-domain tasks.
In particular, their approach combining prompting, the
Ontology of Dangerous Speech, and the ALBERTo model
proved particularly effective in the political domain.</p>
        <p>However, none of the participants seems to have found a
way to exploit contextual information effectively, i.e.,
yielding an improvement over textual-only models. This is
in line with past studies showing the challenges of
embedding contextual information in hate speech detection
systems [35].</p>
        <p>While the best performance for the in-domain task
confirms the state-of-the-art results obtained in
similar settings [36], we observe a significant drop in
performance (around −0.30 F1-score on average) for the
out-of-domain task. Among the systems, extremITA
shows a better generalisation capability and yields the
best results in this setting. We hypothesise that this
happens because their system was built to address all
EVALITA challenges, and the only task-specific
adaptation is the use of instructions for HaSpeeDe3. Overall,
out-of-domain settings still challenge hate speech
detection capabilities and still represent a research direction
to investigate. Furthermore, approaches that tackle
in-domain hate well do not seem to suit the out-of-domain
setting, for which different strategies should be pursued.</p>
        <p>[5] N. Oliver, B. Lepri, H. Sterly, R. Lambiotte, S. Deletaille, M. De Nadai, E. Letouzé, A. A. Salah, R. Benjamins, C. Cattuto, et al., Mobile phone data for informing public health actions across the covid-19 pandemic life cycle, 2020.
[6] M. Caprolu, A. Sadighian, R. Di Pietro, Characterizing the 2022 russo-ukrainian conflict through the lenses of aspect-based sentiment analysis: Dataset, methodology, and preliminary findings, 2022. URL: https://arxiv.org/abs/2208.04903. doi:10.48550/ARXIV.2208.04903.
[7] E. Elejalde, L. Ferres, E. Herder, The nature of real and perceived bias in chilean media, in: Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT, Association for Computing Machinery, New York, NY, USA, 2017, pp. 95–104. URL: http://doi.acm.org/10.1145/3078714.3078724. doi:10.1145/3078714.3078724.
[8] Y. Theocharis, W. Lowe, Does Facebook increase political participation? Evidence from a field experiment, Information, Communication &amp; Society 19 (2016) 1465–1486.
[9] O. Ștefăniță, D.-M. Buf, Hate speech in social media and its effects on the lgbt community: A review of the current research, Romanian Journal of Communication and Public Relations 23 (2021).
[10] E. Fersini, D. Nozza, P. Rosso, Overview of the evalita 2018 task on automatic misogyny identification (ami), in: Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop, EVALITA 2018, volume 2263, CEUR, 2018.
[11] F. Poletto, M. Stranisci, M. Sanguinetti, V. Patti, C. Bosco, Hate speech annotation: Analysis of an italian twitter corpus, in: 4th Italian Conference on Computational Linguistics, CLiC-it 2017, volume 2006, CEUR-WS, 2017, pp. 1–6.
[12] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
[13] F. Celli, G. Riccardi, A. Ghosh, Corea: Italian news corpus with emotions and agreement, in: Proceedings of CLiC-it 2014, 2014, pp. 98–102.
[14] M. Lai, M. Tambuscio, V. Patti, G. Ruffo, P. Rosso, Stance polarity in political debates: A diachronic perspective of network homophily and conversations on twitter, Data &amp; Knowledge Engineering 124 (2019) 101738. URL: https://www.sciencedirect.com/science/article/pii/S0169023X19300187. doi:10.1016/j.datak.2019.101738.
[15] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Language Resources &amp; Evaluation 55 (2021) 477–523.</p>
        <p>[16] P. Saha, B. Mathew, P. Goyal, A. Mukherjee, Hateminers: detecting hate speech against women, arXiv preprint arXiv:1812.06700 (2018).
[17] E. W. Pamungkas, V. Basile, V. Patti, Misogyny detection in twitter: a multilingual and cross-domain study, Inf. Process. Manag. 57 (2020) 102360. URL: https://doi.org/10.1016/j.ipm.2020.102360. doi:10.1016/j.ipm.2020.102360.
[18] I. Guellil, A. Adeel, F. Azouaou, S. Chennoufi, H. Maafi, T. Hamitouche, Detecting hate speech against politicians in arabic community on social media, International Journal of Web Information Systems (2020).
[19] S. Jaki, T. De Smedt, Right-wing german hate speech on twitter: Analysis and automatic detection, arXiv preprint arXiv:1910.07518 (2019).
[20] A. Ramponi, B. Testa, S. Tonelli, E. Jezek, Addressing religious hate online: from taxonomy creation to automated detection, PeerJ Computer Science 8 (2022) e1128. URL: https://doi.org/10.7717/peerj-cs.1128. doi:10.7717/peerj-cs.1128.
[21] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, Evalita 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.
[22] A. Duzha, C. Casadei, M. Tosi, F. Celli, Hate versus politics: detection of hate against policy makers in italian tweets, SN Social Sciences 1 (2021) 1–15.
[23] F. Giglietto, N. Righetti, G. Marino, L. Rossi, Multi-party media partisanship attention score. Estimating partisan attention of news media sources using twitter data in the lead-up to 2018 italian election, Comunicazione politica 20 (2019) 85–108.
[24] F. Celli, M. Lai, A. Duzha, C. Bosco, V. Patti, Policycorpus xl: An italian corpus for the detection of hate speech against politics, in: Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021), volume 3033 of CEUR Workshop Proceedings, CEUR-WS.org, Aachen, Germany, 2022. URL: http://ceur-ws.org/Vol-3033/paper38.pdf.
[25] M. Wiegand, J. Ruppenhofer, T. Kleinbauer, Detection of Abusive Language: the Problem of Biased Datasets, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 602–608. URL: https://aclanthology.org/N19-1060. doi:10.18653/v1/N19-1060.
[26] A. Ramponi, S. Tonelli, Features or spurious artifacts? data-centric baselines for fair and robust hate speech detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 3027–3040. URL: https://aclanthology.org/2022.naacl-main.221. doi:10.18653/v1/2022.naacl-main.221.
[27] M. Sanguinetti, G. Comandini, E. Di Nuovo, S. Frenda, M. Stranisci, C. Bosco, T. Caselli, V. Patti, I. Russo, Overview of the evalita 2020 second hate speech detection task (haspeede 2), in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.), Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), CEUR.org, Online, 2020.</p>
        <p>[28] L. Grotti, P. Quick, Berticelli at haspeede3: Fine-tuning and cross-validating large language models for hate speech detection, EVALITA 2023 Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (2023).
[29] I. Siragusa, R. Pirrone, Chilab at evalita 2023: Overview of the task a textual, EVALITA 2023 Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (2023).</p>
        <p>[34] Y.-L. Chung, E. Kuzmenko, S. S. Tekiroglu, M. Guerini, CONAN - COunter NArratives through nichesourcing: a multilingual dataset of responses to fight online hate speech, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 2819–2829. URL: https://aclanthology.org/P19-1271. doi:10.18653/v1/P19-1271.
[35] S. Menini, A. P. Aprosio, S. Tonelli, Abuse is contextual, what about nlp? the role of context in abusive language annotation and detection, arXiv preprint arXiv:2103.14916 (2021).
[36] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), in: Proceedings of SemEval, 2020.</p>
        <p>[30] C. D. Hromei, D. Croce, V. Basile, R. Basili, Extremita at evalita 2023: Multi-task sustainable scaling
to large language models at its extreme, EVALITA
2023 Eigth Evaluation Campaign of Natural
Language Processing and Speech Tools for Italian (2023)
–.
[31] V. Hangya, A. Fraserl, Lmu at haspeede3:
Multidataset training for cross-domain hate speech
detection, EVALITA 2023 Eigth Evaluation Campaign
of Natural Language Processing and Speech Tools
for Italian (2023) –.
[32] C. Di Bonaventura, A. Muti, M. A. Stranisci,</p>
        <p>B. McGillivray, A. Meroño-Peñuela, O-dang4 at
hodi and haspeede3: A knowledge-enhanced
approach to homotransphobia and hate speech
detection in italian, EVALITA 2023 Eigth
Evaluation Campaign of Natural Language Processing and</p>
        <p>Speech Tools for Italian (2023) –.
[33] M. A. Stranisci, S. Frenda, M. Lai, O. Araque, A. T.</p>
        <p>Cignarella, V. Basile, C. Bosco, V. Patti, O-dang! the
ontology of dangerous speech messages, in:
Proceedings of the 2nd Workshop on Sentiment
Analysis and Linguistic Linked Data, European Language
Resources Association, Marseille, France, 2022, pp.</p>
        <p>2–8. URL: https://aclanthology.org/2022.salld-1.2.
[34] Y.-L. Chung, E. Kuzmenko, S. S. Tekiroglu,</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
[1] CENSIS,
          <source>50º rapporto sulla situazione sociale del paese 2016</source>
          , FrancoAngeli,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Conover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ratkiewicz</surname>
          </string-name>
          , M. Francisco,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Menczer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Flammini</surname>
          </string-name>
          , Political polarization on Twitter, in:
          <source>International AAAI Conference on Web and Social Media</source>
          ,
ICWSM
          ,
          <source>Association for the Advancement of Artificial Intelligence</source>
          , Palo Alto, CA, USA,
          <year>2011</year>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bouazizi</surname>
          </string-name>
          , T. Ohtsuki,
<article-title>Hate speech on twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection</article-title>
          ,
          <source>IEEE access 6</source>
          (
          <year>2018</year>
          )
          <fpage>13825</fpage>
          -
          <lpage>13835</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Stepanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poesio</surname>
          </string-name>
          , G. Riccardi,
          <article-title>Predicting brexit: Classifying agreement is better than sentiment and pollsters</article-title>
          , in:
          <source>PEOPLES@COLING</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>110</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>