=Paper=
{{Paper
|id=Vol-3878/49_main_long
|storemode=property
|title=The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis
|pdfUrl=https://ceur-ws.org/Vol-3878/49_main_long.pdf
|volume=Vol-3878
|authors=Ibai Guillén-Pacho,Arianna Longo,Marco Antonio Stranisci,Viviana Patti,Carlos Badenes-Olmedo
|dblpUrl=https://dblp.org/rec/conf/clic-it/Guillen-PachoLS24
}}
==The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis==
Ibai Guillén-Pacho1,*,†, Arianna Longo2,3,†, Marco Antonio Stranisci2,3, Viviana Patti2 and Carlos Badenes-Olmedo1,4

1 Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
2 University of Turin, Italy
3 Aequa-tech, Torino, Italy (aequa-tech.com)
4 Computer Science Department, Universidad Politécnica de Madrid, Spain
Abstract
This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in
Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse,
derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained
identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind
in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform
the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media,
promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further
enhance its comprehensiveness and reliability.
Keywords
hate speech, vulnerable identities, annotated corpora
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
* Corresponding author.
† These authors contributed equally.
ibai.guillen@upm.es (I. Guillén-Pacho); arianna.longo401@edu.unito.it (A. Longo); marcoantonio.stranisci@unito.it (M. A. Stranisci); viviana.patti@unito.it (V. Patti); carlos.badenes@upm.es (C. Badenes-Olmedo)
https://iguillenp.github.io/ (I. Guillén-Pacho); https://marcostranisci.github.io/ (M. A. Stranisci); https://www.unito.it/persone/vpatti (V. Patti); https://about.me/cbadenes (C. Badenes-Olmedo)
ORCID: 0000-0001-7801-8815 (I. Guillén-Pacho); 0009-0005-8500-1946 (A. Longo); 0000-0001-9337-7250 (M. A. Stranisci); 0000-0001-5991-370X (V. Patti); 0000-0002-2753-9917 (C. Badenes-Olmedo)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Hate Speech (HS) detection is a task with a high social impact. Developing technologies that are able to recognize these forms of discrimination is not only crucial to enforce existing laws, but also supports important tasks such as the moderation of social media content. However, recognizing HS is challenging: verbal discrimination takes different forms and involves a number of correlated phenomena that make it difficult to reduce HS to a binary classification.

Analyzing the recent history of corpora annotated for HS, it is possible to observe a shift from very broad categorizations of hateful content to increasingly detailed annotation schemes aimed at understanding the complexity of this phenomenon. High-level schemes including dimensions like "hateful/offensiveness" [1] or "sexism/racism" [2] paved the way for more sophisticated attempts to formalize such concepts in different directions: exploring the interaction between HS and vulnerable targets [3, 4, 5]; studying the impact of subjectivity [6, 7]; identifying the triggers of HS in texts [8, 9].

Despite this trend, the complex semantics of HS in texts is far from fully explored. Information Extraction (IE) approaches to HS annotation have rarely been implemented yet; therefore, corpora that include a fine-grained, structured semantic representation of HS incidents are not available. The only notable exception is the recent work of Büyükdemirci et al. [10], which treats the identification of HS targets as a span-based task.

In order to fill this gap, we present the Vulnerable Identities Recognition Corpus (VIRC): a dataset of 880 Italian and Spanish headlines against migrants aimed at providing an event-centric representation of HS against vulnerable groups. The annotation scheme is built on four elements:

• Named Entity Recognition (NER). All the named entities that are involved in a HS expression: 'location', 'organization', and 'person'.
• Vulnerable Identity mentions. Generic mentions related to identities targeted by HS as defined by international regulatory frameworks¹: 'women', 'LGBTQI', 'ethnic minority', and 'migrant'.
• Derogatory mentions. All mentions that negatively portray people belonging to vulnerable groups.
• Dangerous speech. The part of the message that is perceived as hateful against named entities or vulnerable identities.

In this paper we present a preliminary annotation experiment intended to validate the scheme and to assess the impact of disagreement in such a fine-grained task. The paper is structured as follows. In Section 2, we discuss related work; in Section 3, we describe the methodology used; in Section 4, we introduce the VIRC corpus; and in Section 5, we present the conclusions and discuss possible future work.

¹ https://www.coe.int/en/web/combating-hate-speech/recommendation-on-combating-hate-speech

2. Related Work

Literature on automatic HS detection is vast and follows different research directions [11]: from the analysis of subjectivity in the perception of this phenomenon [12] to the definition of ever more refined categorizations of hateful contents [13]. In this section we focus on the approaches to HS detection that are aimed at studying the target of HS, inspired by Information Extraction (IE) approaches. In Section 2.1 we review HS resources inspired by this approach with a specific focus on span-based annotated corpora. In Section 2.2 we discuss the implementation of NER-based techniques in the creation of HS corpora.

2.1. Hate Speech Detection

A large amount of work on HS detection focuses on classification, both binary (presence or absence of HS) and multi-label (misogyny, racism, xenophobia, etc.). This has led to the existence of large collections of datasets such as those grouped by [14]. One of the main problems is that most resources are in English, and for mid-to-low resource languages (e.g., Italian), some HS categories are not covered. This constraint is mitigated by cross-lingual transfer learning, which exploits resources in other languages [15]; although good results are achieved, the creation of resources for these languages is still necessary.

The main resources for the identification of HS focus on a particular target and identify the presence or absence of HS towards it. An example is the work of [16], where 1,100 tweets in Italian, with a special focus on immigrants, were annotated according to the presence of HS, irony, and the stance of the message's author on immigration matters. Recently, however, there has been an increasing focus on identifying hateful expressions and their intended targets. This change in paradigm suggests that resources should be wider in scope and not focus on a particular discourse target. The main resources in this field have high linguistic diversity, although they do not all follow the same annotation scheme, with English being the most common language. We have found works in English [17]; Vietnamese [18]; Korean [19]; English and Turkish [10]; and English, French, and Arabic [20]. However, we have not found any in Italian or Spanish, which we believe makes this work the first to cover these languages for this task.

Two main annotation approaches can be drawn from these studies: those that annotate at the span level [17, 18, 19, 10] and those that annotate over the full text [20]. On the one hand, the work that follows the latter approach presents a corpus of 13,000 tweets (5,647 English, 4,014 French, and 3,353 Arabic) and notes the sentiment of the annotator (shock, sadness, disgust, etc.), hostility type (abusive, hateful, offensive, etc.), directness (direct or indirect), target attribute (gender, religion, disabled, etc.), and target group (individual, women, African, etc.).

On the other hand, works that follow the span-annotation approach adopt different annotation criteria. The simplest, [17, 18], annotate only one dimension. The first, [17], annotates the parts that make a comment toxic in 30,000 English comments from the Civil Comments platform. The second, [18], annotates only the parts that make a comment offensive or hateful in 11,000 Vietnamese comments from Facebook and YouTube. The other papers, [19, 10], extend this approach and also label the span in which the target of the attack is mentioned. Moreover, [19] is not limited to that; they also annotate the target type (individual, group, other), the target attribute (gender, race, ethnicity, etc.), and the target group (LGBTQ+, Muslims, feminists, etc.). Their final corpus has 20,130 annotated offensive Korean-language news and video comments.

However, the guidelines used by the different works sometimes present incompatibilities. Although some works use offensive and hateful labels in the same way [19, 18], others distinguish between these two types of expression [10]. This last resource has separately annotated hateful and offensive expressions, totaling 765 tweets in English and 765 tweets in Turkish.

2.2. Named Entity Recognition

Developed as a branch of Information Extraction (IE), Named Entity Recognition (NER) is a field of research aimed at detecting named entities in documents according to different schemes. Following the review of Jehangir et al. [21], it is possible to observe general-purpose schemes, which usually include entities of the types 'person', 'location', 'organization', and 'time', and schemes defined for specific applications. OntoNotes [22] is an example of the first type of approach: a broad collection of documents gathered from different sources (e.g., newspapers, television news) annotated with a tagset that includes general categories of named entities. On the other hand, more specific applications include biomedical NER, which focuses on identifying entities relevant to the biomedical field, such as diseases, genes, and chemicals. An example in this field is the JNLPBA dataset [23], which is derived from the GENIA corpus. This dataset consists of 2,000 biomedical abstracts from the MEDLINE database, annotated with detailed entity types such as proteins, DNA, RNA, cell lines, and cell types.

NER-based approaches for HS detection and analysis are still few. ElSherief et al. [24] exploited Twitter users' mentions to distinguish between directed and generalized forms of HS. Rodríguez-Sánchez et al. [25] used derogatory expressions about women as seeds to collect misogynist messages according to a fine-grained classification of this phenomenon. [26] adopted a similar methodology to collect tweets about three groups vulnerable to discrimination: ethnic minorities, religious minorities, and Roma communities. Piot et al. [14] analyzed the correlation between the presence of HS and named entities in 60 existing datasets. Despite these previous works, there are no attempts to define a NER-based scheme specifically intended for HS detection. Our work represents an attempt to fill this gap by combining categories from general-purpose NER and a taxonomy of groups vulnerable to discrimination in a common annotation scheme aimed at providing deeper insights about the targets of HS.

3. Methodology

3.1. Data Collection

We collect news from public Telegram channels with the telegram-dataset-builder [27]. The selected channels, shown in Table 1, are in Spanish and Italian and aligned with the left and right wings of the political spectrum. The subset of Italian headlines was integrated with titles published on newspapers' Facebook pages, collected in collaboration with the Italian Amnesty Task Force on HS, a group of activists that produces counter-narratives against discriminatory content spread by online newspapers and user comments². We collected all the news headlines detected by activists in March 2020, 2021, 2022, and 2023, and added them to our corpus.

² https://www.amnesty.it/entra-in-azione/task-force-attivismo/

Given the large amount of news collected, we applied filters to the dataset to reduce it to its final size. We focus on news about racism; for this purpose, we applied the classifier piuba-bigdata/beto-contextualized-hate-speech to keep only news items labeled as racism. Since this classifier is trained on Spanish
texts, prior to this step we automatically translated the Italian news with the model facebook/nllb-200-distilled-600M. This translation step is used only for the filtering process; once the news is selected, the translated text is no longer used. In the end, this process generated 532 news headlines classified as racist for Italian and 348 for Spanish, which were selected for the annotation task.

Table 1: Telegram channels from which the news have been extracted.

           Left-wing                                 Right-wing
Spanish    elpais_esp, smolny7                       MediterraneoDGT, elmundoes
Italian    ByobluOfficial, terzaroma, marsadefenza   cellopamio, ilprimatonazionaleIPN, VoxNewsInfo

3.2. Data Annotation

A comprehensive, span-based annotation scheme was developed to label vulnerable identities and entities present in the dataset. Annotators were provided with instructions and had to choose a label and highlight the word, phrase, or portion of text that best embodied the qualities of the chosen label. It was possible to choose more than one label for the same portion of text. The instructions also provided annotators with some examples of annotated headlines.

The initial layer of annotation focuses on identifying vulnerable targets within the text and categorizing them into one of six predefined labels: ethnic minority, migrant, religious minority, women, LGBTQ+ community, and other. These labels represent vulnerable groups, as the vulnerability of the targets can often be traced back to their belonging to certain categories of people that are particularly exposed to discrimination, marginalisation, or prejudice in society. In cases where the targeted group did not fit into one of the predefined labels, annotators were required to use the 'other' category. For instances labeled as 'other', annotators were then instructed to provide specific details regarding the group in a free-text field.

After categorizing vulnerable targets, the second layer involves annotating named entities. Annotators identify entities within the text and label them with one of five possible types: person, group, organization, location, and other. As in the first layer, instances labelled 'other' require annotators to provide details about the entity in a free-text field.

The final layers of the annotation scheme address the context in which these entities are mentioned, specifically focusing on identifying derogatory mentions and dangerous speech.

A derogatory mention is characterized by negative or disparaging remarks about the target. In these instances, explicit hate speech is absent, but the mention itself is discriminatory or offensive, often employing a tone intended to belittle or discredit the target. The label derogatory is used to mark these mentions.

Moreover, the annotation includes identifying dangerous elements: portions of text that, intentionally or unintentionally, could incite hate speech or increase the vulnerability of the target identity. Dangerous speech, which can be either explicit or implicit, promotes or perpetuates negative prejudices and stereotypes, potentially triggering harmful responses against the group. The label dangerous [28] is used to tag these segments. Annotators were encouraged to use free-text fields to provide details on implicit dangerous speech or recurring dangerous concepts.

The annotation guidelines provided annotators with specific criteria and with the following list of potential markers of dangerous speech to help their identification:

• Incitement to violence: the text explicitly encourages violence against the target group;
• Open discrimination: the text openly states or supports discrimination against the target group;
• Ridicule: the text ridicules the target in the eyes of the readers by belittling or mocking it;
• Stereotyping: the text perpetuates negative stereotypes about the target group, contributing to a distorted view of it;
• Disinformation: the text spreads false or misleading information that can harm the target group;
• Dehumanization: the text dehumanizes the target group, using language that equates it with objects or animals;
• Criminalization: the text portrays the target group as inherently criminal or associates it with illegal activities, contributing to the perception that the group as a whole is dangerous.

However, a text may still be considered dangerous even if it does not explicitly include these markers, as they are intended as examples rather than strict requirements.

Figure 1: Examples of annotated headlines.
"Migranti, un esercito di scrocconi: 120mila mantenuti con l'8 per mille degli italiani."³
"Hordas de gitanos arrasan Mercadona después de que les ingresen 3000 euros en sus 'tarjetas solidarias'."⁴
"Questa è Villa Aldini, la residenza di lusso che ospita i migranti stupratori a Bologna."⁵
Legend: Vulnerable identity - Migrants; Vulnerable identity - Ethnic minority; Derogatory; Dangerous speech; Entity - Location; Entity - Organization.

³ "Migrants, an army of scroungers: 120,000 supported by the Italians' 8x1000 tax allocation".
⁴ "Hordes of gypsies devastate Mercadona after 3000 euros were deposited in their 'solidarity cards'".
⁵ "This is Villa Aldini, the luxury residence that hosts rapist migrants in Bologna".

Figure 1 provides three examples of annotated headlines, two in Italian and one in Spanish, showing the application of the annotation scheme as described. In the figure, different colours highlight the various types of labels used. A vulnerable identity was detected in each headline: 'Migranti' in the first and in the third one and 'gitanos' in the second one, respectively labelled as 'vulnerable group - migrant' and
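The filtering pipeline of Section 3.1 (translate Italian headlines to Spanish, classify every headline, keep only those labeled as racism, discard the translations) can be sketched as follows. This is a minimal illustration: `translate_it_to_es` and `classify_racism` are hypothetical stand-ins for the facebook/nllb-200-distilled-600M and piuba-bigdata/beto-contextualized-hate-speech models, not the authors' actual code.

```python
def filter_racism_headlines(headlines, translate_it_to_es, classify_racism):
    """Keep only headlines classified as racism.

    Italian headlines are translated to Spanish first, because the
    classifier is trained on Spanish text; the translation is used
    only for filtering and then discarded.
    """
    selected = []
    for text, lang in headlines:
        to_classify = translate_it_to_es(text) if lang == "it" else text
        if classify_racism(to_classify):
            selected.append((text, lang))  # keep the original text, not the translation
    return selected

# Toy stand-ins for the real models (hypothetical):
fake_translate = lambda text: f"[es] {text}"
fake_classifier = lambda text: "migrant" in text or "migrante" in text

corpus = [("migrante headline", "it"), ("weather report", "es"), ("migrant news", "es")]
print(filter_racism_headlines(corpus, fake_translate, fake_classifier))
```

With the real models, the two stand-ins would be replaced by calls to the translation and text-classification checkpoints named above.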
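One way to represent the layered, possibly multi-label span annotations of the scheme in Section 3.2 is as simple records over character offsets. The dataclass below is an illustrative sketch, not the project's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpanAnnotation:
    """One annotator's label over a character span of a headline."""
    label: str                     # e.g. 'vulnerable group - migrant', 'dangerous'
    start: int                     # character offsets into the headline text
    end: int
    comment: Optional[str] = None  # free-text detail, e.g. for 'other' labels

headline = "Più di 200mila case popolari agli immigrati"
# The same headline can carry several labels, possibly on overlapping spans:
annotations = [
    SpanAnnotation("vulnerable group - migrant", 34, 43),  # 'immigrati'
    SpanAnnotation("dangerous", 0, 14),                    # 'Più di 200mila'
]
print(annotations[0].label, headline[annotations[0].start:annotations[0].end])
```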
'vulnerable group - ethnic minority'. The three examples all contain multiple elements of dangerous speech, highlighted in red, and the second text also contains an element which was marked with the derogatory label. Additionally, the second and the third headlines include examples of annotation for named entities, with 'Mercadona' labelled as 'entity - organization', and 'Villa Aldini' and 'Bologna' labelled as 'entity - location'.

4. The VIRC Corpus

The VIRC corpus is a collection of 532 Italian and 348 Spanish news headlines annotated by two independent annotators for each language. Following the perspectivist paradigm [29], we release both the disaggregated annotations and the gold-standard corpus. The code used to generate the gold-standard corpus, carry out experiments, and compile statistics can be accessed through the following GitHub repository⁶. In this section we present an analysis of disagreement (Section 4.1) and relevant statistics about the corpus (Section 4.2).

⁶ https://github.com/oeg-upm/virc

4.1. Inter-Annotator Agreement

Since the span-based annotation task does not provide a fixed number of annotated items, we adopted the F-score metric to evaluate the agreement between annotators [30]. For each subset of the corpus we randomly chose one annotator as the gold-standard set of labels and the other as the set of predictions. We then computed the F-score between the two distributions of labels in order to measure the agreement between the annotators. Table 2 shows the results of our analysis. In general, annotations always showed a fair or higher agreement, except for some entity-related labels and the 'derogatory' one. There is also a low agreement in the Italian set on the labels 'religious minority' and 'women'.

Table 2: The annotators' agreement measured through the F-score and broken down by label.

                                         IAA (F-score)
                                         Spanish   Italian
dangerous                                0.49      0.57
derogatory                               0.08      0.28
entity - group                           0.00      0.00
entity - location                        0.66      0.60
entity - organization                    0.41      0.12
entity - other                           0.00      0.10
entity - person                          0.47      0.63
vulnerable entity                        0.15      0.00
vulnerable group - ethnic minority       0.83      0.63
vulnerable group - lgbtq+ community      -         0.80
vulnerable group - migrant               0.96      0.86
vulnerable group - other                 0.46      0.41
vulnerable group - religious minority    1.00      0.00
vulnerable group - women                 0.60      0.22

Although the overall results are positive, they show significant variations that can be analyzed both quantitatively and qualitatively. The inclusion of overlapping spans was handled as follows: if one span fully included another, this was considered an agreement. In cases where the spans only partially overlapped, meaning there was some shared text but not full inclusion, this was treated as a partial agreement. For example, if one annotator labeled "All women" and another selected only "women", this would be a full agreement (1 true positive); however, if the latter selected "women of Italy", it would be a partial agreement (0.5 true positive).

Quantitative Analysis. The agreement on the annotation of entities is always moderate but differs between the Spanish and the Italian subsets. Annotators of Spanish headlines scored a higher agreement on 'location' (0.66 vs 0.60), 'vulnerable' (0.15 vs 0), and 'organization' (0.41 vs 0.12), while entities of the types 'person' (0.63 vs 0.47) and 'other' (0.1 vs 0) are better recognized in Italian headlines.

On average, the annotation of vulnerable identities resulted in a higher agreement between annotators in both subsets, and at the same time confirmed a higher agreement for Spanish annotations, which consistently outperform the Italian ones. The highest agreement emerges for the label 'migrant', on which annotators obtained an F-score of 0.86 for Italian and 0.96 for Spanish. The agreement on 'ethnic minority' is a bit lower but still significant; while Spanish headlines reached an F-score of 0.83, Italian ones reached only 0.63. An equally high agreement is observed on the 'lgbtq+' label, which is only present in Italian headlines, with an F-score of 0.8. Among vulnerable groups, women scored the lowest F-score: 0.6 for Spanish, 0.22 for Italian. The largest observed discrepancy is with religious minorities: in Spanish an F-score of 1 is achieved, while in Italian it is 0.

While the annotation of 'dangerous' spans achieves an acceptable agreement, the 'derogatory' annotation achieves the lowest agreement between annotators. Additionally, annotations of Italian headlines resulted in higher disagreement than Spanish ones, contrary to what we observed for 'entities' and 'vulnerable identities'. Text spans expressing dangerous speech are recognized with an agreement of 0.57 for Italian and 0.49 for Spanish headlines. Agreement on 'derogatory' is low for Italian headlines (0.28), while Spanish ones show almost no agreement (0.08).

Qualitative Analysis. In summary, while the overall results of the annotation are positive, some categories show significant disagreement between annotators. These disagreements highlight the need to review and refine the annotation guidelines for problematic categories, and to provide more detailed instructions. The importance of reassessing the guidelines in order to make them clearer and more consistent is further underscored by the fact that, for Spanish headlines, the annotators agreed on both labels and intervals in only 67 cases, and for Italian headlines, agreement was reached in just 88 cases.

Since the annotation task was span-based, we opted not to use a confusion matrix to analyze the disagreement. A confusion matrix is not appropriate for span detection, as it assumes discrete labels applied to predefined items, whereas our task involved labeling spans of text that varied in length and context. Instead, we performed a qualitative analysis, examining specific cases of disagreement to understand their nature. This approach allowed us to explore not only how annotators differed in labeling spans but also why these differences emerged, providing a deeper insight into the underlying issues of interpretation and guidelines.

Looking more closely at the headlines where the annotations present inconsistencies, a variety of motivations behind the discrepancies can be identified.

For instance, in the Italian title "Orrore nella casa occupata dagli immigrati: donna lanciata giù dal secondo piano"⁷, 'donna' was marked as a vulnerable identity by only one of the annotators, suggesting a possibly erroneous focus on a single target at a time ('immigrati') by the other annotator.

⁷ "Atrocity in a house occupied by migrants: woman thrown from second floor".

Another type of disagreement relates to the interpretation of derogatory mentions. An example can be found in "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri"⁸, where one annotator identified the term 'stranieri' as a derogatory mention as well as representative of a vulnerable identity, while the other annotator simply stuck to the second label, perhaps highlighting a divergence in the interpretation of the guidelines. Furthermore, it is interesting to observe the disagreement created by headlines that use the generic term 'stranieri' ('foreigners'), which was often labelled as 'vulnerable identity - ethnic minority' by one annotator and as 'vulnerable identity - migrant' by the other. This inconsistency between annotators can be identified in two headlines: "Ius soli e cittadinanza facile agli stranieri? Il sangue non è acqua"⁹ and "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri"⁸. In the first case, we can resolve the disagreement by looking at the context: the explicit reference to the issue of granting citizenship suggests that the term 'foreigners' more appropriately refers to the specific category of migrants. In the second headline, on the other hand, there is no direct reference to specifically migration-related issues, and thus both interpretations in terms of the vulnerable category of belonging are acceptable.

⁸ "One third of all crimes are committed by foreigners (and Africans hold the record). All the numbers".
⁹ "Ius soli and easy citizenship for foreigners? Blood is not water".

Finally, some texts present a slight difference in the chosen annotation spans, as observed in "Più di 200mila case popolari agli immigrati"¹⁰, where the annotators identified dangerous speech in the same section of text but with differences in the number of highlighted words (the first annotator labelled 'Più di 200mila'; the second annotator labelled '200mila case popolari'), reflecting variations in the identification of relevant content for the analysis of dangerous speech.

¹⁰ "More than 200,000 public housing units for immigrants".

In addition to the predefined labels, we also collected free-text fields as part of the annotation process. These comments offered an additional layer of granularity, allowing annotators to describe nuances not covered by the fixed categories. For example, in the Spanish headline "Dos menas marroquíes apuñalan a dos turistas para robarles en Salou"¹¹, both annotators used the two labels 'vulnerable identity - ethnic minority' and 'vulnerable identity - other' to annotate the span 'menas marroquíes'. Alongside the 'other' label, one annotator provided the comment 'Under 18', while the other one used 'young people' to describe the vulnerable group. Although stated differently, both comments highlight the specific vulnerability related to the age of the group, complementing the existing labels. As this example shows, the flexibility provided by free-text fields in the annotation process is useful to capture multi-categorical terms and to identify potential new categories that may not have been initially considered among the predefined labels.

¹¹ "Two Moroccan unaccompanied migrant minors stab two tourists to rob them in Salou".

4.2. Dataset Analysis

In this section we provide an analysis of the four label types that occur in the gold-standard version of the VIRC corpus: 'derogatory', 'dangerous', 'named entities', and 'vulnerable groups'. The analysis is twofold: first, we describe the distribution of these label types; then we present a zero-shot and a few-shot experiment aimed at understanding whether existing LLMs (T5 [31] and BART [32]) are able to recognize these labeled spans in news headlines by comparing their outputs to the gold-standard annotations.

Corpus statistics. Table 3 shows the distribution of label types in the corpus.

Table 3: The distribution of labels in the gold-standard corpus.

                    Spanish   Italian
dangerous           136       166
derogatory          3         16
entities            140       146
vulnerable groups   270       253

As can be observed, mentions of vulnerable groups are the most frequent, with 270 occurrences in the Spanish subset and 253 in the Italian subset. This confirms the relevance of annotating vulnerable identities in the identification of discriminatory contents, which is tied to their high recognizability by annotators (Section 4.1). The role of named entities differs in the two subsets. Annotators labeled them with agreement 130 times in Spanish headlines and 67 times in Italian ones. This might be caused by their composition: since Italian headlines were partly collected from the Facebook pages of mainstream newspapers, there was a higher number of named entities that were not relevant for the analysis of the headlines' dangerousness. The number of text spans labeled as dangerous is almost equivalent in the two subsets (136 for Spanish, 166 for Italian), showing a good presence of this label type despite the high disagreement between annotators. Finally, it is worth mentioning the almost total absence of text spans labeled as 'derogatory' with agreement (3 for Spanish, 16 for Italian), which suggests the high subjectivity of this phenomenon and also the need to better define its characteristics in the annotation guidelines.

Corpus analysis with LLMs. We completed our analysis of the VIRC corpus through zero-shot experiments aimed at exploring the ability of existing LLMs to identify the four types of labelled spans in messages. We considered the detection of spans as an extractive Question Answering (QA) problem. For the task we adopted the T5 [31] and BART [32] LLM architectures for both languages. For Italian we employed the models of [33] and [34], and for Spanish those of [35] and [36], respectively. The translations of the prompts used are the following (see Appendix A for the original ones):

• What part of the text is dangerous (criminalizes, ridicules, incites violence, ...) against vulnerable identities (women, migrants, ethnic minorities, ...)?
• What part of the text is derogatory (negative or pejorative comments about the victim without explicit hate speech, but the mention itself is discriminatory or offensive, and often uses a tone intended to denigrate or discredit the victim)?
• What named entity is mentioned in the sentence?
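The span-level agreement measure of Section 4.1 (one annotator taken as gold, the other as predictions; full inclusion of one span in the other counted as 1 true positive, partial overlap as 0.5) can be sketched as follows. This is a simplified reconstruction that matches spans greedily by label and overlap; it is not the exact scoring script from the VIRC repository.

```python
def span_tp(gold, pred):
    """Score predicted spans against gold spans of the same label.

    A span is (label, start, end). Full inclusion of one span in the
    other counts as 1 true positive, partial overlap as 0.5.
    """
    tp, used = 0.0, set()
    for pl, ps, pe in pred:
        for i, (gl, gs, ge) in enumerate(gold):
            if i in used or pl != gl:
                continue
            overlap = min(pe, ge) - max(ps, gs)
            if overlap <= 0:
                continue
            contained = (ps <= gs and ge <= pe) or (gs <= ps and pe <= ge)
            tp += 1.0 if contained else 0.5
            used.add(i)
            break
    return tp

def agreement_f1(gold, pred):
    """F-score between the two annotators' span sets."""
    if not gold or not pred:
        return 0.0
    tp = span_tp(gold, pred)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("women", 0, 9)]        # annotator A labelled "All women"
pred_full = [("women", 4, 9)]   # annotator B labelled "women": inclusion
pred_part = [("women", 4, 18)]  # annotator B labelled "women of Italy": partial
print(agreement_f1(gold, pred_full), agreement_f1(gold, pred_part))  # 1.0 0.5
```

The two toy cases reproduce the paper's "All women" example: inclusion yields a full agreement of 1.0, a partial overlap yields 0.5.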
                     Non-Restrictive Zero-Shot           Restrictive Zero-Shot
                     T5                BART              T5                BART
                     Spanish  Italian  Spanish  Italian  Spanish  Italian  Spanish  Italian
dangerous            0.39     0.28     0.43     0.39     0.49     0.47     0.51     0.43
derogatory           0.02     0.05     0.03     0.04     0.67     0.43     0.50     0.33
entity               0.28     0.11     0.23     0.23     0.40     0.30     0.30     0.27
vulnerable identity  0.63     0.19     0.41     0.48     0.56     0.18     0.35     0.37

Table 4
F-score results of zero-shot experiments on the VIRC corpus with T5 and BART models for each label.
• Which hate speech vulnerable identity is mentioned in the sentence?

We designed two approaches for the zero-shot experiments: restrictive and non-restrictive. On the one hand, in the non-restrictive zero-shot experiments, for each sentence in the dataset we queried the model with the prompt of each label and extracted the three most confident results. We then filtered out the responses whose model confidence fell below 0.02 to limit noise. Finally, all these annotations go through a majority vote (identical to the one used to build the aggregated dataset) to normalize the model response.

On the other hand, in the restrictive zero-shot experiments, we queried the model with the prompts for each annotation present in the aggregated dataset. Since some sentences carry the same label on two different spans, we request five different annotations from the model, ordered from most to least confident. If an annotation was already included, the next one in the ranking is taken, so that the model does not produce duplicate annotations.

Table 4 presents the F-scores for each label type, experiment, and model. In general, T5 and BART tend to perform more effectively in Spanish than in Italian. The models face noticeable challenges in identifying the labels ‘dangerous’, ‘derogatory’, and ‘entity’. Nevertheless, when they are aware that the label exists within the sentence (restrictive), they manage to recognize it with fairly good agreement. During annotation, the label ‘derogatory’ proved the most challenging to identify. In the non-restrictive scenario it scarcely receives any agreement, yet in the restrictive scenario it achieves a reasonable level, particularly in Spanish. This indicates that the model struggles to discern its presence initially but, once acknowledged, can recognise the expression.

The restrictive method improves performance over the non-restrictive method for all labels except ‘vulnerable identity’. This shows that models generally comprehend and identify vulnerable identities in sentences better without restrictions than when they are restricted to specific mentions. It should also be noted that, in the Spanish context, T5 is more effective than BART in identifying ‘vulnerable identity’ labels for both approaches, while BART performs better in Italian.

These results show that a NER-based annotation scheme for HS detection is difficult not only to annotate but also to detect automatically. Larger resources are necessary to develop models able to capture the complex semantics of HS.

5. Conclusions and Future Work

The Vulnerable Identities Recognition Corpus (VIRC), created in this work, reveals the challenge of identifying vulnerable identities due to the rapid evolution of language on social media. Our experiments indicate that large language models (LLMs) struggle significantly with this task.

VIRC provides a detailed and structured resource that enhances understanding of the extensive use of hate speech in Italian and Spanish news headlines. The corpus is particularly valuable as it includes more annotation dimensions than related studies in other languages, namely vulnerable identities, dangerous discourse, derogatory expressions, and entities. This differentiation between vulnerable identities and entities, as well as between dangerous and derogatory elements, enables the development of sophisticated detection tools that can facilitate large-scale actions to mitigate the impact of hate speech (e.g., moderation of messages and generation of counter-narratives that reduce the damage to the mental health of victims).

Future work will focus on expanding this resource by doubling the size of annotations for both languages and including non-racism-related phrases to ensure the resource is comprehensive. Additionally, we plan to refine the annotation guidelines to avoid low agreement on the derogatory label, enhancing the overall reliability and utility of the corpus. These efforts will further improve the effectiveness of hate speech detection and contribute to the development of policies and tools for a safer online environment.

Acknowledgments

This work is supported by the Predoctoral Grant (PIPF-2022/COM-25947) of the Consejería de Educación, Ciencia y Universidades de la Comunidad de Madrid, Spain. Arianna Longo’s work has been supported by aequa-tech. The authors gratefully acknowledge the Universidad Politécnica de Madrid (www.upm.es) for providing computing resources on the IPTC-AI Innovation Space AI Supercomputing Cluster.

References

[1] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017, pp. 512–515.
[2] Z. Waseem, Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter, in: Proceedings of the First Workshop on NLP and Computational Social Science, 2016, pp. 138–142.
[3] M. ElSherief, C. Ziems, D. Muchlinski, V. Anupindi, J. Seybolt, M. De Choudhury, D. Yang, Latent hatred: A benchmark for understanding implicit hate speech, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 345–363.
[4] B. Vidgen, D. Nguyen, H. Margetts, P. Rossini, R. Tromble, Introducing CAD: the contextual abuse dataset, in: Proceedings of the 2021 Conference of the
North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2289–2303.
[5] P. Chiril, E. W. Pamungkas, F. Benamara, V. Moriceau, V. Patti, Emotionally informed hate speech detection: A multi-target perspective, Cogn. Comput. 14 (2022) 322–352. URL: https://doi.org/10.1007/s12559-021-09862-5. doi:10.1007/S12559-021-09862-5.
[6] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1668–1678.
[7] P. Sachdeva, R. Barreto, G. Bacon, A. Sahn, C. Von Vacano, C. Kennedy, The measuring hate speech corpus: Leveraging Rasch measurement theory for data perspectivism, in: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022, 2022, pp. 83–94.
[8] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, A. Mukherjee, HateXplain: A benchmark dataset for explainable hate speech detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 14867–14875.
[9] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021, pp. 59–69.
[10] K. Büyükdemirci, I. E. Kucukkaya, E. Ölmez, C. Toraman, JL-Hate: An annotated dataset for joint learning of hate speech and target detection, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9543–9553.
[11] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/S10579-020-09502-8.
[12] E. Leonardelli, S. Menini, A. P. Aprosio, M. Guerini, S. Tonelli, Agreeing to disagree: Annotating offensive language datasets with annotators’ disagreement, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 10528–10539.
[13] H. Kirk, W. Yin, B. Vidgen, P. Röttger, SemEval-2023 task 10: Explainable detection of online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023, pp. 2193–2210.
[14] P. Piot, P. Martín-Rodilla, J. Parapar, MetaHate: A dataset for unifying efforts on hate speech detection, Proceedings of the International AAAI Conference on Web and Social Media 18 (2024) 2025–2039. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/31445. doi:10.1609/icwsm.v18i1.31445.
[15] D. Nozza, F. Bianchi, G. Attanasio, HATE-ITA: Hate speech detection in Italian social media text, in: K. Narang, A. Mostafazadeh Davani, L. Mathias, B. Vidgen, Z. Talat (Eds.), Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), Association for Computational Linguistics, Seattle, Washington (Hybrid), 2022, pp. 252–260. doi:10.18653/v1/2022.woah-1.24.
[16] M. Madeddu, S. Frenda, M. Lai, V. Patti, V. Basile, DisaggregHate It corpus: A disaggregated Italian dataset of hate speech, in: F. Boschetti, G. E. Lebani, B. Magnini, N. Novielli (Eds.), Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), volume 3596, 2023.
[17] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 59–69. URL: https://aclanthology.org/2021.semeval-1.6. doi:10.18653/v1/2021.semeval-1.6.
[18] P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen, N. L.-T. Nguyen, ViHOS: Hate speech spans detection for Vietnamese, in: A. Vlachos, I. Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 652–669. URL: https://aclanthology.org/2023.eacl-main.47. doi:10.18653/v1/2023.eacl-main.47.
[19] Y. Jeong, J. Oh, J. Lee, J. Ahn, J. Moon, S. Park, A. Oh, KOLD: Korean offensive language dataset, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 10818–10833. URL: https://aclanthology.org/2022.emnlp-main.744. doi:10.18653/v1/2022.emnlp-main.744.
[20] N. Ousidhoum, Z. Lin, H. Zhang, Y. Song, D.-Y. Yeung, Multilingual and multi-aspect hate speech analysis, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 4675–4684. URL: https://aclanthology.org/D19-1474. doi:10.18653/v1/D19-1474.
[21] B. Jehangir, S. Radhakrishnan, R. Agarwal, A survey on named entity recognition - datasets, tools, and methodologies, Natural Language Processing Journal 3 (2023).
[22] E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, R. Weischedel, OntoNotes: the 90% solution, in: Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, 2006, pp. 57–60.
[23] N. Collier, T. Ohta, Y. Tsuruoka, Y. Tateisi, J.-D. Kim, Introduction to the bio-entity recognition task at JNLPBA, in: N. Collier, P. Ruch, A. Nazarenko (Eds.), Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), COLING, 2004, pp. 73–78.
[24] M. ElSherief, V. Kulkarni, D. Nguyen, W. Y. Wang, E. Belding, Hate lingo: A target-based linguistic analysis of hate speech in social media, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 12, 2018.
[25] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, Automatic classification of sexism in social networks: An empirical study on Twitter data, IEEE Access 8 (2020) 219563–219576.
[26] M. Sanguinetti, F. Poletto, C. Bosco, V. Patti, M. Stranisci, An Italian Twitter corpus of hate speech against immigrants, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[27] I. Guillén-Pacho, oeg-upm/telegram-dataset-builder: version 1.0.0, 2024. URL: https://doi.org/10.5281/zenodo.12773159. doi:10.5281/zenodo.12773159.
[28] S. Benesch, Dangerous speech, 86272 12 (2023) 185–197.
[29] F. Cabitza, A. Campagner, V. Basile, Toward a perspectivist turn in ground truthing for predictive computing, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023, pp. 6860–6868.
[30] T. Brants, Inter-annotator agreement for a German newspaper corpus, in: LREC, Citeseer, 2000.
[31] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67.
[32] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, 2019. URL: https://arxiv.org/abs/1910.13461. arXiv:1910.13461.
[33] G. Sarti, M. Nissim, IT5: Text-to-text pretraining for Italian language understanding and generation, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9422–9433. URL: https://aclanthology.org/2024.lrec-main.823.
[34] M. La Quatra, L. Cagliero, BART-IT: An efficient sequence-to-sequence model for Italian text summarization, Future Internet 15 (2023). URL: https://www.mdpi.com/1999-5903/15/1/15. doi:10.3390/fi15010015.
[35] V. Araujo, M. M. Trusca, R. Tufiño, M.-F. Moens, Sequence-to-sequence Spanish pre-trained language models, 2023. arXiv:2309.11259.
[36] V. Araujo, M. M. Trusca, R. Tufiño, M.-F. Moens, Sequence-to-sequence Spanish pre-trained language models, 2023. arXiv:2309.11259.

A. LLMs Prompts

The prompts used are the same for each model but differ for each language. For Spanish, the prompts used for each label are:

• Dangerous: “¿Qué parte del texto es peligroso (criminaliza, ridiculiza, incita a la violencia, ...) contra identidades vulnerables (mujeres, migrantes, minorías étnicas, ...)?”
• Derogatory: “¿Qué parte del texto es derogativo (comentarios negativos o despectivos sobre la víctima sin incitación explícita al odio, pero la mención en sí es discriminatoria u ofensiva, y a menudo emplea un tono destinado a menospreciar o desacreditar a la víctima)?”
• Entity: “¿Qué entidad nombrada se menciona en la frase?”
• Vulnerable Identity: “¿Qué identidad vulnerable al discurso de odio se menciona en la frase?”

For Italian:

• Dangerous: “Quale parte del testo è pericolosa (criminalizza, ridicolizza, incita alla violenza, ...) nei confronti di identità vulnerabili (donne, migranti, minoranze etniche, ...)?”
• Derogatory: “Quale parte del testo è dispregiativa (commenti negativi o denigratori sulla vittima senza un esplicito discorso d’odio, ma in cui la menzione stessa è discriminatoria o offensiva e spesso usa un tono volto a sminuire o screditare la vittima)?”
• Entity: “Quale entità nominata è menzionata nella frase?”
• Vulnerable Identity: “Quale identità vulnerabile ai discorsi d’odio è menzionata nella frase?”
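
The answer-selection logic of the two zero-shot settings described in Section 4 (non-restrictive: three most confident answers, a 0.02 confidence filter, then majority vote; restrictive: up to five ranked answers with duplicates skipped) can be sketched as follows. This is a minimal illustration, not the exact experimental code: it assumes the QA model exposes a list of (span, confidence) candidates per prompt, and the function names are ours.

```python
from collections import Counter

CONF_THRESHOLD = 0.02  # responses below this model confidence are discarded


def non_restrictive_answer(candidates):
    """Non-restrictive zero-shot: keep the three most confident answers,
    drop those under the confidence threshold, then normalise the model
    response by majority vote over the surviving spans.
    `candidates` is a list of (span, confidence) pairs (assumed interface)."""
    top3 = sorted(candidates, key=lambda c: c[1], reverse=True)[:3]
    kept = [span for span, conf in top3 if conf >= CONF_THRESHOLD]
    if not kept:
        return None  # no sufficiently confident answer for this label
    return Counter(kept).most_common(1)[0][0]


def restrictive_answers(candidates, n_gold):
    """Restrictive zero-shot: the model is queried once per gold annotation;
    up to five ranked answers are considered, and already-seen spans are
    skipped so that each gold span receives a distinct answer."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:5]
    answers, seen = [], set()
    for span, _conf in ranked:
        if span not in seen:
            seen.add(span)
            answers.append(span)
        if len(answers) == n_gold:
            break
    return answers
```

In the actual experiments the (span, confidence) pairs would come from the T5 and BART checkpoints cited above; here they are stand-ins so the selection logic can be read in isolation.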