The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis

Ibai Guillén-Pacho1,*,†, Arianna Longo2,3,†, Marco Antonio Stranisci2,3, Viviana Patti2 and Carlos Badenes-Olmedo1,4
1 Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
2 University of Turin, Italy
3 Aequa-tech, Torino, Italy (aequa-tech.com)
4 Computer Science Department, Universidad Politécnica de Madrid, Spain

Abstract
This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.

Keywords
hate speech, vulnerable identities, annotated corpora

1. Introduction

Hate Speech (HS) detection is a task with a high social impact. Developing technologies that are able to recognize these forms of discrimination is not only crucial to enforce existing laws, but also supports important tasks such as the moderation of social media content. However, recognizing HS is challenging. Verbal discrimination takes different forms and involves a number of correlated phenomena that make it difficult to reduce HS to a binary classification.

Analyzing the recent history of corpora annotated for HS, it is possible to observe a shift from very broad categorizations of hateful content to increasingly detailed annotation schemes aimed at understanding the complexity of this phenomenon. High-level schemes including dimensions like "hateful/offensiveness" [1] or "sexism/racism" [2] paved the way for more sophisticated attempts to formalize such concepts in different directions: exploring the interaction between HS and vulnerable targets [3, 4, 5]; studying the impact of subjectivity [6, 7]; identifying the triggers of HS in texts [8, 9]. Despite this trend, the complex semantics of HS in texts is far from being fully explored. Information Extraction (IE) approaches to HS annotation have rarely been implemented yet. Therefore, corpora that include a fine-grained, structured semantic representation of HS incidents are not available. The only notable exception is the recent work of Büyükdemirci et al. [10], which treats the identification of HS targets as a span-based task.

In order to fill this gap, we present the Vulnerable Identities Recognition Corpus (VIRC): a dataset of 880 Italian and Spanish headlines against migrants aimed at providing an event-centric representation of HS against vulnerable groups. The annotation scheme is built on four elements:

• Named Entity Recognition (NER). All the named entities that are involved in a HS expression: 'location', 'organization', and 'person'.
• Vulnerable identity mentions. Generic mentions of identities targeted by HS as they are defined by international regulatory frameworks1: 'women', 'LGBTQI', 'ethnic minority', and 'migrant'.
• Derogatory mentions. All mentions that negatively portray people belonging to vulnerable groups.
• Dangerous speech. The part of the message that is perceived as hateful against named entities or vulnerable identities.

In this paper we present a preliminary annotation experiment intended to validate the scheme and to assess the impact of disagreement in such a fine-grained task. The paper is structured as follows. In Section 2 we discuss related work; in Section 3 we describe the methodology; in Section 4 we introduce the VIRC corpus; and in Section 5 we present the conclusions and discuss possible future work.

2. Related Work

Literature on automatic HS detection is vast and follows different research directions [11]: from the analysis of subjectivity in the perception of this phenomenon [12] to the definition of ever more refined categorizations of hateful content [13]. In this section we focus on the approaches to HS detection that are aimed at studying the target of HS, inspired by Information Extraction (IE) approaches. In Section 2.1 we review HS resources inspired by this approach, with a specific focus on span-based annotated corpora. In Section 2.2 we discuss the implementation of NER-based techniques in the creation of HS corpora.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
* Corresponding author.
† These authors contributed equally.
$ ibai.guillen@upm.es (I. Guillén-Pacho); arianna.longo401@edu.unito.it (A. Longo); marcoantonio.stranisci@unito.it (M. A. Stranisci); viviana.patti@unito.it (V. Patti); carlos.badenes@upm.es (C. Badenes-Olmedo)
€ https://iguillenp.github.io/ (I. Guillén-Pacho); https://marcostranisci.github.io/ (M. A. Stranisci); https://www.unito.it/persone/vpatti (V. Patti); https://about.me/cbadenes (C. Badenes-Olmedo)
ORCID: 0000-0001-7801-8815 (I. Guillén-Pacho); 0009-0005-8500-1946 (A. Longo); 0000-0001-9337-7250 (M. A. Stranisci); 0000-0001-5991-370X (V. Patti); 0000-0002-2753-9917 (C. Badenes-Olmedo)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 https://www.coe.int/en/web/combating-hate-speech/recommendation-on-combating-hate-speech
2.1. Hate Speech Detection

A large amount of work on HS detection focuses on classification, both binary (presence or absence of HS) and multi-label (misogyny, racism, xenophobia, etc.). This has led to large collections of datasets such as those grouped by [14]. One of the main problems is that most resources are in English, and for mid-to-low-resource languages (e.g., Italian) some HS categories are not covered. This constraint is mitigated by cross-lingual transfer learning, which exploits resources in other languages [15], and, although good results are achieved, the creation of resources for these languages is still necessary.

The main resources for the identification of HS are particularly focused on a target, identifying the presence or absence of HS against it. An example is the work of [16], where 1,100 Italian tweets with a special focus on immigrants were annotated according to the presence of HS, irony, and the stance of the message's author on immigration matters. Recently, however, there has been an increasing focus on identifying hateful expressions and their intended targets. This change in paradigm suggests that resources should be wider in scope and not focus on a particular discourse target. The main resources in this field have high linguistic diversity, although they do not all follow the same annotation scheme, with English being the most common language. We have found works in English [17]; Vietnamese [18]; Korean [19]; English and Turkish [10]; and English, French, and Arabic [20]. However, we have not found any in Italian or Spanish, which we believe makes this work the first to cover these languages for this task.

Two main annotation approaches can be drawn from these studies: those that annotate at the span level [17, 18, 19, 10] and those that annotate over the full text [20]. On the one hand, the work that follows the latter approach presents a corpus of 13,000 tweets (5,647 English, 4,014 French, and 3,353 Arabic) and annotates the sentiment of the annotator (shock, sadness, disgust, etc.), the hostility type (abusive, hateful, offensive, etc.), the directness (direct or indirect), the target attribute (gender, religion, disabled, etc.), and the target group (individual, women, African, etc.).

On the other hand, works that follow the span-annotation approach design different annotation criteria. The simplest, [17, 18], annotate only one dimension. The first, [17], annotates the parts that make a comment toxic in 30,000 English comments from the Civil Comments platform. The second, [18], annotates only the parts that make a comment offensive or hateful in 11,000 Vietnamese comments from Facebook and YouTube. The other papers, [19, 10], extend this approach and also label the span in which the target of the attack is mentioned. Moreover, [19] is not limited to that: they also annotate the target type (individual, group, other), the target attribute (gender, race, ethnicity, etc.), and the target group (LGBTQ+, Muslims, feminists, etc.). Their final corpus consists of 20,130 annotated offensive Korean-language news and video comments.

However, the guidelines used by the different works sometimes present incompatibilities. Although some works use the offensive and hateful labels in the same way [19, 18], others distinguish between these two types of expression [10]. This last resource has separately annotated hateful and offensive expressions, totaling 765 tweets in English and 765 tweets in Turkish.

2.2. Named Entity Recognition

Developed as a branch of Information Extraction (IE), Named Entity Recognition (NER) is a field of research aimed at detecting named entities in documents according to different schemes. Following the review of Jehangir et al. [21], it is possible to distinguish general-purpose schemes, which usually include entities of the types 'person', 'location', 'organization', and 'time', from schemes defined for specific applications. OntoNotes [22] is an example of the first type of approach: a broad collection of documents gathered from different sources (e.g., newspapers, television news) annotated with a tagset that includes general categories of named entities. On the other hand, more specific applications include biomedical NER, which focuses on identifying entities relevant to the biomedical field, such as diseases, genes, and chemicals. An example in this field is the JNLPBA dataset [23], which is derived from the GENIA corpus. This dataset consists of 2,000 biomedical abstracts from the MEDLINE database, annotated with detailed entity types such as proteins, DNA, RNA, cell lines, and cell types.

NER-based approaches to HS detection and analysis are still few. ElSherief et al. [24] exploited Twitter users' mentions to distinguish between directed and generalized forms of HS. Rodríguez-Sánchez et al. [25] used derogatory expressions about women as seeds to collect misogynist messages according to a fine-grained classification of this phenomenon. [26] adopted a similar methodology to collect tweets about three groups vulnerable to discrimination: ethnic minorities, religious minorities, and Roma communities. Piot et al. [14] analyzed the correlation between the presence of HS and named entities in 60 existing datasets. Despite these previous works, there are no attempts to define a NER-based scheme specifically intended for HS detection. Our work represents an attempt to fill this gap by combining categories from general-purpose NER with a taxonomy of groups vulnerable to discrimination in a common annotation scheme aimed at providing deeper insights into the targets of HS.

3. Methodology

3.1. Data Collection

We collect news from public Telegram channels with the telegram-dataset-builder [27]. The selected channels are shown in Table 1; they are in Spanish and Italian and aligned with the left and right wings of the political spectrum. The subset of Italian headlines was integrated with titles published on newspapers' Facebook pages that have been collected in collaboration with the Italian Amnesty Task Force on HS, a group of activists that produce counter-narratives against discriminatory content spread by online newspapers and user comments2. We collected all the news headlines detected by activists in March 2020, 2021, 2022, and 2023, and added them to our corpus.

Given the large amount of news collected, we applied filters to the dataset to reduce it to its final size. We focus on news about racism; for this purpose, we applied the classifier piuba-bigdata/beto-contextualized-hate-speech to keep only the news items labeled as racist.
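The filtering step just described can be sketched as follows. The classifier is injected as a plain function so the logic is independent of the actual model; the stub classifier, its label names, and its scores below are purely illustrative assumptions, not the output format of the model used in the paper.

```python
from typing import Callable, Iterable

def filter_racist_headlines(
    headlines: Iterable[str],
    classify: Callable[[str], dict],
    label: str = "racism",
    threshold: float = 0.5,
) -> list:
    """Keep only headlines whose classifier score for `label` passes the threshold."""
    kept = []
    for text in headlines:
        scores = classify(text)  # e.g. {"racism": 0.91, "not_hateful": 0.09}
        if scores.get(label, 0.0) >= threshold:
            kept.append(text)
    return kept

# Stub standing in for the real model, for illustration only.
def stub_classify(text: str) -> dict:
    return {"racism": 0.9 if "migrant" in text.lower() else 0.1}

print(filter_racist_headlines(
    ["Migrants blamed again", "Local festival opens"], stub_classify))
# -> ['Migrants blamed again']
```

In a real run, `classify` could wrap a text-classification model such as the one named above; the label set and score format of that checkpoint would have to be checked against its model card.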
Since this classifier is trained on Spanish texts, prior to this step we automatically translated the Italian news with the model facebook/nllb-200-distilled-600M. This translation step is used only for the filtering process; once the news is selected, the translated text is no longer used. In the end, this process yields 532 news headlines classified as racist for Italian and 348 for Spanish, which have been selected for the annotation task.

            Left-wing                     Right-wing
Spanish     elpais_esp, smolny7           MediterraneoDGT, elmundoes
Italian     ByobluOfficial, terzaroma,    marcellopamio, ilprimatonazionaleIPN,
            sadefenza                     VoxNewsInfo

Table 1
Telegram channels from which the news have been extracted.

3.2. Data Annotation

A comprehensive, span-based annotation scheme was developed to label the vulnerable identities and entities present in the dataset. Annotators were provided with instructions and had to choose a label and highlight the word, phrase, or portion of text that best embodied the qualities of the chosen label. It was possible to choose more than one label for the same portion of text. The instructions also provided annotators with some examples of annotated headlines.

Migranti, un esercito di scrocconi: 120mila mantenuti con l'8 per mille degli italiani.3
Hordas de gitanos arrasan Mercadona después de que les ingresen 3000 euros en sus 'tarjetas solidarias'.4
Questa è Villa Aldini, la residenza di lusso che ospita i migranti stupratori a Bologna.5
Labels: Vulnerable identity - Migrants; Vulnerable identity - Ethnic minority; Derogatory; Dangerous speech; Entity - Location; Entity - Organization

Figure 1: Examples of annotated headlines

The initial layer of annotation focuses on identifying vulnerable targets within the text and categorizing them into one of six predefined labels: ethnic minority, migrant, religious minority, women, LGBTQ+ community, and other. These labels represent vulnerable groups, as the vulnerability of the targets can often be traced back to their belonging to certain categories of people which are particularly exposed to discrimination, marginalisation, or prejudice in society. In cases where the targeted group did not fit into one of the predefined labels, annotators were required to use the 'other' category. Then, for instances labeled as 'other', annotators were instructed to provide specific details regarding the group in a free-text field.

After categorizing vulnerable targets, the second layer involves annotating named entities. Annotators identify entities within the text and label them with one of five possible types: person, group, organization, location, and other. As in the first layer, instances labelled 'other' require annotators to provide details about the entity in a free-text field.

The final layers of the annotation scheme address the context in which these entities are mentioned, specifically focusing on identifying derogatory mentions and dangerous speech. A derogatory mention is characterized by negative or disparaging remarks about the target. In these instances, explicit hate speech is absent, but the mention itself is discriminatory or offensive, often employing a tone intended to belittle or discredit the target. The label derogatory is used to mark these mentions.

Moreover, the annotation includes identifying dangerous elements: portions of text that, intentionally or unintentionally, could incite hate speech or increase the vulnerability of the target identity. Dangerous speech, which can be either explicit or implicit, promotes or perpetuates negative prejudices and stereotypes, potentially triggering harmful responses against the group. The label dangerous [28] is used to tag these segments. Annotators were encouraged to use free-text fields to provide details on implicit dangerous speech or recurring dangerous concepts.

The annotation guidelines provided annotators with specific criteria and with the following list of potential markers of dangerous speech to help their identification:

• Incitement to violence: the text explicitly encourages violence against the target group;
• Open discrimination: the text openly states or supports discrimination against the target group;
• Ridicule: the text ridicules the target in the eyes of the readers by belittling it or mocking it;
• Stereotyping: the text perpetuates negative stereotypes about the target group, contributing to a distorted view of it;
• Disinformation: the text spreads false or misleading information that can harm the target group;
• Dehumanization: the text dehumanizes the target group, using language that equates it with objects or animals;
• Criminalization: the text portrays the target group as inherently criminal or associates it with illegal activities, contributing to the perception that the group as a whole is dangerous.

However, a text may still be considered dangerous even if it does not explicitly include these markers, as they are intended as examples rather than strict requirements.

2 https://www.amnesty.it/entra-in-azione/task-force-attivismo/
3 "Migrants, an army of scroungers: 120,000 supported by the Italians' 8x1000 tax allocation".
4 "Hordes of gypsies devastate Mercadona after 3000 euros were deposited in their solidarity cards".
5 "This is Villa Aldini, the luxury residence that hosts rapist migrants in Bologna".
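The layered scheme described in this section can be represented with a minimal span-based data model; the class names, fields, and character offsets below are illustrative assumptions, not the project's released annotation format.

```python
from dataclasses import dataclass, field

# Label inventories taken from the guidelines above.
VULNERABLE = {"ethnic minority", "migrant", "religious minority",
              "women", "lgbtq+ community", "other"}
ENTITY = {"person", "group", "organization", "location", "other"}
CONTEXT = {"derogatory", "dangerous"}

@dataclass
class Span:
    start: int        # character offset in the headline (inclusive)
    end: int          # character offset (exclusive)
    label: str
    note: str = ""    # free-text field, e.g. details for an 'other' label

@dataclass
class AnnotatedHeadline:
    text: str
    spans: list = field(default_factory=list)

    def add(self, start: int, end: int, label: str, note: str = "") -> None:
        # A portion of text may carry several labels, so overlaps are allowed.
        assert label in VULNERABLE | ENTITY | CONTEXT, f"unknown label: {label}"
        self.spans.append(Span(start, end, label, note))

h = AnnotatedHeadline("Hordas de gitanos arrasan Mercadona ...")
h.add(10, 17, "ethnic minority")   # 'gitanos'
h.add(26, 35, "organization")      # 'Mercadona'
```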
Figure 1 provides three examples of annotated headlines, two in Italian and one in Spanish, showing the application of the annotation scheme as described. In the figure, different colours highlight the various types of labels used. A vulnerable identity was detected in each headline: 'Migranti' in the first and third ones and 'gitanos' in the second one, respectively labelled as 'vulnerable group - migrant' and 'vulnerable group - ethnic minority'. All three examples contain multiple elements of dangerous speech, highlighted in red, and the second text also contains an element marked with the derogatory label. Additionally, the second and third headlines include examples of named entity annotation, with 'Mercadona' labelled as 'entity - organization', and 'Villa Aldini' and 'Bologna' labelled as 'entity - location'.

4. The VIRC Corpus

The VIRC corpus is a collection of 532 Italian and 348 Spanish news headlines annotated by two independent annotators for each language. Following the perspectivist paradigm [29], we release both the disaggregated annotations and the gold-standard corpus. The code used to generate the gold-standard corpus, carry out the experiments, and compile the statistics can be accessed through the following GitHub repository6. In this section we present an analysis of disagreement (Section 4.1) and relevant statistics about the corpus (Section 4.2).

6 https://github.com/oeg-upm/virc

4.1. Inter-Annotator Agreement

Since the span-based annotation task does not provide a fixed number of annotated items, we adopted the F-score metric to evaluate the agreement between annotators [30]. For each subset of the corpus we randomly chose one annotator as the gold-standard set of labels and the other as the set of predictions. We then computed the F-score between the two distributions of labels in order to measure the agreement between the annotators. Table 2 shows the results of our analysis. In general, annotations always showed a fair or higher agreement, except for some entity-related labels and the 'derogatory' one. There is also a low agreement in the Italian set on the labels 'religious minority' and 'women'.

Although the overall results are positive, they show significant variations that can be analyzed quantitatively and qualitatively. The inclusion of overlapping spans was handled as follows: if one span fully included the other, this was considered an agreement. In cases where the spans only partially overlapped, meaning there was some shared text but not full inclusion, this was treated as a partial agreement. For example, if one annotator labeled "All women" and the other selected only "women", this would be a full agreement (1 true positive). However, if the latter selected "women of Italy", it would be a partial agreement (0.5 true positive).

IAA (F-score)                           Spanish  Italian
dangerous                               0.49     0.57
derogatory                              0.08     0.28
entity - group                          0.00     0.00
entity - location                       0.66     0.60
entity - organization                   0.41     0.12
entity - other                          0.00     0.10
entity - person                         0.47     0.63
vulnerable entity                       0.15     0.00
vulnerable group - ethnic minority      0.83     0.63
vulnerable group - lgbtq+ community     -        0.80
vulnerable group - migrant              0.96     0.86
vulnerable group - other                0.46     0.41
vulnerable group - religious minority   1.00     0.00
vulnerable group - women                0.60     0.22

Table 2
The annotators' agreement measured through the F-score and broken down by label.

Quantitative Analysis. The agreement on the annotation of entities is always moderate, but it differs between the Spanish and the Italian subsets. Annotators of Spanish headlines scored a higher agreement on 'location' (0.66 vs 0.60), 'vulnerable' (0.15 vs 0.00), and 'organization' (0.41 vs 0.12), while entities of the types 'person' (0.63 vs 0.47) and 'other' (0.10 vs 0.00) are better recognized in Italian headlines.

On average, the annotation of vulnerable identities resulted in a higher agreement between annotators in both subsets, and at the same time confirmed the higher agreement of Spanish annotations, which always outperform Italian ones. The highest agreement emerges for the label 'migrant', on which annotators obtained an F-score of 0.86 for Italian and 0.96 for Spanish. The agreement on 'ethnic minority' is a bit lower but still significant: Spanish headlines reached an F-score of 0.83, Italian ones only 0.63. An equally high agreement is observed for the 'lgbtq+' label, which is only present in Italian headlines, with an F-score of 0.80. Among vulnerable groups, women scored the lowest F-score: 0.60 for Spanish, 0.22 for Italian. The largest observed discrepancy concerns religious minorities: in Spanish an F-score of 1.00 is achieved, while in Italian it is 0.00.

While the annotation of 'dangerous' spans achieves an acceptable agreement, the 'derogatory' annotation achieves the lowest agreement between annotators. Additionally, on these labels Spanish annotations resulted in a higher disagreement than Italian ones, contrary to what we observed for 'entities' and 'vulnerable identities'. Text spans expressing dangerous speech are recognized with an agreement of 0.57 for Italian and 0.49 for Spanish headlines. Agreement on 'derogatory' is low for Italian headlines (0.28), while Spanish ones show almost no agreement (0.08).

Qualitative Analysis. In summary, while the overall results of the annotation are positive, some categories show significant disagreement between annotators. These disagreements highlight the need to review and refine the annotation guidelines for the problematic categories and to provide more detailed instructions. The importance of reassessing the guidelines in order to make them clearer and more consistent is further underscored by the fact that, for Spanish headlines, the annotators agreed on both labels and intervals in only 67 cases, and for Italian headlines agreement was reached in just 88 cases.

Since the annotation task was span-based, we opted not to use a confusion matrix to analyze the disagreement. A confusion matrix is not appropriate for span detection, as it assumes discrete labels applied to predefined items, whereas our task involved labeling spans of text that varied in length and context. Instead, we performed a qualitative analysis, examining specific cases of disagreement to understand their nature.
This approach allowed us to explore not only how annotators differed in labeling spans, but also why these differences emerged, providing deeper insight into the underlying issues of interpretation and guidelines. Looking more closely at the headlines where the annotations present inconsistencies, a variety of motivations behind the discrepancies can be identified.

For instance, in the Italian title "Orrore nella casa occupata dagli immigrati: donna lanciata giù dal secondo piano"7, 'donna' was marked as a vulnerable identity by only one of the annotators, suggesting a possibly erroneous focus on one individual target at a time ('immigrati') by the other annotator.

Another type of disagreement relates to the interpretation of derogatory mentions. An example can be found in "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri"8, where one annotator identified the term 'stranieri' as a derogatory mention as well as representative of a vulnerable identity, while the other annotator simply stuck to the second label, perhaps highlighting a divergence in the interpretation of the guidelines.

Furthermore, it is interesting to observe the disagreement created by the headlines that use the generic term 'stranieri' ('foreigners'), which was often labelled as 'vulnerable identity - ethnic minority' by one annotator and as 'vulnerable identity - migrant' by the other. This inconsistency can be identified in two headlines: "Ius soli e cittadinanza facile agli stranieri? Il sangue non è acqua"9 and "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri"8. In the first case, we can resolve the disagreement by looking at the context: the explicit reference to the issue of granting citizenship suggests that the term 'foreigners' more appropriately refers to the specific category of migrants. In the second headline, on the other hand, there is no direct reference to specifically migration-related issues, and thus both interpretations of the vulnerable category are acceptable.

Finally, some texts present a slight difference in the chosen annotation spans, as observed in "Più di 200mila case popolari agli immigrati"10, where the annotators identified dangerous speech in the same section of text but with differences in the number of highlighted words (the first annotator labelled 'Più di 200mila'; the second labelled '200mila case popolari'), reflecting variations in the identification of the content relevant to the analysis of dangerous speech.

In addition to the predefined labels, we also collected free-text fields as part of the annotation process. These comments offered an additional layer of granularity, allowing annotators to describe nuances not covered by the fixed categories. For example, in the Spanish headline "Dos menas marroquíes apuñalan a dos turistas para robarles en Salou"11, both annotators used the two labels 'vulnerable identity - ethnic minority' and 'vulnerable identity - other' to annotate the span 'menas marroquíes'. Alongside the 'other' label, one annotator provided the comment 'Under 18', while the other used 'young people' to describe the vulnerable group. Although stated differently, both comments highlight the specific vulnerability related to the age of the group, complementing the existing labels. As this example shows, the flexibility provided by free-text fields in the annotation process is useful to capture multi-categorical terms and to identify potential new categories that may not have been initially considered among the predefined labels.

7 "Atrocity in a house occupied by migrants: woman thrown from the second floor".
8 "One third of all crimes are committed by foreigners (and Africans hold the record). All the numbers".
9 "Ius soli and easy citizenship for foreigners? Blood is not water".
10 "More than 200,000 public housing units for immigrants".
11 "Two Moroccan unaccompanied migrant minors stab two tourists to rob them in Salou".

4.2. Dataset Analysis

In this section we provide an analysis of the four label types that occur in the gold-standard version of the VIRC corpus: 'derogatory', 'dangerous', 'named entities', and 'vulnerable groups'. The analysis is twofold: first, we describe the distribution of these label types; then, we present zero-shot experiments aimed at understanding whether existing LLMs (T5 [31] and BART [32]) are able to recognize these labeled spans in news headlines, by comparing their outputs to the gold-standard annotations.

Corpus statistics. Table 3 shows the distribution of label types in the corpus. As can be observed, mentions of vulnerable groups are the most frequent, with 270 occurrences in the Spanish subset and 253 in the Italian one. This confirms the relevance of annotating vulnerable identities in the identification of discriminatory content, which is tied to their high recognizability by annotators (Section 4.1).

                     Spanish  Italian
dangerous            136      166
derogatory           3        16
entities             140      146
vulnerable groups    270      253

Table 3
The distribution of labels in the gold standard corpus.

The role of named entities differs in the two subsets. Annotators labeled them with agreement 130 times in Spanish headlines and 67 times in Italian ones. This might be caused by the composition of the subsets: since Italian headlines were partly collected from the Facebook pages of mainstream newspapers, they contained a higher number of named entities that were not relevant for the analysis of the headlines' dangerousness. The number of text spans labeled as dangerous is almost equivalent in the two subsets (136 for Spanish, 166 for Italian), showing a good presence of this label type despite the high disagreement between annotators. Finally, it is worth mentioning the almost total absence of text spans labeled as 'derogatory' with agreement (3 for Spanish, 16 for Italian), which suggests the high subjectivity of this phenomenon and also the need to better define its characteristics in the annotation guidelines.

Corpus analysis with LLMs. We completed our analysis of the VIRC corpus through zero-shot experiments aimed at exploring the ability of existing LLMs to identify the four types of labelled spans in messages. We considered the detection of spans as an extractive Question Answering (QA) problem. For the task we adopted the T5 [31] and BART [32] architectures for both languages. For Italian we employed the [33] and [34] models, and for Spanish the [35] and [36] models, respectively. The translations of the prompts used are the following (see Appendix A for the original ones):

• What part of the text is dangerous (criminalizes, ridicules, incites violence, ...) against vulnerable identities (women, migrants, ethnic minorities, ...)?
• What part of the text is derogatory (negative or pejorative comments about the victim without explicit hate speech, but where the mention itself is discriminatory or offensive, often using a tone intended to denigrate or discredit the victim)?
• What named entity is mentioned in the sentence?
• Which hate speech vulnerable identity is mentioned in the sentence?
We designed two approaches for the zero-shot experiments: restrictive and non-restrictive. On the one hand, for the non-restrictive zero-shot experiments, for each sentence in the dataset we queried the model with the prompt of each label and extracted the three most confident results. Then, we filtered out the responses below a model confidence of 0.02 to limit the noise. Finally, all these annotations go through a majority vote (identical to the one used to build the aggregated dataset) to normalize the model responses.

On the other hand, for the restrictive zero-shot experiments, we queried the model with the prompts for each annotation present in the aggregated dataset. Since there are sentences that carry the same label on two different spans, we requested five different annotations from the model, ordered from most to least confident; if an annotation was already included, the next one in order was taken, to avoid duplicate annotations from the model.

                     Non-Restrictive Zero-Shot             Restrictive Zero-Shot
                     T5                BART                T5                BART
                     Spanish  Italian  Spanish  Italian    Spanish  Italian  Spanish  Italian
dangerous            0.39     0.28     0.43     0.39       0.49     0.47     0.51     0.43
derogatory           0.02     0.05     0.03     0.04       0.67     0.43     0.50     0.33
entity               0.28     0.11     0.23     0.23       0.40     0.30     0.30     0.27
vulnerable identity  0.63     0.19     0.41     0.48       0.56     0.18     0.35     0.37

Table 4
F-score results of zero-shot experiments on the VIRC corpus with T5 and BART models for each label.

Table 4 presents the F-scores for each label type, experiment, and model. In general, T5 and BART tend to perform more effectively in Spanish than in Italian. The models face noticeable challenges in identifying the labels 'dangerous', 'derogatory', and 'entity'. Nevertheless, when they are aware that the label exists within the sentence (restrictive setting), they manage to recognize it with fairly good agreement. The label 'derogatory', which proved the most challenging to identify during annotation, scarcely receives any agreement in the non-restrictive scenario, yet in the restrictive scenario it achieves a reasonable level, particularly in Spanish. This indicates that the models struggle to discern its presence initially but, once its presence is acknowledged, can recognise the expression.

The restrictive method enhances performance over the non-restrictive one for all labels except 'vulnerable identity'. This shows that models generally comprehend and identify vulnerable identities in sentences better without restrictions than when they are restricted to specific mentions. It should also be noted that, in the Spanish setting, T5 is more effective than BART in identifying 'vulnerable identity' labels for both approaches, while BART performs better in Italian.

These results show that a NER-based annotation scheme for HS detection is difficult not only to annotate but also to detect automatically. Larger resources are necessary to develop models that are able to capture the complex semantics of HS.

5. Conclusions and Future Work

The Vulnerable Identities Recognition Corpus (VIRC), created in this work, reveals the challenge of identifying vulnerable identities. VIRC provides a detailed and structured resource that enhances the understanding of the extensive use of hate speech in Italian and Spanish news headlines. The corpus is particularly valuable as it includes more annotation dimensions than related studies in other languages, such as vulnerable identities, dangerous discourse, derogatory expressions, and entities. This differentiation between vulnerable identities and entities, as well as between dangerous and derogatory elements, enables the development of sophisticated detection tools that can facilitate large-scale actions to mitigate the impact of hate speech (e.g., moderation of messages and generation of counter-narratives that reduce the damage to the mental health of victims).

Future work will focus on expanding this resource by doubling the size of the annotations for both languages and including non-racism-related phrases to ensure the resource is comprehensive. Additionally, we plan to refine the annotation guidelines to avoid low agreement on the derogatory label, enhancing the overall reliability and utility of the corpus. These efforts will further improve the effectiveness of hate speech detection and contribute to the development of policies and tools for a safer online environment.

Acknowledgments

This work is supported by the Predoctoral Grant (PIPF-2022/COM-25947) of the Consejería de Educación, Ciencia y Universidades de la Comunidad de Madrid, Spain. Arianna Longo's work has been supported by aequa-tech. The authors gratefully acknowledge the Universidad Politécnica de Madrid (www.upm.es) for providing computing resources on the IPTC-AI Innovation Space AI Supercomputing Cluster.

References

[1] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017, pp. 512–515.
[2] Z. Waseem, Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter, in: Proceedings of the First Workshop on NLP and Computational Social Science, 2016, pp. 138–142.
[3] M. ElSherief, C. Ziems, D. Muchlinski, V. Anupindi, J. Seybolt, M. De Choudhury, D. Yang, Latent hatred: A benchmark for understanding implicit hate speech, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 345–363.
[4] B. Vidgen, D. Nguyen, H. Margetts, P.
Rossini, identities due to the rapid evolution of language on social R. Tromble, Introducing cad: the contextual abuse media. Our experiments indicate that large language models dataset, in: Proceedings of the 2021 Conference of the (LLMs) struggle significantly with this task. North American Chapter of the Association for Com- [16] M. Madeddu, S. Frenda, M. Lai, V. Patti, V. Basile, Dis- putational Linguistics: Human Language Technologies, aggreghate it corpus: A disaggregated italian dataset of 2021, pp. 2289–2303. hate speech, in: F. Boschetti, G. E. Lebani, B. Magnini, [5] P. Chiril, E. W. Pamungkas, F. Benamara, V. Moriceau, N. Novielli (Eds.), Proceedings of the Ninth Italian Con- V. Patti, Emotionally informed hate speech detection: A ference on Computational Linguistics (CLiC-it 2023)., multi-target perspective, Cogn. Comput. 14 (2022) 322– volume 3596, 2023. 352. URL: https://doi.org/10.1007/s12559-021-09862-5. [17] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopou- doi:10.1007/S12559-021-09862-5. los, SemEval-2021 task 5: Toxic spans detection, in: [6] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Her- risk of racial bias in hate speech detection, in: Proceed- belot, X. Zhu (Eds.), Proceedings of the 15th Inter- ings of the 57th annual meeting of the association for national Workshop on Semantic Evaluation (SemEval- computational linguistics, 2019, pp. 1668–1678. 2021), Association for Computational Linguistics, On- [7] P. Sachdeva, R. Barreto, G. Bacon, A. Sahn, C. Von Va- line, 2021, pp. 59–69. URL: https://aclanthology.org/2021. cano, C. Kennedy, The measuring hate speech corpus: semeval-1.6. doi:10.18653/v1/2021.semeval-1.6. Leveraging rasch measurement theory for data perspec- [18] P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen, tivism, in: Proceedings of the 1st Workshop on Perspec- N. L.-T. Nguyen, ViHOS: Hate speech spans detec- tivist Approaches to NLP@ LREC2022, 2022, pp. 
83–94. tion for Vietnamese, in: A. Vlachos, I. Augenstein [8] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, (Eds.), Proceedings of the 17th Conference of the Eu- A. Mukherjee, Hatexplain: A benchmark dataset for ropean Chapter of the Association for Computational explainable hate speech detection, in: Proceedings of Linguistics, Association for Computational Linguistics, the AAAI conference on artificial intelligence, volume 35, Dubrovnik, Croatia, 2023, pp. 652–669. URL: https: 2021, pp. 14867–14875. //aclanthology.org/2023.eacl-main.47. doi:10.18653/ [9] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopou- v1/2023.eacl-main.47. los, Semeval-2021 task 5: Toxic spans detection, in: [19] Y. Jeong, J. Oh, J. Lee, J. Ahn, J. Moon, S. Park, Proceedings of the 15th international workshop on se- A. Oh, KOLD: Korean offensive language dataset, mantic evaluation (SemEval-2021), 2021, pp. 59–69. in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Pro- [10] K. Büyükdemirci, I. E. Kucukkaya, E. Ölmez, C. Toraman, ceedings of the 2022 Conference on Empirical Meth- JL-Hate: An Annotated Dataset for Joint Learning of ods in Natural Language Processing, Association Hate Speech and Target Detection, in: N. Calzolari, M.-Y. for Computational Linguistics, Abu Dhabi, United Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceed- Arab Emirates, 2022, pp. 10818–10833. URL: https:// ings of the 2024 Joint International Conference on Com- aclanthology.org/2022.emnlp-main.744. doi:10.18653/ putational Linguistics, Language Resources and Eval- v1/2022.emnlp-main.744. uation (LREC-COLING 2024), ELRA and ICCL, Torino, [20] N. Ousidhoum, Z. Lin, H. Zhang, Y. Song, D.-Y. Yeung, Italia, 2024, pp. 9543–9553. Multilingual and multi-aspect hate speech analysis, in: [11] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the V. 
Patti, Resources and benchmark corpora 2019 Conference on Empirical Methods in Natural Lan- for hate speech detection: a systematic review, guage Processing and the 9th International Joint Confer- Lang. Resour. Evaluation 55 (2021) 477–523. ence on Natural Language Processing (EMNLP-IJCNLP), URL: https://doi.org/10.1007/s10579-020-09502-8. Association for Computational Linguistics, Hong Kong, doi:10.1007/S10579-020-09502-8. China, 2019, pp. 4675–4684. URL: https://aclanthology. [12] E. Leonardelli, S. Menini, A. P. Aprosio, M. Guerini, org/D19-1474. doi:10.18653/v1/D19-1474. S. Tonelli, Agreeing to disagree: Annotating offensive [21] B. Jehangir, S. Radhakrishnan, R. Agarwal, A survey on language datasets with annotators’ disagreement, in: named entity recognition - datasets, tools, and method- Proceedings of the 2021 Conference on Empirical Meth- ologies, Natural Language Processing Journal 3 (2023). ods in Natural Language Processing, 2021, pp. 10528– [22] E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, 10539. R. Weischedel, Ontonotes: the 90% solution, in: Pro- [13] H. Kirk, W. Yin, B. Vidgen, P. Röttger, Semeval-2023 task ceedings of the human language technology conference 10: Explainable detection of online sexism, in: Proceed- of the NAACL, Companion Volume: Short Papers, 2006, ings of the 17th International Workshop on Semantic pp. 57–60. Evaluation (SemEval-2023), 2023, pp. 2193–2210. [23] N. Collier, T. Ohta, Y. Tsuruoka, Y. Tateisi, J.-D. Kim, [14] P. Piot, P. Martín-Rodilla, J. Parapar, Metahate: A dataset Introduction to the bio-entity recognition task at jnlpba, for unifying efforts on hate speech detection, Proceed- in: N. Collier, P. Ruch, A. Nazarenko (Eds.), Proceedings ings of the International AAAI Conference on Web of the International Joint Workshop on Natural Lan- and Social Media 18 (2024) 2025–2039. URL: https://ojs. guage Processing in Biomedicine and its Applications aaai.org/index.php/ICWSM/article/view/31445. doi:10. 
(NLPBA/BioNLP), COLING, 2004, pp. 73–78. 1609/icwsm.v18i1.31445. [24] M. ElSherief, V. Kulkarni, D. Nguyen, W. Y. Wang, [15] D. Nozza, F. Bianchi, G. Attanasio, HATE-ITA: Hate E. Belding, Hate lingo: A target-based linguistic analysis speech detection in Italian social media text, in: of hate speech in social media, in: Proceedings of the K. Narang, A. Mostafazadeh Davani, L. Mathias, B. Vid- international AAAI conference on web and social media, gen, Z. Talat (Eds.), Proceedings of the Sixth Work- volume 12, 2018. shop on Online Abuse and Harms (WOAH), Associa- [25] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, tion for Computational Linguistics, Seattle, Washington Automatic classification of sexism in social networks: (Hybrid), 2022, pp. 252–260. doi:10.18653/v1/2022. An empirical study on twitter data, IEEE Access 8 (2020) woah-1.24. 219563–219576. [26] M. Sanguinetti, F. Poletto, C. Bosco, V. Patti, M. Stranisci, • Vulnerable Identity: “¿Qué identidad vulnerable al An italian twitter corpus of hate speech against immi- discurso de odio se menciona en la frase?” grants, in: Proceedings of the eleventh international conference on language resources and evaluation (LREC For Italian: 2018), 2018. • Dangerous: “Quale parte del testo è pericolosa (crim- [27] I. Guillén-Pacho, oeg-upm/telegram-dataset-builder: inalizza, ridicolizza, incita alla violenza, ...) nei con- version 1.0.0, 2024. URL: https://doi.org/10.5281/zenodo. fronti di identità vulnerabili (donne, migranti, mino- 12773159. doi:10.5281/zenodo.12773159. ranze etniche, ...)?” [28] S. Benesch, Dangerous speech, 86272 12 (2023) 185–197. • Derogatory: “Quale parte del testo è dispregiativa [29] F. Cabitza, A. Campagner, V. 
Basile, Toward a perspec- (commenti negativi o denigratori sulla vittima senza tivist turn in ground truthing for predictive computing, un esplicito discorso d’odio, ma in cui la menzione in: Proceedings of the AAAI Conference on Artificial stessa è discriminatoria o offensiva e spesso usa un Intelligence, volume 37, 2023, pp. 6860–6868. tono volto a sminuire o screditare la vittima)?” [30] T. Brants, Inter-annotator agreement for a german news- • Entity: “Quale entità nominata è menzionata nella paper corpus., in: LREC, Citeseer, 2000. frase?” [31] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the • Vulnerable Identity: “Quale identità vulnerabile ai limits of transfer learning with a unified text-to-text discorsi d’odio è menzionata nella frase?” transformer, Journal of machine learning research 21 (2020) 1–67. [32] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mo- hamed, O. Levy, V. Stoyanov, L. Zettlemoyer, Bart: Denoising sequence-to-sequence pre-training for nat- ural language generation, translation, and compre- hension, 2019. URL: https://arxiv.org/abs/1910.13461. arXiv:1910.13461. [33] G. Sarti, M. Nissim, IT5: Text-to-text pretraining for Italian language understanding and generation, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint Interna- tional Conference on Computational Linguistics, Lan- guage Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9422–9433. URL: https://aclanthology.org/2024.lrec-main.823. [34] M. La Quatra, L. Cagliero, Bart-it: An efficient sequence- to-sequence model for italian text summarization, Fu- ture Internet 15 (2023). URL: https://www.mdpi.com/ 1999-5903/15/1/15. doi:10.3390/fi15010015. [35] V. Araujo, M. M. Trusca, R. Tufiño, M.-F. Moens, Sequence-to-sequence spanish pre-trained language models, 2023. arXiv:2309.11259. [36] V. Araujo, M. M. Trusca, R. Tufiño, M.-F. 
Moens, Sequence-to-sequence spanish pre-trained language models, 2023. arXiv:2309.11259. A. LLMs Prompts The prompts used are the same for each model but different for each language. For Spanish, the prompts used for each label are: • Dangerous: “¿Qué parte del texto es peligroso (crimi- naliza, ridiculiza, incita a la violencia, ...) contra iden- tidades vulnerables (mujeres, migrantes, minorías ét- nicas, ...)?” • Derogatory: “¿Qué parte del texto es derogativo (co- mentarios negativos o despectivos sobre la víctima sin incitación explícita al odio, pero la mención en sí es discriminatoria u ofensiva, y a menudo emplea un tono destinado a menospreciar o desacreditar a la víctima)?” • Entity: “¿Qué entidad nombrada se menciona en la frase?”
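The non-restrictive and restrictive selection strategies described in the zero-shot experiments can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: the `qa` callable stands in for the fine-tuned T5/BART extractive-QA checkpoints, the candidate spans and confidence scores are invented, and the final majority-vote step over multiple model answers is omitted.

```python
# Sketch of the two zero-shot span-selection strategies.
# `qa` stands in for an extractive-QA model (T5 or BART): any callable
# returning candidate (answer, confidence) pairs ranked by decreasing
# confidence.

CONFIDENCE_FLOOR = 0.02  # answers below this confidence are treated as noise


def non_restrictive(qa, question, context):
    """Keep the three most confident spans, dropping low-confidence noise.

    The surviving candidates then go through the same majority vote used
    to build the aggregated dataset (not shown here).
    """
    top3 = qa(question, context)[:3]
    return [(span, score) for span, score in top3 if score >= CONFIDENCE_FLOOR]


def restrictive(qa, question, context, n_gold_spans):
    """Request five ranked spans and keep the first distinct answers.

    Duplicates are skipped so that two identical labels on different
    spans of the same sentence receive different extracted answers.
    """
    kept = []
    for span, score in qa(question, context)[:5]:
        if span not in (s for s, _ in kept):
            kept.append((span, score))
        if len(kept) == n_gold_spans:
            break
    return kept


# Stub model with invented spans and scores, for illustration only.
def fake_qa(question, context):
    return [
        ("los inmigrantes", 0.41),
        ("los inmigrantes", 0.30),
        ("extranjeros", 0.05),
        ("200.000 viviendas", 0.015),
        ("Salou", 0.01),
    ]


prompt = "¿Qué identidad vulnerable al discurso de odio se menciona en la frase?"
loose = non_restrictive(fake_qa, prompt, "...")   # three spans above the floor
strict = restrictive(fake_qa, prompt, "...", 2)   # two distinct spans
```

The restrictive variant mirrors the description in the paper: when the aggregated dataset marks two spans with the same label in one sentence, the model is asked for up to five ranked answers and duplicates are discarded until the required number of distinct spans is collected.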