<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ibai Guillén-Pacho</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arianna Longo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Antonio Stranisci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aequa-tech</institution>
          ,
          <addr-line>Torino, Italy, aequa-tech.com</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science Department, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.</p>
      </abstract>
      <kwd-group>
        <kwd>hate speech</kwd>
        <kwd>vulnerable identities</kwd>
        <kwd>annotated corpora</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Hate Speech (HS) detection is a task with a high social impact. Developing technologies that are able to recognize these forms of discrimination is not only crucial to enforce existing laws, but it also supports important tasks like the moderation of social media contents. However, recognizing HS is challenging: verbal discrimination takes different forms and involves a number of correlated phenomena that make it difficult to reduce HS to a binary classification.</p>
      <p>
        Analyzing the recent history of corpora annotated for HS, it is possible to observe a shift from very broad categorizations of hatred contents to increasingly detailed annotation schemes aimed at understanding the complexity of this phenomenon. High-level schemes including dimensions like “hateful/offensiveness” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or “sexism/racism” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] paved the way for more sophisticated attempts to formalize such concepts in different directions: exploring the interaction between HS and vulnerable targets [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4, 5</xref>
        ]; studying the impact of subjectivity [6, 7]; and identifying the triggers of HS in texts [8, 9].
      </p>
      <p>Despite this trend, the complex semantics of HS in texts is far from being fully explored. Information Extraction (IE) approaches to HS annotation have rarely been implemented yet; therefore, corpora that include a fine-grained structured semantic representation of HS incidents are not available. The only notable exception is the recent work of Büyükdemirci et al. [10], which treats the identification of HS targets as a span-based task.</p>
      <p>In order to fill this gap, we present the Vulnerable Identities Recognition Corpus (VIRC): a dataset of 880 Italian and Spanish headlines against migrants, aimed at providing an event-centric representation of HS against vulnerable groups. The annotation scheme is built on four elements:
• Named Entity Recognition (NER). All the named entities that are involved in a HS expression: ‘location’, ‘organization’, and ‘person’.
• Vulnerable Identity mentions. Generic mentions related to identities that are targets of HS as they are defined by the international regulatory frameworks: ‘women’, ‘LGBTQI’, ‘ethnic minority’, and ‘migrant’.
• Derogatory mentions. All mentions that negatively portray people belonging to vulnerable groups.
• Dangerous speech. The part of the message that is perceived as hateful against named entities or vulnerable identities.</p>
      <p>In this paper we present a preliminary annotation experiment intended to validate the scheme and to assess its impact on disagreement in such a fine-grained task. The paper is structured as follows. In Section 2 we discuss related work, in Section 3 we describe the methodology used, in Section 4 we introduce the VIRC corpus, and in Section 5 we present the conclusions and discuss possible future work. In Section 2.1 we present resources inspired by this approach, with a specific focus on span-based annotated corpora; in Section 2.2 we discuss the implementation of NER-based techniques in the creation of HS corpora.</p>
      <sec id="sec-1-1">
        <title>2.1. Hate Speech Detection</title>
        <p>A large amount of work on HS detection focuses on classification, both binary (existence or not) and multi-label (misogyny, racism, xenophobia, etc.). This has led to the existence of large collections of datasets, such as those grouped by [14]. One of the main problems is that most resources are in English, and for mid-to-low resource languages (e.g., Italian) some HS categories are not covered. This constraint is mitigated by cross-lingual transfer learning to exploit resources in other languages [15] and, although good results are achieved, the creation of resources for these languages is still necessary.</p>
        <p>The main resources for the identification of HS are particularly focused on a target, identifying the presence or absence of HS towards it. An example is the work of [16], where 1,100 tweets in Italian with a special target on immigrants were annotated according to the presence of HS, irony, and the stance of the message’s author on immigration matters. Recently, however, there has been an increasing focus on identifying hateful expressions and their intended targets. This change in paradigm suggests that resources should be wider in scope and not focus on a particular discourse target. The main resources in this field have high linguistic diversity, although they do not all follow the same annotation scheme, with English being the most common language. We have found works in English [17]; Vietnamese [18]; Korean [19]; English and Turkish [10]; and English, French, and Arabic [20]. However, we have not found any in Italian or Spanish, which we believe makes this work the first to cover these languages for this task.</p>
        <p>Two main annotation approaches can be drawn from these studies: those that annotate at the span level [17, 18, 19, 10] and those that annotate over the full text [20]. On the one hand, the work that follows the latter approach presents a corpus of 13,000 tweets (5,647 English, 4,014 French, and 3,353 Arabic) and notes the sentiment of the annotator (shock, sadness, disgust, etc.), hostility type (abusive, hateful, offensive, etc.), directness (direct or indirect), target attribute (gender, religion, disabled, etc.), and target group (individual, women, African, etc.).</p>
        <p>On the other hand, works that follow the span annotation approach design different annotation criteria. The simplest, [17, 18], only annotate one dimension. The first, [17], annotates the parts that make a comment toxic in 30,000 English comments of the Civil Comments platform. The second, [18], annotates only the parts that make a comment offensive or hateful in 11,000 Vietnamese comments on Facebook and Youtube. The other papers, [19, 10], extend this approach and also label the span in which the target of the attack is mentioned. Moreover, [19] is not limited to that; they also annotate the target type (individual, group, other), the target attribute (gender, race, ethnic, etc.), and the target group (LGBTQ+, Muslims, feminists, etc.). Their final corpus has 20,130 annotated offensive Korean-language news and video comments.</p>
        <p>However, the guidelines used by the different works sometimes present incompatibilities. Although some works use offensive and hateful labels in the same way [19, 18], others distinguish between these two types of expression [10]. This last resource has separately annotated hateful and offensive expressions, totaling 765 tweets in English and 765 tweets in Turkish.</p>
      </sec>
      <sec id="sec-1-1a">
        <title>2.2. Named Entity Recognition</title>
        <p>Developed as a branch of Information Extraction (IE), Named Entity Recognition (NER) is a field of research aimed at detecting named entities in documents according to different schemes. Following the review of Jehangir et al. [21], it is possible to observe general-purpose schemes, which usually include entities of the types ‘person’, ‘location’, ‘organization’, and ‘time’, and schemes defined for specific applications. OntoNotes [22] is an example of the first type of approach: a broad collection of documents gathered from different sources (e.g., newspapers, television news) annotated with a tagset that includes general categories of named entities. On the other hand, more specific applications include biomedical NER, which focuses on identifying entities relevant to the biomedical field, such as diseases, genes, and chemicals. An example in this field is the JNLPBA dataset [23], which is derived from the GENIA corpus. This dataset consists of 2,000 biomedical abstracts from the MEDLINE database, annotated with detailed entity types such as proteins, DNA, RNA, cell lines, and cell types.</p>
        <p>
          NER-based approaches for HS detection and analysis are still few. ElSherief et al. [24] exploited Twitter users’ mentions to distinguish between directed and generalized forms of HS. Rodríguez-Sánchez et al. [25] used derogatory expressions about women as seeds to collect misogynist messages according to a fine-grained classification of this phenomenon. [
          <xref ref-type="bibr" rid="ref5">26</xref>
          ] adopted a similar methodology to collect tweets about 3 groups vulnerable to discrimination: ethnic minorities, religious minorities, and Roma communities. Piot et al. [14] analyzed the correlation between the presence of HS and named entities in 60 existing datasets. Despite these previous works, there are no attempts to define a NER-based scheme specifically intended for HS detection. Our work represents an attempt to fill this gap by combining categories from general-purpose NER with a taxonomy of groups vulnerable to discrimination in a common annotation scheme aimed at providing deeper insights into the targets of HS.
        </p>
      </sec>
      <sec id="sec-1-1b">
        <title>3. Methodology</title>
        <p>
          3.1. Data Collection. We collect news from public Telegram channels with the telegram-dataset-builder [
          <xref ref-type="bibr" rid="ref6">27</xref>
          ]. The selected channels, shown in Table 1, are in Spanish and Italian and aligned with the left and right wings of the political spectrum. The subset of Italian headlines was integrated with titles published on newspapers’ Facebook pages that have been collected in collaboration with the Italian Amnesty Task Force on HS (https://www.amnesty.it/entra-in-azione/task-force-attivismo/), a group of activists that produces counter-narratives against discriminatory contents spread by online newspapers and user comments. We collected all the news headlines detected by activists in March 2020, 2021, 2022, and 2023, and added them to our corpus.
        </p>
        <p>Given the large amount of news collected, we applied filters to the dataset to reduce it to its final size. We focus on news about racism; for this purpose, we applied the classifier piuba-bigdata/beto-contextualized-hate-speech to keep only the news items labeled as racism. Since this classifier is trained on Spanish texts, prior to this step we automatically translated the Italian news with the model facebook/nllb-200-distilled-600M. This translation step is used only for the filtering process; once the news is selected, the translated text is no longer used. In the end, this process generates 532 news headlines classified as racist for Italian and 348 for Spanish, which have been selected for the annotation task.</p>
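        <p>The filtering step can be sketched as a small pipeline. In this minimal sketch, the translation and classification models are passed in as plain callables — hypothetical stand-ins for facebook/nllb-200-distilled-600M and piuba-bigdata/beto-contextualized-hate-speech, which would normally be loaded through, e.g., Hugging Face transformers; the function name and record layout are our own illustrative choices.</p>

```python
from typing import Callable, List


def filter_racist_headlines(
    headlines: List[dict],
    translate_it_es: Callable[[str], str],
    classify_es: Callable[[str], str],
) -> List[dict]:
    """Keep only headlines the Spanish-trained classifier labels as 'racism'.

    Italian headlines are translated to Spanish first; the translation is
    used only for filtering, and the original text is what gets kept.
    """
    selected = []
    for item in headlines:
        text = item["text"]
        # Translate Italian items so the Spanish-trained classifier applies.
        probe = translate_it_es(text) if item["lang"] == "it" else text
        if classify_es(probe) == "racism":
            selected.append(item)  # original (untranslated) headline is kept
    return selected
```

        <p>Because the translated text is discarded after classification, the corpus itself stays in the original languages, as described above.</p>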
        <p>Figure 1 provides three examples of annotated headlines, two in Italian and one in Spanish, showing the application of the annotation scheme as described; in the figure, different colours highlight the various types of labels used. The examples are: “Migranti, un esercito di scrocconi: 120mila mantenuti con l’8 per mille degli italiani” (“Migrants, an army of scroungers: 120,000 supported by the Italians’ 8x1000 tax allocation”); “Hordas de gitanos arrasan Mercadona después de que les ingresen 3000 euros en sus ‘tarjetas solidarias’” (“Hordes of gypsies devastate Mercadona after 3,000 euros were deposited in their ‘solidarity cards’”); and “Questa è Villa Aldini, la residenza di lusso che ospita i migranti stupratori a Bologna” (“This is Villa Aldini, the luxury residence that hosts rapist migrants in Bologna”).</p>
        <p>A vulnerable identity was detected in each headline: ‘Migranti’ in the first and in the third one, and ‘gitanos’ in the second one, respectively labelled as ‘vulnerable group - migrant’ and ‘vulnerable group - ethnic minority’. The three examples all contain multiple elements of dangerous speech, highlighted in red, and the second text also contains an element which was marked with the derogatory label. Additionally, the second and the third headlines include examples of annotation for named entities, with ‘Mercadona’ labelled as ‘entity - organization’, and ‘Villa Aldini’ and ‘Bologna’ labelled as ‘entity - location’.</p>
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Data Annotation</title>
        <p>A comprehensive, span-based annotation scheme was
developed to label vulnerable identities and entities present in the
dataset. Annotators were provided with instructions and had
to choose a label and highlight the word, phrase, or portion
of text that best embodied the qualities of the chosen label
in the text. It was possible to choose more than one label
for the same portion of text. The instructions also provided
annotators with some examples of annotated headlines.</p>
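        <p>As an illustration, a span-based, multi-label annotation of this kind could be represented with a record like the following. The record type and the character offsets are our own illustrative choices, not the project’s actual data format; the example spans come from the second headline shown in Figure 1.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanAnnotation:
    """One highlighted portion of a headline; a span may carry several labels."""
    start: int                     # character offset where the span begins
    end: int                       # character offset where the span ends (exclusive)
    labels: List[str]              # one or more labels for the same span
    comment: Optional[str] = None  # free-text detail, e.g. for 'other' labels

# Second headline from Figure 1, with illustrative offsets:
headline = ("Hordas de gitanos arrasan Mercadona después de que les "
            "ingresen 3000 euros en sus 'tarjetas solidarias'")
annotations = [
    SpanAnnotation(10, 17, ["vulnerable group - ethnic minority"]),  # 'gitanos'
    SpanAnnotation(26, 35, ["entity - organization"]),               # 'Mercadona'
]
```

        <p>Allowing a list of labels per span directly supports the guideline that more than one label may be chosen for the same portion of text, and the optional comment field mirrors the free-text annotations described below.</p>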
        <p>The initial layer of annotation focuses on identifying
vulnerable targets within the text and categorizing them into
one of six predefined labels: ethnic minority, migrant,
religious minority, women, LGBTQ+ community, and other.</p>
        <p>These labels represent vulnerable groups, as the vulnerability
of the targets can often be traced back to their belonging to
certain categories of people which are particularly exposed
to discrimination, marginalisation, or prejudice in society. In
cases where the targeted group didn’t fit into one of the
predefined labels, annotators were required to use the ‘other’
category. Then, for instances labeled as ‘other’, annotators
were instructed to provide specific details regarding the group
in a free-text field.</p>
        <p>After categorizing vulnerable targets, the second layer
involves annotating named entities. Annotators identify entities
within the text and label them with one of five possible types:
person, group, organization, location, and other. As in
the first layer, instances labelled ‘other’ require annotators to
provide details about the entity in a free-text field.</p>
        <p>The final layers of the annotation scheme address the
context in which these entities are mentioned, specifically
focusing on identifying derogatory mentions and dangerous
speech.</p>
        <p>A derogatory mention is characterized by negative or
disparaging remarks about the target. In these instances, explicit
hate speech is absent, but the mention itself is discriminatory
or offensive, often employing a tone intended to belittle or
discredit the target. The label derogatory is used to mark
these mentions.</p>
        <p>
          Moreover, the annotation includes identifying dangerous
elements: portions of text that, intentionally or unintentionally,
could incite hate speech or increase the vulnerability of the
target identity. Dangerous speech, which can be either explicit
or implicit, promotes or perpetuates negative prejudices and
stereotypes, potentially triggering harmful responses against
the group. The label dangerous [
          <xref ref-type="bibr" rid="ref7">28</xref>
          ] is used to tag these
segments. Annotators were encouraged to use free-text fields
to provide details on implicit dangerous speech or recurring
dangerous concepts.
        </p>
        <p>The annotation guidelines provided annotators with
specific criteria and with the following list of potential markers
of dangerous speech to help their identification:
• Incitement to violence: the text explicitly
encour</p>
        <p>ages violence against the target group;
• Open discrimination: the text openly states or
sup</p>
        <p>ports discrimination against the target group;
• Ridicule: the text ridicules the target in the eyes of</p>
        <p>the readers by belittling it or mocking it;
• Stereotyping: the text perpetuates negative
stereotypes about the target group, contributing to a
distorted view of it;
• Disinformation: the text spreads false or misleading</p>
        <p>information that can harm the target group;
• Dehumanization: the text dehumanizes the target
group, using language that equates it with objects or
animals;
• Criminalization: the text portrays the target group
as inherently criminal or associates it with illegal
activities, contributing to the perception that the group
as a whole is dangerous.
‘vulnerable group - ethnic minority’. The three examples all this would be a full agreement (1 true positive). However,
contain multiple elements of dangerous speech, highlighted in if the latter selected “women of Italy”, it would be a partial
red, and the second text also contains an element which was agreement (0.5 true positive).
marked with the derogatory label. Additionally, the second
and the third headlines include examples of annotation for Quantitative Analysis. The agreement on the annotation
named entities, with ‘Mercadona’ labelled as ‘entity - organi- of entities is always moderate but difers between the
Spanzation’, and ‘Villa Aldini’ and ‘Bologna’ labelled as ‘entity - ish and the Italian subsets. Annotators of Spanish headlines
location’. scored a higher agreement on ‘location’ (0.66 vs 0.60),
‘vulnerable’ (0.15 vs 0) and ‘organization’ (0.41 vs 0.12) while
4. The VIRC Corpus entities of the type ‘person’ (0.63 vs 0.47) and ‘other’ (0.1 vs
0) are better recognized in Italian headlines.</p>
        <p>
          The VIRC corpus is a collection of 532 Italian and 348 Spanish On average, the annotation of vulnerable identities resulted
news headlines annotated by 2 independent annotators for in a higher agreement between annotators in both subsets
each language. Following the perspectivist paradigm [
          <xref ref-type="bibr" rid="ref8">29</xref>
          ], and at the same time confirmed an higher agreement of
Spanwe both released the disaggregated annotations and the gold- ish annotations that always outperforms Italian ones. The
standard corpus. The code used to generate the gold standard highest agreement emerges for the label ‘migrant’ on which
corpus, carry out experiments, and compile statistics can be annotators obtained an F-score of 0.86 for Italian and 0.96 for
accessed through the following GitHub repository6. In this Spanish. The agreement on ‘ethnic minority’ is a bit lower but
Section we present an analysis of disagreement (Section 4.1) still significant, while Spanish headlines reached an F-score of
and relevant statistics about the corpus (Section 4.2). 0.83 Italian ones only 0.63. An equally high agreement is on
the ‘lgbtq+’ label, which is only present in Italian headlines
with an F-score of 0.8. Among vulnerable groups, women
4.1. Inter-Annotator Agreement scored the lowest F-score: 0.6 for Spanish, 0.22 for Italian.
Since the span-based annotation task does not provide a fixed The largest observed discrepancy is with religious minorities,
number of annotated items, we adopted the F-score metric to in Spanish an F-score of 1 is achieved while in Italian 0.
evaluate the agreement between annotators [
          <xref ref-type="bibr" rid="ref9">30</xref>
          ]. For each sub- While the annotation of ‘dangerous’ spans achieves an
acset of the corpus we randomly chose one annotator as the gold ceptable agreement, the ‘derogatory’ annotation is
characterstandard set of labels and the other as the set of predictions. ized as the one that achieves the lowest agreement between
We then computed the F-score between the two distributions annotators. Additionally, annotations of Italian headlines
reof labels in order to measure the agreement between the an- sulted in higher disagreement than Spanish ones, contrary
notators. Table 2 shows the results of our analysis. In general, to what we observed about ‘entities’ and ‘vulnerable
identiannotations always showed a fair or higher agreement, ex- ties’. Text spans expressing dangerous speech are recognized
cept for some entity-related labels and the “derogatory” one. with an agreement of 0.57 for Italian and 0.49 for Spanish
There is also a low agreement in the Italian set on the labels headlines. Agreement about ‘derogatory’ is low for Italian
“religious minority” and “women”. headlines (0.28) while Spanish ones show almost no
agreement (0.08)
        </p>
        <p>Qualitative Analysis. In summary, while the overall
redangerous sults of the annotation are positive, some categories show
entditeyro-ggartoourpy significant disagreement between annotators. These
disagreeentity - location ments highlight the need to review and refine the annotation
entity - organization guidelines for problematic categories, and to provide more
entity - other detailed instructions. The importance of reassessing the
guideentity - person lines in order to make them clearer and more consistent is
vulnerable entity further underscored by the fact that, for Spanish headlines,
vulnerable group - ethnic minority the annotators agreed on both labels and intervals in only 67
vulnerable group - lgbtq+ community cases, and for Italian headlines, agreement was reached in just
vulnerable group - migrant 88 cases.</p>
        <p>vulnerable group - other Since the annotation task was span-based, we opted not
vulnerable group - religious minority to use a confusion matrix to analyze the disagreement. A
vulnerable group - women confusion matrix is not appropriate for span detection, as it
Table 2 assumes discrete labels applied to predefined items, whereas
The annotators agreement measured through the F-score and bro- our task involved labeling spans of text that varied in length
ken down by label. and context. Instead, we performed a qualitative analysis,
examining specific cases of disagreement to understand their</p>
        <p>Although the overall results are positive, they show signif- nature. This approach allowed us to explore not only how
icant variations that can be quantitatively and qualitatively. annotators difered in labeling spans but also why these
diferInclusion of overlapping spans was handled as follows: if ences emerged, providing a deeper insight into the underlying
one span fully included another, this was considered to be an issues of interpretation and guidelines.
agreement. In cases where the spans only partially overlapped, Looking more closely at the headlines where the
annotameaning there was some shared text but not full inclusion, this tions present inconsistencies, a variety of motivations behind
was treated as a partial agreement. For example, if one anno- discrepancies can be identified.
tator labeled “All women” and another selected only “women”, For instance, in the Italian title “Orrore nella casa
occupata dagli immigrati: donna lanciata giù dal secondo piano”7,
‘donna’ was marked as a vulnerable identity by only one of ddearnoggeartoourys
the annotators, suggesting maybe an erroneous focus on an entities
individual target at a time (‘immigrati’) by the other annotator. vulnerable groups</p>
        <p>
          Another type of disagreement relates to the interpretation
of derogatory mentions. An example can be found in “Un Table 3
terzo dei reati sono commessi da stranieri (e gli africani hanno The distribution of labels in the gold standard corpus.
il record). Tutti i numeri”8, where one annotator identified the
term ‘stranieri’ as a derogatory mention, as well as
representative of a vulnerable identity, while another annotator simply 4.2. Dataset Analysis
stuck to the second label, perhaps highlighting a divergence In this section we provide an analysis of the four label types
in the interpretation of the guidelines. Furthermore, it is inter- that occur in the gold standard version of the VIRC corpus:
esting to observe the disagreement created by the headlines ‘derogatory’, ‘dangerous’, ‘named entities’, ‘vulnerable groups’.
that use generic term ‘stranieri’ (‘foreigners’), which was of- The analysis is twofold: first, we describe the distribution of
ten labelled as ‘vulnerable identity - ethnic minority’ by one these label types, then we present a zero-shot and a
fewannotator and as ‘vulnerable identity - migrant’ by the other. shot experiment aimed at understanding if existing LLMs
This inconsistence between annotators can be identified in (T5[
          <xref ref-type="bibr" rid="ref10">31</xref>
          ] and BART[
          <xref ref-type="bibr" rid="ref11">32</xref>
          ]) are able to recognize these labeled
two headlines: “Ius soli e cittadinanza facile agli stranieri? Il spans in news headlines by comparing their outputs to the
sangue non è acqua”9 and “Un terzo dei reati sono commessi gold standard annotations.
da stranieri (e gli africani hanno il record). Tutti i numeri”2. In
the first case, we can solve the disagreement by looking at the
context: the explicit reference to the issue of granting citizen- Corpus statistics. Table 3 shows the distribution of label
ship suggests that the term ‘foreigners’ is more appropriately types in the corpus. As it can be observed, mentions of
vulnerreferred to the specific category of migrants. On the other able groups are the most present, with 270 occurrences in the
hand, in the second headline, there is no direct reference to Spanish subset and 253 in the Italian subset. This confirms
specifically migration-related issues and thus both interpre- the relevance of annotating vulnerable in the identification
tations in terms of the vulnerable category of belonging are of discriminatory contents, which is tied to their high
recacceptable. ognizability by annotators (Section 4.1). The role on named
        </p>
        <p>Finally, some texts present a slight diference in the anno- entities difers in the two subsets. Annotators labeled them
tation spans of choice, as observed in “Più di 200mila case with agreement 130 times in Spanish headlines and 67 times
popolari agli immigrati”10, where the annotators identified in Italian ones. This might be caused by their compositions.
dangerous speech in the same section of text, but with dif- Since Italian headlines were partly collected from Facebook
ferences in the number of highlighted words (first annotator pages of mainstream newspapers, there was a higher
numlabelled ‘Più di 200mila’; second annotator labelled ‘200mila ber of named entities that were not relevant for the analysis
case popolari’), reflecting variations in the identification of of headlines’ danger. The number of text spans labeled as
relevant content for the analysis of dangerous speech. dangerous is almost equivalent in the two subsets (136 for</p>
        <p>
          In addition to the predefined labels, we also collected free- Spanish, 166 for Italian), showing a good presence of this
text fields as part of the annotation process. These comments label type despite the high disagreement between annotators.
offered an additional layer of granularity, allowing annotators
to describe nuances not covered by the fixed categories.
For example, in the Spanish headline “Dos menas marroquíes
apuñalan a dos turistas para robarles en Salou”11, both annotators
used the two labels ‘vulnerable identity - ethnic minority’
and ‘vulnerable identity - other’ to annotate the span ‘menas
marroquíes’. Alongside the ‘other’ label, one annotator
provided the comment ‘Under 18’, while the other one used
‘young people’ to describe the vulnerable group. Although
stated differently, both comments highlight the specific
vulnerability related to the age of the group, complementing the
existing labels. As this example shows, the flexibility in the
annotation process provided by free-text fields is useful to
capture multi-categorical terms and to identify potential new
categories that may not have been initially considered in the
predefined labels.</p>
        <p>Finally, it is worth mentioning the almost total absence of text
spans labeled as ‘derogatory’ with agreement (3 for Spanish,
16 for Italian), which suggests the high subjectivity of this
phenomenon and the need to better define its characteristics
in annotation guidelines.</p>
        <p>Corpus analysis with LLMs. We completed our analysis
of the VIRC corpus through zero-shot experiments aimed at
exploring the ability of existing LLMs to identify the four
types of labelled spans in messages. We considered the
detection of spans as an extractive Question Answering (QA)
problem. For the task we adopted the T5 [
          <xref ref-type="bibr" rid="ref10">31</xref>
          ] and BART [
          <xref ref-type="bibr" rid="ref11">32</xref>
          ]
LLM architectures for both languages. For Italian we employed
the [
          <xref ref-type="bibr" rid="ref12">33</xref>
          ] and [
          <xref ref-type="bibr" rid="ref13">34</xref>
          ] models, and for Spanish the [
          <xref ref-type="bibr" rid="ref14">35</xref>
          ] and [
          <xref ref-type="bibr" rid="ref15">36</xref>
          ] models,
respectively. The translations of the prompts used are the following
(see Appendix A for the original ones):
7“Atrocity in a house occupied by migrants: woman thrown from second
floor”.
8“One third of all crimes are committed by foreigners (and Africans hold
the record). All the numbers”.
9“Ius soli and easy citizenships for foreigners? Blood is not water”.
10“More than 200,000 public housing units for immigrants”.
11“Two Moroccan unaccompanied migrant minors stab two tourists to rob
them in Salou”.
• What part of the text is dangerous (criminalizes,
ridicules, incites violence, ...) against vulnerable
identities (women, migrants, ethnic minorities, ...)?
• What part of the text is derogatory (negative or
pejorative comments about the victim without explicit
hate speech, but the mention itself is discriminatory or
offensive, and often uses a tone intended to denigrate
or discredit the victim)?
• What named entity is mentioned in the sentence?
• What vulnerable identity to hate speech is mentioned
in the sentence?</p>
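To make the extractive-QA framing concrete, each headline serves as the QA context and is paired with one question per label type; the sketch below is our illustration of that pairing (the prompt texts are abbreviated from the translated list above, and the function name is hypothetical, not the authors' code):

```python
# One question per label type, abbreviated from the translated prompts above.
PROMPTS = {
    "dangerous": "What part of the text is dangerous against vulnerable identities?",
    "derogatory": "What part of the text is derogatory?",
    "entity": "What named entity is mentioned in the sentence?",
    "vulnerable identity": "What vulnerable identity to hate speech is mentioned in the sentence?",
}

def build_qa_queries(headline):
    """Pair a headline (the QA context) with the question for each label."""
    return [{"label": label, "question": question, "context": headline}
            for label, question in PROMPTS.items()]

queries = build_qa_queries("Dos menas marroquíes apuñalan a dos turistas en Salou")
print(len(queries))  # 4: one extractive-QA query per label type
```

Each of these queries is then answered by the extractive QA model, which returns the text span it considers the most likely answer.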
        <p>We designed two approaches for the zero-shot experiments:
restrictive and non-restrictive. On the one hand, for the
non-restrictive zero-shot experiments, for each sentence in the
dataset we queried the model with the prompt of each label
and extracted the three most confident results. Then, we
filtered out the responses below 0.02% model confidence to
limit the noise. Finally, all these annotations go through a
majority vote (identical to the one used to build the
aggregated dataset) to normalize the model response.</p>
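The filtering and normalization steps of the non-restrictive approach can be sketched as follows (a minimal illustration; the helper names are ours, the QA model calls are abstracted as pre-computed candidate lists, and the example spans are hypothetical):

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.0002  # the 0.02% model-confidence cutoff used to limit noise

def filter_candidates(candidates):
    """Keep only QA answers whose confidence clears the threshold.

    `candidates` is a list of (span, score) pairs, e.g. the three most
    confident answers returned by the extractive QA model for one prompt.
    """
    return [(span, score) for span, score in candidates
            if score >= CONFIDENCE_THRESHOLD]

def majority_vote(annotations_per_run):
    """Normalize the model responses: keep spans proposed by a majority of
    runs, mirroring the vote used to build the aggregated dataset."""
    runs = len(annotations_per_run)
    counts = Counter(span for run in annotations_per_run for span in set(run))
    return sorted(span for span, count in counts.items() if count > runs / 2)

# Hypothetical candidates for the 'vulnerable identity' prompt on one headline.
raw = [("menas marroquíes", 0.41), ("turistas", 0.0001), ("Salou", 0.003)]
kept = [span for span, _ in filter_candidates(raw)]
print(kept)  # 'turistas' is dropped: its confidence is below 0.02%
```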
        <p>On the other hand, for the restrictive zero-shot
experiments, we queried the model with the prompts for each
annotation present in the aggregated dataset. As there are
sentences that carry two identical labels on different spans, we
request five different annotations from the model, ordered
from most confident to least confident. If an annotation has
already been included, the next annotation in order is taken, to
avoid duplicate annotations from the model.</p>
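The duplicate-avoiding selection of the restrictive approach can be sketched as below, under the assumption that the model's answers arrive as a confidence-ordered list (function and variable names are illustrative):

```python
def pick_next_annotation(ranked_answers, already_taken, max_answers=5):
    """Return the most confident answer not yet used for this sentence.

    `ranked_answers` holds up to five candidate spans ordered from most to
    least confident; when a sentence carries two gold annotations with the
    same label on different spans, skipping spans already taken avoids
    duplicating the model's annotations.
    """
    for span in ranked_answers[:max_answers]:
        if span not in already_taken:
            return span
    return None  # all candidates were already used

# Hypothetical: two gold 'vulnerable identity' annotations on one sentence.
ranked = ["migrantes", "migrantes", "mujeres", "Salou", "turistas"]
first = pick_next_annotation(ranked, set())
second = pick_next_annotation(ranked, {first})
print(first, second)  # migrantes mujeres
```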
        <p>Table 4 presents the F-scores for each label type,
experiment, and model. In general, T5 and BART tend to perform
more effectively in Spanish than in Italian. The models
face noticeable challenges in identifying the labels
‘dangerous’, ‘derogatory’, and ‘entity’. Nevertheless, when they are
aware that the label exists within the sentence (restrictive),
they manage to recognize it with fairly good agreement.
As during annotation, the label ‘derogatory’ proves the most
challenging to identify. In the non-restrictive scenario, it scarcely
receives any agreement, yet in the restrictive scenario it achieves a
reasonable level, particularly in Spanish. This indicates that
the model struggles to discern its presence initially but, once
acknowledged, can recognise the expression.</p>
        <p>The restrictive method enhances performance over the
non-restrictive method for all labels except ‘vulnerable identity’.
This shows that models generally comprehend and identify
vulnerable identities in sentences better without restrictions
than when they are restricted to specific mentions. It should
also be noted that, in the Spanish setting, T5 is more effective
than BART at identifying ‘vulnerable identity’ labels under
both approaches, while BART performs better in Italian.</p>
        <p>These results show that a NER-based annotation scheme
for HS detection is difficult not only to annotate but also to
detect automatically. Larger resources are necessary to develop
models that are able to detect the complex semantics of HS.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Conclusions and Future Work</title>
      <sec id="sec-2-1">
        <p>The Vulnerable Identities Recognition Corpus (VIRC), created
in this work, reveals the challenge of identifying vulnerable
identities due to the rapid evolution of language on social
media. Our experiments indicate that large language models
(LLMs) struggle significantly with this task.</p>
      </sec>
      <sec id="sec-2-2">
        <p>VIRC provides a detailed and structured resource that enhances
understanding of the extensive use of hate speech in
Italian and Spanish news headlines. The corpus is particularly
valuable as it includes more annotation dimensions compared
to related studies in other languages, such as vulnerable
identities, dangerous discourse, derogatory expressions, and entities.
This differentiation between vulnerable identities and
entities, as well as between dangerous and derogatory elements,
enables the development of sophisticated detection tools that
can facilitate large-scale actions to mitigate the impact of
hate speech (e.g., moderation of messages and generation
of counter-narratives that reduce the damage to the mental
health of victims).</p>
        <p>Future work will focus on expanding this resource by
doubling the size of annotations for both languages and including
non-racism-related phrases to ensure the resource is
comprehensive. Additionally, we plan to refine the annotation
guidelines to avoid low agreement on the derogatory label,
enhancing the overall reliability and utility of the corpus. These
efforts will further improve the effectiveness of hate speech
detection and contribute to the development of policies and
tools for a safer online environment.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <sec id="sec-3-1">
        <p>This work is supported by the Predoctoral Grant (PIPF-2022/COM-25947) of the Consejería de Educación, Ciencia
y Universidades de la Comunidad de Madrid, Spain. Arianna
Longo’s work has been supported by aequa-tech. The
authors gratefully acknowledge the Universidad Politécnica de
Madrid (www.upm.es) for providing computing resources on
the IPTC-AI innovation Space AI Supercomputing Cluster.
</p>
        <p>North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2289–2303.
[5] P. Chiril, E. W. Pamungkas, F. Benamara, V. Moriceau, V. Patti, Emotionally informed hate speech detection: A multi-target perspective, Cogn. Comput. 14 (2022) 322–352. URL: https://doi.org/10.1007/s12559-021-09862-5. doi:10.1007/s12559-021-09862-5.
[6] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th annual meeting of the association for computational linguistics, 2019, pp. 1668–1678.
[7] P. Sachdeva, R. Barreto, G. Bacon, A. Sahn, C. Von Vacano, C. Kennedy, The measuring hate speech corpus: Leveraging rasch measurement theory for data perspectivism, in: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022, 2022, pp. 83–94.
[8] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, A. Mukherjee, Hatexplain: A benchmark dataset for explainable hate speech detection, in: Proceedings of the AAAI conference on artificial intelligence, volume 35, 2021, pp. 14867–14875.
[9] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: Proceedings of the 15th international workshop on semantic evaluation (SemEval-2021), 2021, pp. 59–69.
[10] K. Büyükdemirci, I. E. Kucukkaya, E. Ölmez, C. Toraman, JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9543–9553.
[11] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/s10579-020-09502-8.
[12] E. Leonardelli, S. Menini, A. P. Aprosio, M. Guerini, S. Tonelli, Agreeing to disagree: Annotating offensive language datasets with annotators’ disagreement, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 10528–10539.
[13] H. Kirk, W. Yin, B. Vidgen, P. Röttger, SemEval-2023 task 10: Explainable detection of online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023, pp. 2193–2210.
[14] P. Piot, P. Martín-Rodilla, J. Parapar, Metahate: A dataset for unifying efforts on hate speech detection, Proceedings of the International AAAI Conference on Web and Social Media 18 (2024) 2025–2039. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/31445. doi:10.1609/icwsm.v18i1.31445.
[15] D. Nozza, F. Bianchi, G. Attanasio, HATE-ITA: Hate speech detection in Italian social media text, in: K. Narang, A. Mostafazadeh Davani, L. Mathias, B. Vidgen, Z. Talat (Eds.), Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), Association for Computational Linguistics, Seattle, Washington (Hybrid), 2022, pp. 252–260. doi:10.18653/v1/2022.woah-1.24.
[16] M. Madeddu, S. Frenda, M. Lai, V. Patti, V. Basile, DisaggregHate it corpus: A disaggregated italian dataset of hate speech, in: F. Boschetti, G. E. Lebani, B. Magnini, N. Novielli (Eds.), Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), volume 3596, 2023.
[17] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 59–69. URL: https://aclanthology.org/2021.semeval-1.6. doi:10.18653/v1/2021.semeval-1.6.
[18] P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen, N. L.-T. Nguyen, ViHOS: Hate speech spans detection for Vietnamese, in: A. Vlachos, I. Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 652–669. URL: https://aclanthology.org/2023.eacl-main.47. doi:10.18653/v1/2023.eacl-main.47.
[19] Y. Jeong, J. Oh, J. Lee, J. Ahn, J. Moon, S. Park, A. Oh, KOLD: Korean offensive language dataset, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 10818–10833. URL: https://aclanthology.org/2022.emnlp-main.744. doi:10.18653/v1/2022.emnlp-main.744.
[20] N. Ousidhoum, Z. Lin, H. Zhang, Y. Song, D.-Y. Yeung, Multilingual and multi-aspect hate speech analysis, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 4675–4684. URL: https://aclanthology.org/D19-1474. doi:10.18653/v1/D19-1474.
[21] B. Jehangir, S. Radhakrishnan, R. Agarwal, A survey on named entity recognition - datasets, tools, and methodologies, Natural Language Processing Journal 3 (2023).
[22] E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, R. Weischedel, OntoNotes: the 90% solution, in: Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers, 2006, pp. 57–60.
[23] N. Collier, T. Ohta, Y. Tsuruoka, Y. Tateisi, J.-D. Kim, Introduction to the bio-entity recognition task at JNLPBA, in: N. Collier, P. Ruch, A. Nazarenko (Eds.), Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), COLING, 2004, pp. 73–78.
[24] M. ElSherief, V. Kulkarni, D. Nguyen, W. Y. Wang, E. Belding, Hate lingo: A target-based linguistic analysis of hate speech in social media, in: Proceedings of the international AAAI conference on web and social media, volume 12, 2018.
[25] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, Automatic classification of sexism in social networks: An empirical study on twitter data, IEEE Access 8 (2020) 219563–219576.</p>
        <p>• Dangerous: “¿Qué parte del texto es peligroso (criminaliza, ridiculiza, incita a la violencia, ...) contra identidades vulnerables (mujeres, migrantes, minorías étnicas, ...)?”
• Derogatory: “¿Qué parte del texto es derogativo (comentarios negativos o despectivos sobre la víctima sin incitación explícita al odio, pero la mención en sí es discriminatoria u ofensiva, y a menudo emplea un tono destinado a menospreciar o desacreditar a la víctima)?”
• Entity: “¿Qué entidad nombrada se menciona en la frase?”
• Vulnerable Identity: “¿Qué identidad vulnerable al discurso de odio se menciona en la frase?”</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          ,
          <source>in: Proceedings of the international AAAI conference on web and social media</source>
          , volume
          <volume>11</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <article-title>Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter</article-title>
          ,
          <source>in: Proceedings of the first workshop on NLP and computational social science</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>ElSherief</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ziems</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Muchlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Anupindi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seybolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Latent hatred: A benchmark for understanding implicit hate speech</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vidgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Margetts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rossini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tromble</surname>
          </string-name>
          ,
          <article-title>Introducing cad: the contextual abuse dataset</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <article-title>An italian twitter corpus of hate speech against immigrants</article-title>
          ,
          <source>in: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I.</given-names>
            <surname>Guillén-Pacho</surname>
          </string-name>
          ,
          <source>oeg-upm/telegram-dataset-builder: version 1.0.0</source>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.5281/zenodo.12773159. doi:10.5281/zenodo.12773159.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Benesch</surname>
          </string-name>
          , Dangerous speech,
          <fpage>86272</fpage>
          <lpage>12</lpage>
          (
          <year>2023</year>
          )
          <fpage>185</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>Toward a perspectivist turn in ground truthing for predictive computing</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>37</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>6860</fpage>
          -
          <lpage>6868</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brants</surname>
          </string-name>
          ,
          <article-title>Inter-annotator agreement for a german newspaper corpus</article-title>
          ., in: LREC, Citeseer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of machine learning research 21</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1910.13461. arXiv:1910.13461.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>IT5: Text-to-text pretraining for Italian language understanding and generation</article-title>
          , in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics</source>
          ,
          Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>9422</fpage>
          -
          <lpage>9433</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.823.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>La Quatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <article-title>BART-IT: An efficient sequence-to-sequence model for italian text summarization</article-title>
          ,
          <source>Future Internet</source>
          <volume>15</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/1999-5903/15/1/15. doi:10.3390/fi15010015.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>V.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Trusca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tufiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence spanish pre-trained language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.11259.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>V.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Trusca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tufiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence spanish pre-trained language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.11259.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>