<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ibai</forename><surname>Guillén-Pacho</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arianna</forename><surname>Longo</surname></persName>
							<email>arianna.longo401@edu.unito.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Turin</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Aequa-tech</orgName>
								<address>
									<settlement>Torino</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><forename type="middle">Antonio</forename><surname>Stranisci</surname></persName>
							<email>marcoantonio.stranisci@unito.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Turin</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Aequa-tech</orgName>
								<address>
									<settlement>Torino</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
							<email>viviana.patti@unito.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Turin</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Badenes-Olmedo</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1EADB3EAC43E813CAD032BE2B2D99457</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>hate speech</term>
					<term>vulnerable identities</term>
					<term>annotated corpora</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hate Speech (HS) detection is a task with a high social impact. Developing technologies that are able to recognize these forms of discrimination is not only crucial to enforce existing laws but also supports important tasks like the moderation of social media content. However, recognizing HS is challenging. Verbal discrimination takes different forms and involves a number of correlated phenomena that make it difficult to reduce HS to a binary classification.</p><p>Analyzing the recent history of corpora annotated for HS, it is possible to observe a shift from very broad categorizations of hateful content to increasingly detailed annotation schemes aimed at understanding the complexity of this phenomenon. High-level schemes including dimensions like "hateful/offensiveness" <ref type="bibr" target="#b0">[1]</ref> or "sexism/racism" <ref type="bibr" target="#b1">[2]</ref> paved the way for more sophisticated attempts to formalize such concepts in different directions: exploring the interaction between HS and vulnerable targets <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>; studying the impact of subjectivity <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>; and identifying the triggers of HS in texts <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. Despite this trend, the complex semantics of HS in texts is far from fully explored, and Information Extraction (IE) approaches to HS annotation have rarely been implemented. As a consequence, corpora that include fine-grained, structured semantic representations of HS incidents are not available. The only notable exception is the recent work of Büyükdemirci et al. 
<ref type="bibr" target="#b9">[10]</ref>, which treats the identification of HS targets as a span-based task.</p><p>In order to fill this gap, we present the Vulnerable Identities Recognition Corpus (VIRC): a dataset of 880 Italian and Spanish headlines against migrants aimed at providing an event-centric representation of HS against vulnerable groups. The annotation scheme is built on four elements:</p><p>• Named Entity Recognition (NER). All the named entities that are involved in an HS expression: 'location', 'organization', and 'person'. • Vulnerable Identity mentions. Generic mentions of identities targeted by HS, as defined by international regulatory frameworks 1 : 'women', 'LGBTQI', 'ethnic minority', and 'migrant'. • Derogatory mentions. All mentions that negatively portray people belonging to vulnerable groups. • Dangerous speech. The part of the message that is perceived as hateful against named entities or vulnerable identities.</p><p>In this paper we present a preliminary annotation experiment intended to validate the scheme and to assess disagreement in such a fine-grained task. The paper is structured as follows: in Section 2 we discuss related work; in Section 3 we describe the methodology; in Section 4 we introduce the VIRC corpus; and in Section 5 we present conclusions and possible future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Literature on automatic HS detection is vast and follows different research directions <ref type="bibr" target="#b11">[11]</ref>: from the analysis of subjectivity in the perception of this phenomenon <ref type="bibr" target="#b12">[12]</ref> to the definition of ever more refined categorizations of hateful content <ref type="bibr" target="#b13">[13]</ref>. In this section we focus on approaches to HS detection that study the targets of HS, inspired by Information Extraction (IE). In Section 2.1 we review HS resources that follow this approach, with a specific focus on span-based annotated corpora. In Section 2.2 we discuss the implementation of NER-based techniques in the creation of HS corpora.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Hate Speech Detection</head><p>A large amount of work on HS detection focuses on classification, both binary (presence or absence of HS) and multi-label (misogyny, racism, xenophobia, etc.). This has led to large collections of datasets, such as those grouped by <ref type="bibr" target="#b14">[14]</ref>. One of the main problems is that most resources are in English, and for mid-to-low resource languages (e.g., Italian) some HS categories are not covered. This constraint is mitigated by cross-lingual transfer learning, which exploits resources in other languages <ref type="bibr" target="#b15">[15]</ref>; although good results are achieved, the creation of resources for these languages is still necessary.</p><p>The main resources for the identification of HS focus on a particular target, annotating the presence or absence of HS directed at it. An example is the work of <ref type="bibr" target="#b16">[16]</ref>, where 1,100 Italian tweets targeting immigrants were annotated according to the presence of HS, irony, and the stance of the message's author on immigration. Recently, however, there has been an increasing focus on identifying hateful expressions and their intended targets. This paradigm shift suggests that resources should be wider in scope and not focus on a particular discourse target. The main resources in this field have high linguistic diversity, although they do not all follow the same annotation scheme, with English being the most common language. We have found works in English <ref type="bibr" target="#b17">[17]</ref>; Vietnamese <ref type="bibr" target="#b18">[18]</ref>; Korean <ref type="bibr" target="#b19">[19]</ref>; English and Turkish <ref type="bibr" target="#b9">[10]</ref>; and English, French, and Arabic <ref type="bibr" target="#b20">[20]</ref>. 
However, we have not found any in Italian or Spanish, which we believe makes this work the first to cover these languages for this task.</p><p>Two main annotation approaches can be drawn from these studies: those that annotate at the span level <ref type="bibr" target="#b17">[17,</ref><ref type="bibr" target="#b18">18,</ref><ref type="bibr" target="#b19">19,</ref><ref type="bibr" target="#b9">10]</ref> and those that annotate over the full text <ref type="bibr" target="#b20">[20]</ref>. On the one hand, the work that follows the latter approach presents a corpus of 13,000 tweets (5,647 English, 4,014 French, and 3,353 Arabic) and records the sentiment of the annotator (shock, sadness, disgust, etc.), hostility type (abusive, hateful, offensive, etc.), directness (direct or indirect), target attribute (gender, religion, disability, etc.), and target group (individual, women, African, etc.).</p><p>On the other hand, the works that follow the span-annotation approach adopt different annotation criteria. The simplest, <ref type="bibr" target="#b17">[17,</ref><ref type="bibr" target="#b18">18]</ref>, annotate only one dimension. The first, <ref type="bibr" target="#b17">[17]</ref>, annotates the parts that make a comment toxic on 30,000 English comments from the Civil Comments platform. The second, <ref type="bibr" target="#b18">[18]</ref>, annotates only the parts that make a comment offensive or hateful in 11,000 Vietnamese comments on Facebook and YouTube. The other papers, <ref type="bibr" target="#b19">[19,</ref><ref type="bibr" target="#b9">10]</ref>, extend this approach and also label the span in which the target of the attack is mentioned. Moreover, <ref type="bibr" target="#b19">[19]</ref> is not limited to that; they also annotate the target type (individual, group, other), the target attribute (gender, race, ethnicity, etc.), and the target group (LGBTQ+, Muslims, feminists, etc.). 
Their final corpus has 20,130 annotated offensive Korean-language news and video comments.</p><p>However, the guidelines used by the different works sometimes present incompatibilities. Although some works use the offensive and hateful labels in the same way <ref type="bibr" target="#b19">[19,</ref><ref type="bibr" target="#b18">18]</ref>, others distinguish between these two types of expression <ref type="bibr" target="#b9">[10]</ref>. This last resource annotates hateful and offensive expressions separately, totaling 765 tweets in English and 765 in Turkish.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Named Entity Recognition</head><p>Developed as a branch of Information Extraction (IE), Named Entity Recognition (NER) is a field of research aimed at detecting named entities in documents according to different schemes. Following the review of Jehangir et al. <ref type="bibr" target="#b21">[21]</ref>, it is possible to distinguish general-purpose schemes, which usually include entities of the types 'person', 'location', 'organization', and 'time', from schemes defined for specific applications. OntoNotes <ref type="bibr" target="#b22">[22]</ref> is an example of the first type of approach: a broad collection of documents gathered from different sources (e.g., newspaper, television news) annotated with a tagset that includes general categories of named entities. On the other hand, more specific applications include biomedical NER, which focuses on identifying entities relevant to the biomedical field, such as diseases, genes, and chemicals. An example in this field is the JNLPBA dataset <ref type="bibr" target="#b23">[23]</ref>, which is derived from the GENIA corpus. This dataset consists of 2,000 biomedical abstracts from the MEDLINE database, annotated with detailed entity types such as proteins, DNA, RNA, cell lines, and cell types.</p><p>NER-based approaches for HS detection and analysis are still few. ElSherief et al. <ref type="bibr" target="#b24">[24]</ref> exploited Twitter users' mentions to distinguish between directed and generalized forms of HS. Rodríguez-Sánchez et al. <ref type="bibr" target="#b25">[25]</ref> used derogatory expressions about women as seeds to collect misogynist messages according to a fine-grained classification of this phenomenon. <ref type="bibr" target="#b26">[26]</ref> adopted a similar methodology to collect tweets about three groups vulnerable to discrimination: ethnic minorities, religious minorities, and Roma communities. Piot et al. 
<ref type="bibr" target="#b14">[14]</ref> analyzed the correlation between the presence of HS and named entities in 60 existing datasets. Despite these previous works, there have been no attempts to define a NER-based scheme specifically intended for HS detection. Our work represents an attempt to fill this gap by combining categories from general-purpose NER and a taxonomy of groups vulnerable to discrimination in a common annotation scheme aimed at providing deeper insights into the targets of HS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Collection</head><p>We collect news from public Telegram channels with the telegram-dataset-builder <ref type="bibr" target="#b27">[27]</ref>. The selected channels, shown in Table <ref type="table">1</ref>, are in Spanish and Italian and aligned with the left and right wings of the political spectrum. The subset of Italian headlines was complemented with titles published on the Facebook pages of newspapers, collected in collaboration with the Italian Amnesty Task Force on HS, a group of activists who produce counter-narratives against discriminatory content spread by online newspapers and user comments <ref type="foot" target="#foot_0">2</ref> . We collected all the news headlines detected by activists in March 2020, 2021, 2022, and 2023, and added them to our corpus.</p><p>Given the large amount of news collected, we applied filters to reduce the dataset to its final size. We focus on news about racism; for this purpose, we applied the classifier piubabigdata/beto-contextualized-hate-speech to keep only news items labeled as racist. Since this classifier is trained on Spanish texts, prior to this step we automatically translated Italian news with the model facebook/nllb-200-distilled-600M. This translation step is used only for the filtering process; once the news is selected, the translated text is no longer used. In the end, this process yields 532 news headlines classified as racist for Italian and 348 for Spanish, which were selected for the annotation task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1</head><p>Examples of annotated headlines: "Migranti, un esercito di scrocconi: 120mila mantenuti con l'8 per mille degli italiani." <ref type="foot" target="#foot_2">3</ref> ; "Hordas de gitanos arrasan Mercadona después de que les ingresen 3000 euros en sus 'tarjetas solidarias'." <ref type="foot" target="#foot_3">4</ref> ; "Questa è Villa Aldini, la residenza di lusso che ospita i migranti stupratori a Bologna." 5 . Labels used: Vulnerable identity - Migrants; Derogatory; Entity - Location; Vulnerable identity - Ethnic minority; Dangerous speech; Entity - Organization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Telegram channels from which the news have been extracted. Spanish: MediterraneoDGT, elmundoes. Italian: ByobluOfficial, sadefenza, terzaroma, marcellopamio, ilprimatonazionaleIPN, VoxNewsInfo.</p></div>
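The translate-then-filter step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: `translate` and `classify` stand in for the facebook/nllb-200-distilled-600M translation model and the Spanish hate-speech classifier, and the function names are hypothetical.

```python
from typing import Callable, Iterable

def filter_racism_headlines(
    headlines: Iterable[tuple[str, str]],   # (language, text) pairs
    translate: Callable[[str], str],        # Italian -> Spanish translation
    classify: Callable[[str], str],         # label from a Spanish classifier
) -> list[tuple[str, str]]:
    """Keep only headlines the Spanish classifier labels as racism.

    Italian headlines are translated to Spanish first; the translation is
    used only for filtering, and the original text is what gets kept.
    """
    kept = []
    for lang, text in headlines:
        probe = translate(text) if lang == "it" else text
        if classify(probe) == "racism":
            kept.append((lang, text))       # keep the ORIGINAL headline
    return kept
```

In practice `translate` and `classify` would wrap model inference (e.g. Hugging Face pipelines); keeping them as injected callables makes the filtering logic easy to test with stubs.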
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data Annotation</head><p>A comprehensive, span-based annotation scheme was developed to label vulnerable identities and entities present in the dataset. Annotators were provided with instructions and had to choose a label and highlight the word, phrase, or portion of text that best embodied the qualities of the chosen label. It was possible to choose more than one label for the same portion of text. The instructions also provided annotators with some examples of annotated headlines.</p><p>The initial layer of annotation focuses on identifying vulnerable targets within the text and categorizing them into one of six predefined labels: ethnic minority, migrant, religious minority, women, LGBTQ+ community, and other. These labels represent vulnerable groups, as the vulnerability of the targets can often be traced back to their belonging to certain categories of people that are particularly exposed to discrimination, marginalisation, or prejudice in society. In cases where the targeted group did not fit into one of the predefined labels, annotators were required to use the 'other' category. For instances labeled as 'other', annotators were then instructed to provide specific details regarding the group in a free-text field.</p><p>After categorizing vulnerable targets, the second layer involves annotating named entities. Annotators identify entities within the text and label them with one of five possible types: person, group, organization, location, and other. As in the first layer, instances labelled 'other' require annotators to provide details about the entity in a free-text field.</p><p>The final layers of the annotation scheme address the context in which these entities are mentioned, specifically focusing on identifying derogatory mentions and dangerous speech.</p><p>A derogatory mention is characterized by negative or disparaging remarks about the target. 
In these instances, explicit hate speech is absent, but the mention itself is discriminatory or offensive, often employing a tone intended to belittle or discredit the target. The label derogatory is used to mark these mentions.</p><p>Moreover, the annotation includes identifying dangerous elements: portions of text that, intentionally or unintentionally, could incite hate speech or increase the vulnerability of the target identity. Dangerous speech, which can be either explicit or implicit, promotes or perpetuates negative prejudices and stereotypes, potentially triggering harmful responses against the group. The label dangerous <ref type="bibr" target="#b28">[28]</ref> is used to tag these segments. Annotators were encouraged to use free-text fields to provide details on implicit dangerous speech or recurring dangerous concepts.</p><p>The annotation guidelines provided annotators with specific criteria and with the following list of potential markers of dangerous speech to help their identification:</p><p>• Incitement to violence: the text explicitly encourages violence against the target group; • Open discrimination: the text openly states or supports discrimination against the target group; • Ridicule: the text ridicules the target in the eyes of the readers by belittling it or mocking it; • Stereotyping: the text perpetuates negative stereotypes about the target group, contributing to a distorted view of it; • Disinformation: the text spreads false or misleading information that can harm the target group; • Dehumanization: the text dehumanizes the target group, using language that equates it with objects or animals; • Criminalization: the text portrays the target group as inherently criminal or associates it with illegal activities, contributing to the perception that the group as a whole is dangerous.</p><p>However, a text may still be considered dangerous even if it does not explicitly include these markers, as they are intended as examples rather than 
strict requirements.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> provides three examples of annotated headlines, two in Italian and one in Spanish, showing the application of the annotation scheme described above. In the figure, different colours highlight the various types of labels used. A vulnerable identity was detected in each headline: 'Migranti' in the first and third ones and 'gitanos' in the second one, respectively labelled as 'vulnerable group - migrant' and 'vulnerable group - ethnic minority'. The three examples all contain multiple elements of dangerous speech, highlighted in red, and the second text also contains an element marked with the derogatory label. Additionally, the second and third headlines include examples of annotation for named entities, with 'Mercadona' labelled as 'entity - organization', and 'Villa Aldini' and 'Bologna' labelled as 'entity - location'.</p></div>
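The layered, possibly overlapping span annotations described above can be serialized in many ways; the sketch below shows one hypothetical character-offset representation, using the second example headline from Figure 1. The specific span boundaries and label strings chosen here are illustrative, not the released annotation format.

```python
# One annotated headline, with each span recorded as character offsets.
# Overlapping spans are allowed: the scheme permits multiple labels on
# the same portion of text.
text = ("Hordas de gitanos arrasan Mercadona después de que les "
        "ingresen 3000 euros en sus 'tarjetas solidarias'.")

def span(label: str, surface: str) -> dict:
    """Locate `surface` in the headline and record it as a labelled span."""
    start = text.index(surface)
    return {"label": label, "start": start, "end": start + len(surface),
            "text": surface}

annotation = {
    "text": text,
    "spans": [
        span("vulnerable_identity:ethnic_minority", "gitanos"),
        span("entity:organization", "Mercadona"),
        # dangerous-speech span chosen for illustration only
        span("dangerous", "Hordas de gitanos arrasan Mercadona"),
    ],
}
```

Storing offsets rather than bare strings keeps annotations unambiguous when the same surface form occurs more than once in a headline.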
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">The VIRC Corpus</head><p>The VIRC corpus is a collection of 532 Italian and 348 Spanish news headlines annotated by two independent annotators per language. Following the perspectivist paradigm <ref type="bibr" target="#b29">[29]</ref>, we released both the disaggregated annotations and the gold-standard corpus. The code used to generate the gold-standard corpus, carry out experiments, and compile statistics can be accessed through the following GitHub repository <ref type="foot" target="#foot_4">6</ref> . In this section we present an analysis of disagreement (Section 4.1) and relevant statistics about the corpus (Section 4.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Inter-Annotator Agreement</head><p>Since the span-based annotation task does not yield a fixed number of annotated items, we adopted the F-score to evaluate the agreement between annotators <ref type="bibr" target="#b30">[30]</ref>. For each subset of the corpus, we randomly chose one annotator's labels as the gold standard and the other's as the set of predictions. We then computed the F-score between the two distributions of labels to measure agreement. Table <ref type="table">2</ref> shows the results of our analysis. In general, annotations always showed fair or higher agreement, except for some entity-related labels and the "derogatory" one. There is also low agreement in the Italian set on the labels "religious minority" and "women". </p></div>
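A minimal sketch of this agreement computation, with one annotator's spans taken as gold and the other's as predictions. The partial-credit rule (full inclusion of one span in the other counts as 1 true positive, partial overlap as 0.5) follows the convention the paper describes for overlapping spans; the function names and the greedy one-to-one matching are our own assumptions.

```python
def overlap_credit(gold: tuple[int, int], pred: tuple[int, int]) -> float:
    """1.0 if one span fully contains the other, 0.5 for partial overlap, else 0."""
    g0, g1 = gold
    p0, p1 = pred
    if p1 <= g0 or g1 <= p0:                       # no overlap at all
        return 0.0
    if (g0 <= p0 and p1 <= g1) or (p0 <= g0 and g1 <= p1):
        return 1.0                                  # full inclusion either way
    return 0.5                                      # partial overlap

def span_f1(gold_spans: list, pred_spans: list) -> float:
    """F-score for one label: greedy one-to-one matching of predictions to gold."""
    remaining = list(gold_spans)
    tp = 0.0
    for pred in pred_spans:
        best = max(remaining, key=lambda g: overlap_credit(g, pred), default=None)
        if best is not None and overlap_credit(best, pred) > 0:
            tp += overlap_credit(best, pred)
            remaining.remove(best)                  # each gold span matched once
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, "All women" (offsets 0-9) against "women" (4-9) is a full match, while "All women" against "women of Italy" (4-18) only earns partial credit.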
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>The annotators' agreement measured through the F-score and broken down by label.</p><p>Although the overall results are positive, they show significant variations that can be analyzed both quantitatively and qualitatively. Overlapping spans were handled as follows: if one span fully included another, this was considered an agreement. In cases where the spans only partially overlapped, meaning there was some shared text but not full inclusion, this was treated as a partial agreement. For example, if one annotator labeled "All women" and another selected only "women", this would be a full agreement (1 true positive). However, if the latter selected "women of Italy", it would be a partial agreement (0.5 true positive).</p><p>Quantitative Analysis. The agreement on the annotation of entities is moderate overall but differs between the Spanish and the Italian subsets. Annotators of Spanish headlines scored a higher agreement on 'location' (0.66 vs 0.60), 'vulnerable' (0.15 vs 0), and 'organization' (0.41 vs 0.12), while entities of the types 'person' (0.63 vs 0.47) and 'other' (0.1 vs 0) are better recognized in Italian headlines.</p><p>On average, the annotation of vulnerable identities resulted in higher agreement between annotators in both subsets, and at the same time confirmed the higher agreement of Spanish annotations, which always outperform Italian ones. The highest agreement emerges for the label 'migrant', on which annotators obtained an F-score of 0.86 for Italian and 0.96 for Spanish. The agreement on 'ethnic minority' is a bit lower but still significant: while Spanish headlines reached an F-score of 0.83, Italian ones reached only 0.63. An equally high agreement is observed for the 'lgbtq+' label, which is only present in Italian headlines, with an F-score of 0.8. Among vulnerable groups, women scored the lowest F-score: 0.6 for Spanish, 0.22 for Italian. 
The largest observed discrepancy concerns religious minorities: for Spanish an F-score of 1 is achieved, while for Italian it is 0.</p><p>While the annotation of 'dangerous' spans achieves an acceptable agreement, the 'derogatory' annotation achieves the lowest agreement between annotators. Additionally, annotations of Italian headlines resulted in higher disagreement than Spanish ones, contrary to what we observed for 'entities' and 'vulnerable identities'. Text spans expressing dangerous speech are recognized with an agreement of 0.57 for Italian and 0.49 for Spanish headlines. Agreement on 'derogatory' is low for Italian headlines (0.28), while Spanish ones show almost no agreement (0.08).</p><p>Qualitative Analysis. In summary, while the overall results of the annotation are positive, some categories show significant disagreement between annotators. These disagreements highlight the need to review and refine the annotation guidelines for problematic categories, and to provide more detailed instructions. The importance of reassessing the guidelines to make them clearer and more consistent is further underscored by the fact that, for Spanish headlines, the annotators agreed on both labels and intervals in only 67 cases, and for Italian headlines agreement was reached in just 88 cases.</p><p>Since the annotation task was span-based, we opted not to use a confusion matrix to analyze the disagreement. A confusion matrix is not appropriate for span detection, as it assumes discrete labels applied to predefined items, whereas our task involved labeling spans of text that varied in length and context. Instead, we performed a qualitative analysis, examining specific cases of disagreement to understand their nature. 
This approach allowed us to explore not only how annotators differed in labeling spans but also why these differences emerged, providing deeper insight into the underlying issues of interpretation and guidelines.</p><p>Looking more closely at the headlines where the annotations present inconsistencies, a variety of motivations behind the discrepancies can be identified.</p><p>For instance, in the Italian title "Orrore nella casa occupata dagli immigrati: donna lanciata giù dal secondo piano" 7 , 'donna' was marked as a vulnerable identity by only one of the annotators, perhaps suggesting an erroneous focus on a single target ('immigrati') by the other annotator.</p><p>Another type of disagreement relates to the interpretation of derogatory mentions. An example can be found in "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri" 8 , where one annotator identified the term 'stranieri' as a derogatory mention as well as representative of a vulnerable identity, while the other annotator simply stuck to the second label, perhaps highlighting a divergence in the interpretation of the guidelines. Furthermore, it is interesting to observe the disagreement created by headlines that use the generic term 'stranieri' ('foreigners'), which was often labelled as 'vulnerable identity - ethnic minority' by one annotator and as 'vulnerable identity - migrant' by the other. This inconsistency between annotators can be identified in two headlines: "Ius soli e cittadinanza facile agli stranieri? Il sangue non è acqua" 9 and "Un terzo dei reati sono commessi da stranieri (e gli africani hanno il record). Tutti i numeri" 8 . In the first case, we can resolve the disagreement by looking at the context: the explicit reference to the issue of granting citizenship suggests that the term 'foreigners' refers more appropriately to the specific category of migrants. 
On the other hand, in the second headline, there is no direct reference to specifically migration-related issues and thus both interpretations in terms of the vulnerable category of belonging are acceptable.</p><p>Finally, some texts present a slight difference in the annotation spans of choice, as observed in "Più di 200mila case popolari agli immigrati" 10 , where the annotators identified dangerous speech in the same section of text, but with differences in the number of highlighted words (first annotator labelled 'Più di 200mila'; second annotator labelled '200mila case popolari'), reflecting variations in the identification of relevant content for the analysis of dangerous speech.</p><p>In addition to the predefined labels, we also collected freetext fields as part of the annotation process. These comments offered an additional layer of granularity, allowing annotators to describe nuances not covered by the fixed categories. For example, in the Spanish headline "Dos menas marroquíes apuñalan a dos turistas para robarles en Salou" 11 , both annotators used the two labels 'vulnerable identity -ethnic minority' and 'vulnerable identity -other' to annotate the span 'menas marroquíes'. Alongside the 'other' label, one annotator provided the comment 'Under 18', while the other one used 'young people' to describe the vulnerable group. Although stated differently, both comments highlight the specific vulnerability related to the age of the group, complementing the existing labels. As this example shows, the flexibility in the annotation process provided by free-text fields is useful to capture multi-categorical terms and to identify potential new categories that may not have been initially considered in the predefined labels. "Atrocity in a house occupied by migrants: woman thrown from second floor".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>8</head><p>"One third of all crimes are committed by foreigners (and Africans hold the record). All the numbers". 9 "Ius soli and easy citizenship for foreigners? Blood is not water". 10 "More than 200,000 public housing units for immigrants". </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>The distribution of labels in the gold standard corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Dataset Analysis</head><p>In this section we provide an analysis of the four label types that occur in the gold standard version of the VIRC corpus: 'derogatory', 'dangerous', 'named entities', and 'vulnerable groups'. The analysis is twofold: first, we describe the distribution of these label types; then we present zero-shot experiments aimed at understanding whether existing LLMs (T5 <ref type="bibr" target="#b31">[31]</ref> and BART <ref type="bibr" target="#b32">[32]</ref>) are able to recognize the labeled spans in news headlines, by comparing their outputs to the gold standard annotations.</p><p>Corpus statistics. Table <ref type="table">3</ref> shows the distribution of label types in the corpus. As can be observed, mentions of vulnerable groups are the most frequent, with 270 occurrences in the Spanish subset and 253 in the Italian one. This confirms the relevance of annotating vulnerable groups in the identification of discriminatory content, which is tied to their high recognizability by annotators (Section 4.1). The role of named entities differs in the two subsets: annotators labeled them with agreement 130 times in Spanish headlines and 67 times in Italian ones. This might be caused by the composition of the subsets: since Italian headlines were partly collected from Facebook pages of mainstream newspapers, they contained a higher number of named entities that were not relevant to the analysis of the headlines' dangerousness. The number of text spans labeled as dangerous is almost equivalent in the two subsets (136 for Spanish, 166 for Italian), showing a good presence of this label type despite the high disagreement between annotators. 
Finally, it is worth mentioning the almost total absence of text spans labeled as 'derogatory' with agreement (3 for Spanish, 16 for Italian), which suggests the high subjectivity of this phenomenon and the need to better define its characteristics in the annotation guidelines.</p><p>Corpus analysis with LLMs. We completed our analysis of the VIRC corpus through zero-shot experiments aimed at exploring the ability of existing LLMs to identify the four types of labelled spans in messages. We framed the detection of spans as an extractive Question Answering (QA) problem. For the task we adopted the T5 <ref type="bibr" target="#b31">[31]</ref> and BART <ref type="bibr" target="#b32">[32]</ref> LLM architectures for both languages; for Italian we employed <ref type="bibr" target="#b33">[33]</ref> and <ref type="bibr" target="#b34">[34]</ref>, and for Spanish <ref type="bibr" target="#b35">[35]</ref> and <ref type="bibr" target="#b36">[36]</ref>, respectively. Each label was mapped to a question; for example, the English gloss of the 'vulnerable identity' prompt is:</p><p>• Which hate speech vulnerable identity is mentioned in the sentence?</p><p>The full prompts for both languages are listed in Appendix A. We designed two approaches for the zero-shot experiments, restrictive and non-restrictive. On the one hand, for the non-restrictive zero-shot experiments, for each sentence in the dataset we queried the model with the prompt of each label and extracted the three most confident answers. We then filtered out the responses with a model confidence below 0.02 to limit noise. Finally, all these annotations went through a majority vote (identical to the one used to build the aggregated dataset) to normalize the model responses.</p><p>On the other hand, for the restrictive zero-shot experiments, we queried the model with the prompts for each annotation present in the aggregated dataset. 
Since some sentences carry the same label on different spans, we requested the five most confident annotations from the model, ordered from most to least confident; if an annotation was already included, the next one in order was taken, so as to avoid duplicate annotations.</p><p>Table <ref type="table" target="#tab_2">4</ref> presents the F-scores for each label type, experiment, and model. In general, T5 and BART tend to perform better in Spanish than in Italian. The models face noticeable challenges in identifying the labels 'dangerous', 'derogatory', and 'entity'. Nevertheless, when they are aware that the label exists within the sentence (restrictive), they manage to recognize it with fairly good agreement. As during human annotation, the label 'derogatory' proves the most challenging to identify: in the non-restrictive scenario it scarcely receives any agreement, yet in the restrictive scenario it achieves a reasonable level, particularly in Spanish. This indicates that the models struggle to discern its presence initially but, once its presence is acknowledged, can recognise the expression.</p><p>The restrictive method enhances performance over the non-restrictive method for all labels except 'vulnerable identity'. This shows that models generally comprehend and identify vulnerable identities in sentences better without restrictions than when they are restricted to specific mentions. It should also be noted that, in the Spanish context, T5 is more effective than BART in identifying 'vulnerable identity' labels for both approaches, while BART performs better in Italian.</p><p>These results show that a NER-based annotation scheme for HS detection is difficult not only to annotate but also to detect automatically. Larger resources are necessary to develop models that are able to detect the complex semantics of HS.</p></div>
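The candidate filtering, majority-vote aggregation, and restrictive de-duplication described above can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the helper names, the toy candidate spans, and the exact interpretation of the 0.02 confidence cut-off are assumptions.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.02  # assumed value of the confidence cut-off

def filter_candidates(candidates, k=3, threshold=CONFIDENCE_THRESHOLD):
    """Non-restrictive setting: keep the k most confident spans, then drop
    those whose model confidence falls below the threshold."""
    top_k = sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]
    return [c for c in top_k if c["score"] >= threshold]

def majority_vote(runs):
    """Keep spans proposed by a strict majority of runs (mirroring the
    aggregation used to build the gold standard)."""
    counts = Counter(span for run in runs for span in set(run))
    needed = len(runs) // 2 + 1
    return {span for span, n in counts.items() if n >= needed}

def restrictive_spans(candidates, n_needed):
    """Restrictive setting: walk the five most confident answers in order,
    skipping spans already taken, to avoid duplicate annotations."""
    picked = []
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True)[:5]:
        if c["answer"] not in picked:
            picked.append(c["answer"])
        if len(picked) == n_needed:
            break
    return picked

# Toy QA outputs for one headline and one label prompt.
candidates = [
    {"answer": "stranieri", "score": 0.61},
    {"answer": "gli africani", "score": 0.05},
    {"answer": "reati", "score": 0.004},  # below threshold: treated as noise
]
kept = [c["answer"] for c in filter_candidates(candidates)]
# kept -> ["stranieri", "gli africani"]
```

In practice the per-span scores would come from an extractive QA model's top answers; the sketch only shows the post-processing applied to them.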
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Work</head><p>The Vulnerable Identities Recognition Corpus (VIRC), created in this work, reveals the challenge of identifying vulnerable identities due to the rapid evolution of language on social media. Our experiments indicate that large language models (LLMs) struggle significantly with this task.</p><p>VIRC provides a detailed and structured resource that enhances understanding of the extensive use of hate speech in Italian and Spanish news headlines. The corpus is particularly valuable as it includes more annotation dimensions compared to related studies in other languages, such as vulnerable identities, dangerous discourse, derogatory expressions, and entities. This differentiation between vulnerable identities and entities, as well as between dangerous and derogatory elements, enables the development of sophisticated detection tools that can facilitate large-scale actions to mitigate the impact of hate speech (e.g., moderation of messages and generation of counter-narratives that reduce the damage to the mental health of victims).</p><p>Future work will focus on expanding this resource by doubling the size of annotations for both languages and including non-racism-related phrases to ensure the resource is comprehensive. Additionally, we plan to refine the annotation guidelines to avoid low agreement on the derogatory label, enhancing the overall reliability and utility of the corpus. These efforts will further improve the effectiveness of hate speech detection and contribute to the development of policies and tools for a safer online environment.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of annotated headlines</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>7</head><label>7</label><figDesc></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>11 "</head><label>11</label><figDesc>Two Moroccan unaccompanied migrant minors stab two tourists to rob them in Salou".</figDesc><table><row><cell></cell><cell>Spanish</cell><cell>Italian</cell></row><row><cell>dangerous</cell><cell>136</cell><cell>166</cell></row><row><cell>derogatory</cell><cell>3</cell><cell>16</cell></row><row><cell>entities</cell><cell>140</cell><cell>146</cell></row><row><cell>vulnerable groups</cell><cell>270</cell><cell>253</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>F-score results of zero-shot experiments on the VIRC corpus with T5 and BART models for each label.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://www.amnesty.it/entra-in-azione/task-force-attivismo/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">"Migrants, an army of scroungers: 120,000 supported by the Italians' 8x1000 tax allocation".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">"Hordes of gypsies devastate Mercadona after 3000 euros were deposited in their solidarity cards".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">"This is Villa Aldini, the luxury residence that hosts rapist migrants in Bologna".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://github.com/oeg-upm/virc</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work is supported by the Predoctoral Grant (PIPF-2022/COM-25947) of the Consejería de Educación, Ciencia y Universidades de la Comunidad de Madrid, Spain. Arianna Longo's work has been supported by aequa-tech. The authors gratefully acknowledge the Universidad Politécnica de Madrid (www.upm.es) for providing computing resources on the IPTC-AI innovation Space AI Supercomputing Cluster.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>https://iguillenp.github.io/ (I. Guillén-Pacho); https://marcostranisci.github.io/ (M. A. Stranisci); https://www.unito.it/persone/vpatti (V. Patti); https://about.me/cbadenes (C. Badenes-Olmedo). ORCID: 0000-0001-7801-8815 (I. Guillén-Pacho); 0009-0005-8500-1946 (A. Longo); 0000-0001-9337-7250 (M. A. Stranisci); 0000-0001-5991-370X (V. Patti); 0000-0002-2753-9917 (C. Badenes-Olmedo).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. LLMs Prompts</head><p>The prompts used are the same for each model but different for each language. For Spanish, the prompts used for each label are:</p><p>• Dangerous: "¿Qué parte del texto es peligroso (criminaliza, ridiculiza, incita a la violencia, ...) contra identidades vulnerables (mujeres, migrantes, minorías étnicas, ...)?" • Derogatory: "¿Qué parte del texto es derogativo (comentarios negativos o despectivos sobre la víctima sin incitación explícita al odio, pero la mención en sí es discriminatoria u ofensiva, y a menudo emplea un tono destinado a menospreciar o desacreditar a la víctima)?" • Entity: "¿Qué entidad nombrada se menciona en la frase?"</p><p>• Vulnerable Identity: "¿Qué identidad vulnerable al discurso de odio se menciona en la frase?"</p><p>For Italian:</p><p>• Dangerous: "Quale parte del testo è pericolosa (criminalizza, ridicolizza, incita alla violenza, ...) nei confronti di identità vulnerabili (donne, migranti, minoranze etniche, ...)?" • Derogatory: "Quale parte del testo è dispregiativa (commenti negativi o denigratori sulla vittima senza un esplicito discorso d'odio, ma in cui la menzione stessa è discriminatoria o offensiva e spesso usa un tono volto a sminuire o screditare la vittima)?" • Entity: "Quale entità nominata è menzionata nella frase?" • Vulnerable Identity: "Quale identità vulnerabile ai discorsi d'odio è menzionata nella frase?"</p></div>			</div>
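Assuming a standard extractive-QA interface, the label-specific prompts above can be paired with a headline as question–context inputs. The dictionary and helper below are illustrative (only two of the four labels are shown, and `build_query` is a hypothetical name), not the authors' code.

```python
# Label-to-prompt mapping for the two languages (two of the four labels
# shown; the full prompts are listed in the appendix above).
PROMPTS = {
    "es": {
        "entity": "¿Qué entidad nombrada se menciona en la frase?",
        "vulnerable_identity": "¿Qué identidad vulnerable al discurso de odio se menciona en la frase?",
    },
    "it": {
        "entity": "Quale entità nominata è menzionata nella frase?",
        "vulnerable_identity": "Quale identità vulnerabile ai discorsi d'odio è menzionata nella frase?",
    },
}

def build_query(language, label, headline):
    """Build the question/context pair fed to an extractive QA model."""
    return {"question": PROMPTS[language][label], "context": headline}

query = build_query("it", "entity", "Più di 200mila case popolari agli immigrati")
# query["question"] -> "Quale entità nominata è menzionata nella frase?"
```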
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automated hate speech detection and the problem of offensive language</title>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the international AAAI conference on web and social media</title>
				<meeting>the international AAAI conference on web and social media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="512" to="515" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the first workshop on NLP and computational social science</title>
				<meeting>the first workshop on NLP and computational social science</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="138" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Latent hatred: A benchmark for understanding implicit hate speech</title>
		<author>
			<persName><forename type="first">M</forename><surname>Elsherief</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ziems</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Muchlinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Anupindi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Seybolt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">De</forename><surname>Choudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="345" to="363" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Introducing cad: the contextual abuse dataset</title>
		<author>
			<persName><forename type="first">B</forename><surname>Vidgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Margetts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rossini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tromble</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2289" to="2303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Emotionally informed hate speech detection: A multi-target perspective</title>
		<author>
			<persName><forename type="first">P</forename><surname>Chiril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">W</forename><surname>Pamungkas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Benamara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Moriceau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<idno type="DOI">10.1007/S12559-021-09862-5</idno>
		<ptr target="https://doi.org/10.1007/s12559-021-09862-5" />
	</analytic>
	<monogr>
		<title level="j">Cogn. Comput</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="322" to="352" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The risk of racial bias in hate speech detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Card</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gabriel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th annual meeting of the association for computational linguistics</title>
				<meeting>the 57th annual meeting of the association for computational linguistics</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1668" to="1678" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The measuring hate speech corpus: Leveraging rasch measurement theory for data perspectivism</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sachdeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barreto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bacon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>von Vacano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kennedy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Perspectivist Approaches to NLP@ LREC2022</title>
				<meeting>the 1st Workshop on Perspectivist Approaches to NLP@ LREC2022</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="83" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Hatexplain: A benchmark dataset for explainable hate speech detection</title>
		<author>
			<persName><forename type="first">B</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Biemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI conference on artificial intelligence</title>
				<meeting>the AAAI conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="14867" to="14875" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">SemEval-2021 task 5: Toxic spans detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pavlopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sorensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Laugier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th international workshop on semantic evaluation (SemEval-2021)</title>
				<meeting>the 15th international workshop on semantic evaluation (SemEval-2021)</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="59" to="69" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection</title>
		<author>
			<persName><forename type="first">K</forename><surname>Büyükdemirci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">E</forename><surname>Kucukkaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ölmez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Toraman</surname></persName>
		</author>
		<editor>N. Calzolari, M.-Y.</editor>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL</title>
				<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Hoste</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sakti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</editor>
		<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="9543" to="9553" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Resources and benchmark corpora for hate speech detection: a systematic review</title>
		<author>
			<persName><forename type="first">F</forename><surname>Poletto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<idno type="DOI">10.1007/S10579-020-09502-8</idno>
		<ptr target="https://doi.org/10.1007/s10579-020-09502-8" />
	</analytic>
	<monogr>
		<title level="j">Lang. Resour. Evaluation</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="477" to="523" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Agreeing to disagree: Annotating offensive language datasets with annotators&apos; disagreement</title>
		<author>
			<persName><forename type="first">E</forename><surname>Leonardelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Menini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Aprosio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tonelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="10528" to="10539" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Semeval-2023 task 10: Explainable detection of online sexism</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kirk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vidgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Röttger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th International Workshop on Semantic Evaluation</title>
				<meeting>the 17th International Workshop on Semantic Evaluation<address><addrLine>SemEval-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023. 2023</date>
			<biblScope unit="page" from="2193" to="2210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Metahate: A dataset for unifying efforts on hate speech detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Piot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martín-Rodilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Parapar</surname></persName>
		</author>
		<idno type="DOI">10.1609/icwsm.v18i1.31445</idno>
		<ptr target="https://ojs.aaai.org/index.php/ICWSM/article/view/31445" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="2025" to="2039" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">HATE-ITA: Hate speech detection in Italian social media text</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Attanasio</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.woah-1.24</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Narang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Mostafazadeh Davani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Mathias</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Vidgen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Talat</surname></persName>
		</editor>
		<meeting>the Sixth Workshop on Online Abuse and Harms (WOAH), Association for Computational Linguistics<address><addrLine>Seattle, Washington (Hybrid</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="252" to="260" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Disaggreghate it corpus: A disaggregated italian dataset of hate speech</title>
		<author>
			<persName><forename type="first">M</forename><surname>Madeddu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Frenda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Boschetti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Lebani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<meeting>the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3596</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">SemEval-2021 task 5: Toxic spans detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pavlopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sorensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Laugier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.semeval-1.6</idno>
		<ptr target="https://aclanthology.org/2021.semeval-1.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Palmer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schneider</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Emerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Herbelot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</editor>
		<meeting>the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="59" to="69" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">ViHOS: Hate speech spans detection for Vietnamese</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Hoang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Luu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">V</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L.-T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.eacl-main.47</idno>
		<ptr target="https://aclanthology.org/2023.eacl-main.47" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</editor>
		<meeting>the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Dubrovnik, Croatia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="652" to="669" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">KOLD: Korean offensive language dataset</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Jeong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.emnlp-main.744</idno>
		<ptr target="https://aclanthology.org/2022.emnlp-main.744" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting>the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="10818" to="10833" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Multilingual and multi-aspect hate speech analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ousidhoum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-Y</forename><surname>Yeung</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1474</idno>
		<ptr target="https://aclanthology.org/D19-1474" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4675" to="4684" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A survey on named entity recognition - datasets, tools, and methodologies</title>
		<author>
			<persName><forename type="first">B</forename><surname>Jehangir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Radhakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Agarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Natural Language Processing Journal</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">OntoNotes: the 90% solution</title>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marcus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramshaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weischedel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers</title>
				<meeting>the human language technology conference of the NAACL, Companion Volume: Short Papers</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="57" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Introduction to the bio-entity recognition task at JNLPBA</title>
		<author>
			<persName><forename type="first">N</forename><surname>Collier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tsuruoka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tateisi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-D</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Collier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Ruch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Nazarenko</surname></persName>
		</editor>
		<meeting>the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)</meeting>
		<imprint>
			<publisher>COLING</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="73" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Hate lingo: A target-based linguistic analysis of hate speech in social media</title>
		<author>
			<persName><forename type="first">M</forename><surname>Elsherief</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Belding</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the international AAAI conference on web and social media</title>
				<meeting>the international AAAI conference on web and social media</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Automatic classification of sexism in social networks: An empirical study on Twitter data</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rodríguez-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="219563" to="219576" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">An Italian Twitter corpus of hate speech against immigrants</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Poletto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stranisci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)</title>
				<meeting>the eleventh international conference on language resources and evaluation (LREC 2018)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">oeg-upm/telegram-dataset-builder</title>
		<author>
			<persName><forename type="first">I</forename><surname>Guillén-Pacho</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.12773159</idno>
		<ptr target="https://doi.org/10.5281/zenodo.12773159" />
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">S</forename><surname>Benesch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Dangerous speech</title>
		<imprint>
			<biblScope unit="volume">86272</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="185" to="197" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Toward a perspectivist turn in ground truthing for predictive computing</title>
		<author>
			<persName><forename type="first">F</forename><surname>Cabitza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Campagner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="6860" to="6868" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Inter-annotator agreement for a german newspaper corpus</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brants</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC, Citeseer</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of machine learning research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ghazvininejad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1910.13461" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">IT5: Text-to-text pretraining for Italian language understanding and generation</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.lrec-main.823" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Hoste</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sakti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</editor>
		<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="9422" to="9433" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">BART-IT: An efficient sequence-to-sequence model for Italian text summarization</title>
		<author>
			<persName><forename type="first">M</forename><surname>La Quatra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cagliero</surname></persName>
		</author>
		<idno type="DOI">10.3390/fi15010015</idno>
		<ptr target="https://www.mdpi.com/1999-5903/15/1/15" />
	</analytic>
	<monogr>
		<title level="j">Future Internet</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Araujo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Trusca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tufiño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Moens</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.11259</idno>
		<title level="m">Sequence-to-sequence spanish pre-trained language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
