<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ibai Guillén-Pacho</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arianna Longo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Antonio Stranisci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aequa-tech</institution>
          ,
          <addr-line>Torino, Italy, aequa-tech.com</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science Department, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.</p>
      </abstract>
      <kwd-group>
        <kwd>hate speech</kwd>
        <kwd>vulnerable identities</kwd>
        <kwd>annotated corpora</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Hate Speech (HS) detection is a task with a high social impact. Developing technologies that are able to recognize these forms of discrimination is not only crucial to enforce existing laws, but it also supports important tasks like the moderation of social media contents. However, recognizing HS is challenging: verbal discrimination takes different forms and involves a number of correlated phenomena that make it difficult to reduce HS to a binary classification.</p>
      <p>
        Analyzing the recent history of corpora annotated for HS, it is possible to observe a shift from very broad categorizations of hatred contents to increasingly detailed annotation schemes aimed at understanding the complexity of this phenomenon. High-level schemes including dimensions like “hateful/offensiveness” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or “sexism/racism” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] paved the way for more sophisticated attempts to formalize such concepts in different directions: exploring the interaction between HS and vulnerable targets [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4, 5</xref>
        ]; studying the impact of subjectivity [6, 7]; and identifying the triggers of HS in texts [8, 9].
      </p>
      <p>Despite this trend, the complex semantics of HS in texts is far from being fully explored. Information Extraction (IE) approaches to HS annotation have rarely been implemented yet; therefore, corpora that include a fine-grained structured semantic representation of HS incidents are not available. The only notable exception is the recent work of Büyükdemirci et al. [10], which treats the identification of HS targets as a span-based task.</p>
      <p>In order to fill this gap, we present the Vulnerable Identities Recognition Corpus (VIRC): a dataset of 880 Italian and Spanish headlines against migrants, aimed at providing an event-centric representation of HS against vulnerable groups. The annotation scheme is built on four elements:
• Named Entity Recognition (NER). All the named entities that are involved in a HS expression: ‘location’, ‘organization’, and ‘person’.
• Vulnerable Identity mentions. Generic mentions related to identities that are targets of HS as they are defined by the international regulatory frameworks: ‘women’, ‘LGBTQI’, ‘ethnic minority’, and ‘migrant’.
• Derogatory mentions. All mentions that negatively portray people belonging to vulnerable groups.
• Dangerous speech. The part of the message that is perceived as hateful against named entities or vulnerable identities.</p>
      <p>In this paper we present a preliminary annotation experiment intended to validate the scheme and to assess its impact on disagreement in such a fine-grained task. The paper is structured as follows. In Section 2 we discuss related work, in Section 3 we describe the methodology used, in Section 4 we introduce the VIRC corpus, and in Section 5 we present the conclusions and discuss possible future work. In Section 2.1 we present resources inspired by this approach, with a specific focus on span-based annotated corpora; in Section 2.2 we discuss the implementation of NER-based techniques in the creation of HS corpora.</p>
      <sec id="sec-1-1">
        <title>2.1. Hate Speech Detection</title>
        <p>A large amount of work on HS detection focuses on classification, both binary (existence or not) and multi-label (misogyny, racism, xenophobia, etc.). This has led to the existence of large collections of datasets, such as those grouped by [14]. One of the main problems is that most resources are in English, and for mid-to-low resource languages (e.g., Italian) some HS categories are not covered. This constraint is mitigated by cross-lingual transfer learning to exploit resources in other languages [15] and, although good results are achieved, the creation of resources for these languages is still necessary.</p>
        <p>The main resources for the identification of HS are particularly focused on a target, identifying the presence or absence of HS towards it. An example is the work of [16], where 1,100 tweets in Italian with a special target on immigrants were annotated according to the presence of HS, irony, and the stance of the message’s author on immigration matters. Recently, however, there has been an increasing focus on identifying hateful expressions and their intended targets. This change in paradigm suggests that resources should be wider in scope and not focus on a particular discourse target. The main resources in this field have high linguistic diversity, although they do not all follow the same annotation scheme, with English being the most common language. We have found works in English [17]; Vietnamese [18]; Korean [19]; English and Turkish [10]; and English, French, and Arabic [20]. However, we have not found any in Italian or Spanish, which we believe makes this work the first to cover these languages for this task.</p>
        <p>Two main annotation approaches can be drawn from these studies: those that annotate at the span level [17, 18, 19, 10] and those that annotate over the full text [20]. On the one hand, the work that follows the latter approach presents a corpus of 13,000 tweets (5,647 English, 4,014 French, and 3,353 Arabic) and notes the sentiment of the annotator (shock, sadness, disgust, etc.), hostility type (abusive, hateful, offensive, etc.), directness (direct or indirect), target attribute (gender, religion, disabled, etc.), and target group (individual, women, African, etc.).</p>
        <p>On the other hand, works that follow the span annotation approach design different annotation criteria. The simplest, [17, 18], only annotate one dimension. The first, [17], annotates the parts that make a comment toxic in 30,000 English comments of the Civil Comments platform. The second, [18], annotates only the parts that make a comment offensive or hateful in 11,000 Vietnamese comments on Facebook and Youtube. The other papers, [19, 10], extend this approach and also label the span in which the target of the attack is mentioned. Moreover, [19] is not limited to that; they also annotate the target type (individual, group, other), the target attribute (gender, race, ethnic, etc.), and the target group (LGBTQ+, Muslims, feminists, etc.). Their final corpus has 20,130 annotated offensive Korean-language news and video comments.</p>
        <p>However, the guidelines used by the different works sometimes present incompatibilities. Although some works use offensive and hateful labels in the same way [19, 18], others distinguish between these two types of expression [10]. This last resource has separately annotated hateful and offensive expressions, totaling 765 tweets in English and 765 tweets in Turkish.</p>
      </sec>
      <sec id="sec-1-1a">
        <title>2.2. Named Entity Recognition</title>
        <p>Developed as a branch of Information Extraction (IE), Named Entity Recognition (NER) is a field of research aimed at detecting named entities in documents according to different schemes. Following the review of Jehangir et al. [21], it is possible to observe general-purpose schemes, which usually include entities of the types ‘person’, ‘location’, ‘organization’, and ‘time’, and schemes defined for specific applications. OntoNotes [22] is an example of the first type of approach: a broad collection of documents gathered from different sources (e.g., newspapers, television news) annotated with a tagset that includes general categories of named entities. On the other hand, more specific applications include biomedical NER, which focuses on identifying entities relevant to the biomedical field, such as diseases, genes, and chemicals. An example in this field is the JNLPBA dataset [23], which is derived from the GENIA corpus. This dataset consists of 2,000 biomedical abstracts from the MEDLINE database, annotated with detailed entity types such as proteins, DNA, RNA, cell lines, and cell types.</p>
        <p>
          NER-based approaches for HS detection and analysis are still few. ElSherief et al. [24] exploited Twitter users’ mentions to distinguish between directed and generalized forms of HS. Rodríguez-Sánchez et al. [25] used derogatory expressions about women as seeds to collect misogynist messages according to a fine-grained classification of this phenomenon. [
          <xref ref-type="bibr" rid="ref5">26</xref>
          ] adopted a similar methodology to collect tweets about 3 groups vulnerable to discrimination: ethnic minorities, religious minorities, and Roma communities. Piot et al. [14] analyzed the correlation between the presence of HS and named entities in 60 existing datasets. Despite these previous works, there are no attempts to define a NER-based scheme specifically intended for HS detection. Our work represents an attempt to fill this gap by combining categories from general-purpose NER with a taxonomy of groups vulnerable to discrimination in a common annotation scheme aimed at providing deeper insights into the targets of HS.
        </p>
      </sec>
      <sec id="sec-1-1b">
        <title>3. Methodology</title>
        <p>
          3.1. Data Collection. We collect news from public Telegram channels with the telegram-dataset-builder [
          <xref ref-type="bibr" rid="ref6">27</xref>
          ]. The selected channels, shown in Table 1, are in Spanish and Italian and aligned with the left and right wings of the political spectrum. The subset of Italian headlines was integrated with titles published on newspapers’ Facebook pages that have been collected in collaboration with the Italian Amnesty Task Force on HS (https://www.amnesty.it/entra-in-azione/task-force-attivismo/), a group of activists that produces counter-narratives against discriminatory contents spread by online newspapers and user comments. We collected all the news headlines detected by activists in March 2020, 2021, 2022, and 2023, and added them to our corpus.
        </p>
        <p>Given the large amount of news collected, we applied filters to the dataset to reduce it to its final size. We focus on news about racism; for this purpose, we applied the classifier piuba-bigdata/beto-contextualized-hate-speech to keep only the news items labeled as racism. Since this classifier is trained on Spanish texts, prior to this step we automatically translated the Italian news with the model facebook/nllb-200-distilled-600M. This translation step is used only for the filtering process; once the news is selected, the translated text is no longer used. In the end, this process generates 532 news headlines classified as racist for Italian and 348 for Spanish, which have been selected for the annotation task.</p>
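        <p>The filtering step can be sketched as a small pipeline. In this minimal sketch, the translation and classification models are passed in as plain callables — hypothetical stand-ins for facebook/nllb-200-distilled-600M and piuba-bigdata/beto-contextualized-hate-speech, which would normally be loaded through, e.g., Hugging Face transformers; the function name and record layout are our own illustrative choices.</p>

```python
from typing import Callable, List


def filter_racist_headlines(
    headlines: List[dict],
    translate_it_es: Callable[[str], str],
    classify_es: Callable[[str], str],
) -> List[dict]:
    """Keep only headlines the Spanish-trained classifier labels as 'racism'.

    Italian headlines are translated to Spanish first; the translation is
    used only for filtering, and the original text is what gets kept.
    """
    selected = []
    for item in headlines:
        text = item["text"]
        # Translate Italian items so the Spanish-trained classifier applies.
        probe = translate_it_es(text) if item["lang"] == "it" else text
        if classify_es(probe) == "racism":
            selected.append(item)  # original (untranslated) headline is kept
    return selected
```

        <p>Because the translated text is discarded after classification, the corpus itself stays in the original languages, as described above.</p>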
        <p>Figure 1 provides three examples of annotated headlines, two in Italian and one in Spanish, showing the application of the annotation scheme as described; in the figure, different colours highlight the various types of labels used. The examples are: “Migranti, un esercito di scrocconi: 120mila mantenuti con l’8 per mille degli italiani” (“Migrants, an army of scroungers: 120,000 supported by the Italians’ 8x1000 tax allocation”); “Hordas de gitanos arrasan Mercadona después de que les ingresen 3000 euros en sus ‘tarjetas solidarias’” (“Hordes of gypsies devastate Mercadona after 3,000 euros were deposited in their ‘solidarity cards’”); and “Questa è Villa Aldini, la residenza di lusso che ospita i migranti stupratori a Bologna” (“This is Villa Aldini, the luxury residence that hosts rapist migrants in Bologna”).</p>
        <p>A vulnerable identity was detected in each headline: ‘Migranti’ in the first and in the third one, and ‘gitanos’ in the second one, respectively labelled as ‘vulnerable group - migrant’ and ‘vulnerable group - ethnic minority’. The three examples all contain multiple elements of dangerous speech, highlighted in red, and the second text also contains an element which was marked with the derogatory label. Additionally, the second and the third headlines include examples of annotation for named entities, with ‘Mercadona’ labelled as ‘entity - organization’, and ‘Villa Aldini’ and ‘Bologna’ labelled as ‘entity - location’.</p>
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Data Annotation</title>
        <p>A comprehensive, span-based annotation scheme was
developed to label vulnerable identities and entities present in the
dataset. Annotators were provided with instructions and had
to choose a label and highlight the word, phrase, or portion
of text that best embodied the qualities of the chosen label
in the text. It was possible to choose more than one label
for the same portion of text. The instructions also provided
annotators with some examples of annotated headlines.</p>
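        <p>As an illustration, a span-based, multi-label annotation of this kind could be represented with a record like the following. The record type and the character offsets are our own illustrative choices, not the project’s actual data format; the example spans come from the second headline shown in Figure 1.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanAnnotation:
    """One highlighted portion of a headline; a span may carry several labels."""
    start: int                     # character offset where the span begins
    end: int                       # character offset where the span ends (exclusive)
    labels: List[str]              # one or more labels for the same span
    comment: Optional[str] = None  # free-text detail, e.g. for 'other' labels

# Second headline from Figure 1, with illustrative offsets:
headline = ("Hordas de gitanos arrasan Mercadona después de que les "
            "ingresen 3000 euros en sus 'tarjetas solidarias'")
annotations = [
    SpanAnnotation(10, 17, ["vulnerable group - ethnic minority"]),  # 'gitanos'
    SpanAnnotation(26, 35, ["entity - organization"]),               # 'Mercadona'
]
```

        <p>Allowing a list of labels per span directly supports the guideline that more than one label may be chosen for the same portion of text, and the optional comment field mirrors the free-text annotations described below.</p>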
        <p>The initial layer of annotation focuses on identifying
vulnerable targets within the text and categorizing them into
one of six predefined labels: ethnic minority, migrant,
religious minority, women, LGBTQ+ community, and other.</p>
        <p>These labels represent vulnerable groups, as the vulnerability
of the targets can often be traced back to their belonging to
certain categories of people which are particularly exposed
to discrimination, marginalisation, or prejudice in society. In
cases where the targeted group didn’t fit into one of the
predefined labels, annotators were required to use the ‘other’
category. Then, for instances labeled as ‘other’, annotators
were instructed to provide specific details regarding the group
in a free-text field.</p>
        <p>After categorizing vulnerable targets, the second layer
involves annotating named entities. Annotators identify entities
within the text and label them with one of five possible types:
person, group, organization, location, and other. As in
the first layer, instances labelled ‘other’ require annotators to
provide details about the entity in a free-text field.</p>
        <p>The final layers of the annotation scheme address the
context in which these entities are mentioned, specifically
focusing on identifying derogatory mentions and dangerous
speech.</p>
        <p>A derogatory mention is characterized by negative or
disparaging remarks about the target. In these instances, explicit
hate speech is absent, but the mention itself is discriminatory
or offensive, often employing a tone intended to belittle or
discredit the target. The label derogatory is used to mark
these mentions.</p>
        <p>
          Moreover, the annotation includes identifying dangerous
elements: portions of text that, intentionally or unintentionally,
could incite hate speech or increase the vulnerability of the
target identity. Dangerous speech, which can be either explicit
or implicit, promotes or perpetuates negative prejudices and
stereotypes, potentially triggering harmful responses against
the group. The label dangerous [
          <xref ref-type="bibr" rid="ref7">28</xref>
          ] is used to tag these
segments. Annotators were encouraged to use free-text fields
to provide details on implicit dangerous speech or recurring
dangerous concepts.
        </p>
        <p>The annotation guidelines provided annotators with
specific criteria and with the following list of potential markers
of dangerous speech to help their identification:
• Incitement to violence: the text explicitly
encour</p>
        <p>ages violence against the target group;
• Open discrimination: the text openly states or
sup</p>
        <p>ports discrimination against the target group;
• Ridicule: the text ridicules the target in the eyes of</p>
        <p>the readers by belittling it or mocking it;
• Stereotyping: the text perpetuates negative
stereotypes about the target group, contributing to a
distorted view of it;
• Disinformation: the text spreads false or misleading</p>
        <p>information that can harm the target group;
• Dehumanization: the text dehumanizes the target
group, using language that equates it with objects or
animals;
• Criminalization: the text portrays the target group
as inherently criminal or associates it with illegal
activities, contributing to the perception that the group
as a whole is dangerous.
‘vulnerable group - ethnic minority’. The three examples all this would be a full agreement (1 true positive). However,
contain multiple elements of dangerous speech, highlighted in if the latter selected “women of Italy”, it would be a partial
red, and the second text also contains an element which was agreement (0.5 true positive).
marked with the derogatory label. Additionally, the second
and the third headlines include examples of annotation for Quantitative Analysis. The agreement on the annotation
named entities, with ‘Mercadona’ labelled as ‘entity - organi- of entities is always moderate but difers between the
Spanzation’, and ‘Villa Aldini’ and ‘Bologna’ labelled as ‘entity - ish and the Italian subsets. Annotators of Spanish headlines
location’. scored a higher agreement on ‘location’ (0.66 vs 0.60),
‘vulnerable’ (0.15 vs 0) and ‘organization’ (0.41 vs 0.12) while
4. The VIRC Corpus entities of the type ‘person’ (0.63 vs 0.47) and ‘other’ (0.1 vs
0) are better recognized in Italian headlines.</p>
        <p>
          The VIRC corpus is a collection of 532 Italian and 348 Spanish On average, the annotation of vulnerable identities resulted
news headlines annotated by 2 independent annotators for in a higher agreement between annotators in both subsets
each language. Following the perspectivist paradigm [
          <xref ref-type="bibr" rid="ref8">29</xref>
          ], and at the same time confirmed an higher agreement of
Spanwe both released the disaggregated annotations and the gold- ish annotations that always outperforms Italian ones. The
standard corpus. The code used to generate the gold standard highest agreement emerges for the label ‘migrant’ on which
corpus, carry out experiments, and compile statistics can be annotators obtained an F-score of 0.86 for Italian and 0.96 for
accessed through the following GitHub repository6. In this Spanish. The agreement on ‘ethnic minority’ is a bit lower but
Section we present an analysis of disagreement (Section 4.1) still significant, while Spanish headlines reached an F-score of
and relevant statistics about the corpus (Section 4.2). 0.83 Italian ones only 0.63. An equally high agreement is on
the ‘lgbtq+’ label, which is only present in Italian headlines
with an F-score of 0.8. Among vulnerable groups, women
4.1. Inter-Annotator Agreement scored the lowest F-score: 0.6 for Spanish, 0.22 for Italian.
Since the span-based annotation task does not provide a fixed The largest observed discrepancy is with religious minorities,
number of annotated items, we adopted the F-score metric to in Spanish an F-score of 1 is achieved while in Italian 0.
evaluate the agreement between annotators [
          <xref ref-type="bibr" rid="ref9">30</xref>
          ]. For each sub- While the annotation of ‘dangerous’ spans achieves an
acset of the corpus we randomly chose one annotator as the gold ceptable agreement, the ‘derogatory’ annotation is
characterstandard set of labels and the other as the set of predictions. ized as the one that achieves the lowest agreement between
We then computed the F-score between the two distributions annotators. Additionally, annotations of Italian headlines
reof labels in order to measure the agreement between the an- sulted in higher disagreement than Spanish ones, contrary
notators. Table 2 shows the results of our analysis. In general, to what we observed about ‘entities’ and ‘vulnerable
identiannotations always showed a fair or higher agreement, ex- ties’. Text spans expressing dangerous speech are recognized
cept for some entity-related labels and the “derogatory” one. with an agreement of 0.57 for Italian and 0.49 for Spanish
There is also a low agreement in the Italian set on the labels headlines. Agreement about ‘derogatory’ is low for Italian
“religious minority” and “women”. headlines (0.28) while Spanish ones show almost no
agreement (0.08)
        </p>
        <p>Qualitative Analysis. In summary, while the overall
redangerous sults of the annotation are positive, some categories show
entditeyro-ggartoourpy significant disagreement between annotators. These
disagreeentity - location ments highlight the need to review and refine the annotation
entity - organization guidelines for problematic categories, and to provide more
entity - other detailed instructions. The importance of reassessing the
guideentity - person lines in order to make them clearer and more consistent is
vulnerable entity further underscored by the fact that, for Spanish headlines,
vulnerable group - ethnic minority the annotators agreed on both labels and intervals in only 67
vulnerable group - lgbtq+ community cases, and for Italian headlines, agreement was reached in just
vulnerable group - migrant 88 cases.</p>
        <p>vulnerable group - other Since the annotation task was span-based, we opted not
vulnerable group - religious minority to use a confusion matrix to analyze the disagreement. A
vulnerable group - women confusion matrix is not appropriate for span detection, as it
Table 2 assumes discrete labels applied to predefined items, whereas
The annotators agreement measured through the F-score and bro- our task involved labeling spans of text that varied in length
ken down by label. and context. Instead, we performed a qualitative analysis,
examining specific cases of disagreement to understand their</p>
        <p>Although the overall results are positive, they show signif- nature. This approach allowed us to explore not only how
icant variations that can be quantitatively and qualitatively. annotators difered in labeling spans but also why these
diferInclusion of overlapping spans was handled as follows: if ences emerged, providing a deeper insight into the underlying
one span fully included another, this was considered to be an issues of interpretation and guidelines.
agreement. In cases where the spans only partially overlapped, Looking more closely at the headlines where the
annotameaning there was some shared text but not full inclusion, this tions present inconsistencies, a variety of motivations behind
was treated as a partial agreement. For example, if one anno- discrepancies can be identified.
tator labeled “All women” and another selected only “women”, For instance, in the Italian title “Orrore nella casa
occupata dagli immigrati: donna lanciata giù dal secondo piano”7,
‘donna’ was marked as a vulnerable identity by only one of ddearnoggeartoourys
the annotators, suggesting maybe an erroneous focus on an entities
individual target at a time (‘immigrati’) by the other annotator. vulnerable groups</p>
        <p>
          Another type of disagreement relates to the interpretation
of derogatory mentions. An example can be found in “Un Table 3
terzo dei reati sono commessi da stranieri (e gli africani hanno The distribution of labels in the gold standard corpus.
il record). Tutti i numeri”8, where one annotator identified the
term ‘stranieri’ as a derogatory mention, as well as
representative of a vulnerable identity, while another annotator simply 4.2. Dataset Analysis
stuck to the second label, perhaps highlighting a divergence In this section we provide an analysis of the four label types
in the interpretation of the guidelines. Furthermore, it is inter- that occur in the gold standard version of the VIRC corpus:
esting to observe the disagreement created by the headlines ‘derogatory’, ‘dangerous’, ‘named entities’, ‘vulnerable groups’.
that use generic term ‘stranieri’ (‘foreigners’), which was of- The analysis is twofold: first, we describe the distribution of
ten labelled as ‘vulnerable identity - ethnic minority’ by one these label types, then we present a zero-shot and a
fewannotator and as ‘vulnerable identity - migrant’ by the other. shot experiment aimed at understanding if existing LLMs
This inconsistence between annotators can be identified in (T5[
          <xref ref-type="bibr" rid="ref10">31</xref>
          ] and BART[
          <xref ref-type="bibr" rid="ref11">32</xref>
          ]) are able to recognize these labeled
two headlines: “Ius soli e cittadinanza facile agli stranieri? Il spans in news headlines by comparing their outputs to the
sangue non è acqua”9 and “Un terzo dei reati sono commessi gold standard annotations.
da stranieri (e gli africani hanno il record). Tutti i numeri”2. In
the first case, we can solve the disagreement by looking at the
context: the explicit reference to the issue of granting citizen- Corpus statistics. Table 3 shows the distribution of label
ship suggests that the term ‘foreigners’ is more appropriately types in the corpus. As it can be observed, mentions of
vulnerreferred to the specific category of migrants. On the other able groups are the most present, with 270 occurrences in the
hand, in the second headline, there is no direct reference to Spanish subset and 253 in the Italian subset. This confirms
specifically migration-related issues and thus both interpre- the relevance of annotating vulnerable in the identification
tations in terms of the vulnerable category of belonging are of discriminatory contents, which is tied to their high
recacceptable. ognizability by annotators (Section 4.1). The role on named
        </p>
        <p>Finally, some texts present a slight diference in the anno- entities difers in the two subsets. Annotators labeled them
tation spans of choice, as observed in “Più di 200mila case with agreement 130 times in Spanish headlines and 67 times
popolari agli immigrati”10, where the annotators identified in Italian ones. This might be caused by their compositions.
dangerous speech in the same section of text, but with dif- Since Italian headlines were partly collected from Facebook
ferences in the number of highlighted words (first annotator pages of mainstream newspapers, there was a higher
numlabelled ‘Più di 200mila’; second annotator labelled ‘200mila ber of named entities that were not relevant for the analysis
case popolari’), reflecting variations in the identification of of headlines’ danger. The number of text spans labeled as
relevant content for the analysis of dangerous speech. dangerous is almost equivalent in the two subsets (136 for</p>
        <p>
          In addition to the predefined labels, we also collected free- Spanish, 166 for Italian), showing a good presence of this
text fields as part of the annotation process. These comments label type despite the high disagreement between annotators.
offered an additional layer of granularity, allowing annotators
to describe nuances not covered by the fixed categories.
For example, in the Spanish headline “Dos menas marroquíes
apuñalan a dos turistas para robarles en Salou”11, both annotators
used the two labels ‘vulnerable identity - ethnic minority’
and ‘vulnerable identity - other’ to annotate the span ‘menas
marroquíes’. Alongside the ‘other’ label, one annotator
provided the comment ‘Under 18’, while the other one used
‘young people’ to describe the vulnerable group. Although
stated differently, both comments highlight the specific
vulnerability related to the age of the group, complementing the
existing labels. As this example shows, the flexibility in the
annotation process provided by free-text fields is useful to
capture multi-categorical terms and to identify potential new
categories that may not have been initially considered in the
predefined labels.</p>
        <p>Finally, it is worth mentioning the almost total absence of text
spans labeled as ‘derogatory’ with agreement (3 for Spanish,
16 for Italian), which suggests the high subjectivity of this
phenomenon and the need to better define its characteristics
in annotation guidelines.</p>
        <p>Corpus analysis with LLMs. We completed our analysis
of the VIRC corpus through zero-shot experiments aimed at
exploring the ability of existing LLMs to identify the four
types of labelled spans in messages. We considered the
detection of spans as an extractive Question Answering (QA)
problem. For the task we adopted the T5 [
          <xref ref-type="bibr" rid="ref10">31</xref>
          ] and BART [
          <xref ref-type="bibr" rid="ref11">32</xref>
          ]
LLM architectures for both languages. For Italian we employed
the [
          <xref ref-type="bibr" rid="ref12">33</xref>
          ] and [
          <xref ref-type="bibr" rid="ref13">34</xref>
          ] models, and for Spanish the [
          <xref ref-type="bibr" rid="ref14">35</xref>
          ] and [
          <xref ref-type="bibr" rid="ref15">36</xref>
          ] models,
respectively. The translations of the prompts used are the following
(see Appendix A for the original ones):
7“Atrocity in a house occupied by migrants: woman thrown from second
floor”.
8“One third of all crimes are committed by foreigners (and Africans hold
the record). All the numbers”.
9“Ius soli and easy citizenships for foreigners? Blood is not water”.
10“More than 200,000 public housing units for immigrants”.
11“Two Moroccan unaccompanied migrant minors stab two tourists to rob
them in Salou”.
• What part of the text is dangerous (criminalizes,
ridicules, incites violence, ...) against vulnerable
identities (women, migrants, ethnic minorities, ...)?
• What part of the text is derogatory (negative or
pejorative comments about the victim without explicit
hate speech, but the mention itself is discriminatory or
offensive, and often uses a tone intended to denigrate
or discredit the victim)?
• What named entity is mentioned in the sentence?
• What vulnerable identity to hate speech is mentioned
in the sentence?</p>
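To make the extractive-QA framing concrete, each headline serves as the QA context and is paired with one question per label type; the sketch below is our illustration of that pairing (the prompt texts are abbreviated from the translated list above, and the function name is hypothetical, not the authors' code):

```python
# One question per label type, abbreviated from the translated prompts above.
PROMPTS = {
    "dangerous": "What part of the text is dangerous against vulnerable identities?",
    "derogatory": "What part of the text is derogatory?",
    "entity": "What named entity is mentioned in the sentence?",
    "vulnerable identity": "What vulnerable identity to hate speech is mentioned in the sentence?",
}

def build_qa_queries(headline):
    """Pair a headline (the QA context) with the question for each label."""
    return [{"label": label, "question": question, "context": headline}
            for label, question in PROMPTS.items()]

queries = build_qa_queries("Dos menas marroquíes apuñalan a dos turistas en Salou")
print(len(queries))  # 4: one extractive-QA query per label type
```

Each of these queries is then answered by the extractive QA model, which returns the text span it considers the most likely answer.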
        <p>We designed two approaches for the zero-shot experiments:
restrictive and non-restrictive. On the one hand, for the
non-restrictive zero-shot experiments, for each sentence in the
dataset we queried the model with the prompt of each label
and extracted the three most confident results. Then, we
filtered out the responses below 0.02% model confidence to
limit the noise. Finally, all these annotations go through a
majority vote (identical to the one used to build the
aggregated dataset) to normalize the model response.</p>
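The filtering and normalization steps of the non-restrictive approach can be sketched as follows (a minimal illustration; the helper names are ours, the QA model calls are abstracted as pre-computed candidate lists, and the example spans are hypothetical):

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.0002  # the 0.02% model-confidence cutoff used to limit noise

def filter_candidates(candidates):
    """Keep only QA answers whose confidence clears the threshold.

    `candidates` is a list of (span, score) pairs, e.g. the three most
    confident answers returned by the extractive QA model for one prompt.
    """
    return [(span, score) for span, score in candidates
            if score >= CONFIDENCE_THRESHOLD]

def majority_vote(annotations_per_run):
    """Normalize the model responses: keep spans proposed by a majority of
    runs, mirroring the vote used to build the aggregated dataset."""
    runs = len(annotations_per_run)
    counts = Counter(span for run in annotations_per_run for span in set(run))
    return sorted(span for span, count in counts.items() if count > runs / 2)

# Hypothetical candidates for the 'vulnerable identity' prompt on one headline.
raw = [("menas marroquíes", 0.41), ("turistas", 0.0001), ("Salou", 0.003)]
kept = [span for span, _ in filter_candidates(raw)]
print(kept)  # 'turistas' is dropped: its confidence is below 0.02%
```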
        <p>On the other hand, for the restrictive zero-shot
experiments, we queried the model with the prompts for each
annotation present in the aggregated dataset. As there are
sentences that carry two identical labels on different spans, we
request five different annotations from the model, ordered
from most confident to least confident. If an annotation has
already been included, the next annotation in order is taken, to
avoid duplicate annotations from the model.</p>
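The duplicate-avoiding selection of the restrictive approach can be sketched as below, under the assumption that the model's answers arrive as a confidence-ordered list (function and variable names are illustrative):

```python
def pick_next_annotation(ranked_answers, already_taken, max_answers=5):
    """Return the most confident answer not yet used for this sentence.

    `ranked_answers` holds up to five candidate spans ordered from most to
    least confident; when a sentence carries two gold annotations with the
    same label on different spans, skipping spans already taken avoids
    duplicating the model's annotations.
    """
    for span in ranked_answers[:max_answers]:
        if span not in already_taken:
            return span
    return None  # all candidates were already used

# Hypothetical: two gold 'vulnerable identity' annotations on one sentence.
ranked = ["migrantes", "migrantes", "mujeres", "Salou", "turistas"]
first = pick_next_annotation(ranked, set())
second = pick_next_annotation(ranked, {first})
print(first, second)  # migrantes mujeres
```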
        <p>Table 4 presents the F-scores for each label type,
experiment, and model. In general, T5 and BART tend to perform
more effectively in Spanish than in Italian. The models
face noticeable challenges in identifying the labels
‘dangerous’, ‘derogatory’, and ‘entity’. Nevertheless, when they are
aware that the label exists within the sentence (restrictive),
they manage to recognize it with fairly good agreement.
As during annotation, the label ‘derogatory’ proves the most
challenging to identify. In the non-restrictive scenario, it scarcely
receives any agreement, yet in the restrictive scenario it achieves a
reasonable level, particularly in Spanish. This indicates that
the model struggles to discern its presence initially but, once
acknowledged, can recognise the expression.</p>
        <p>The restrictive method enhances performance over the
non-restrictive method for all labels except ‘vulnerable identity’.
This shows that models generally comprehend and identify
vulnerable identities in sentences better without restrictions
than when they are restricted to specific mentions. It should
also be noted that, in the Spanish setting, T5 is more effective
than BART at identifying ‘vulnerable identity’ labels under
both approaches, while BART performs better in Italian.</p>
        <p>These results show that a NER-based annotation scheme
for HS detection is difficult not only to annotate but also to
detect automatically. Larger resources are necessary to develop
models that are able to detect the complex semantics of HS.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Conclusions and Future Work</title>
      <sec id="sec-2-1">
        <p>The Vulnerable Identities Recognition Corpus (VIRC), created
in this work, reveals the challenge of identifying vulnerable
identities due to the rapid evolution of language on social
media. Our experiments indicate that large language models
(LLMs) struggle significantly with this task.</p>
      </sec>
      <sec id="sec-2-2">
        <p>VIRC provides a detailed and structured resource that enhances
understanding of the extensive use of hate speech in
Italian and Spanish news headlines. The corpus is particularly
valuable as it includes more annotation dimensions compared
to related studies in other languages, such as vulnerable
identities, dangerous discourse, derogatory expressions, and entities.
This differentiation between vulnerable identities and
entities, as well as between dangerous and derogatory elements,
enables the development of sophisticated detection tools that
can facilitate large-scale actions to mitigate the impact of
hate speech (e.g., moderation of messages and generation
of counter-narratives that reduce the damage to the mental
health of victims).</p>
        <p>Future work will focus on expanding this resource by
doubling the size of annotations for both languages and including
non-racism-related phrases to ensure the resource is
comprehensive. Additionally, we plan to refine the annotation
guidelines to avoid low agreement on the derogatory label,
enhancing the overall reliability and utility of the corpus. These
efforts will further improve the effectiveness of hate speech
detection and contribute to the development of policies and
tools for a safer online environment.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <sec id="sec-3-1">
        <p>This work is supported by the Predoctoral Grant (PIPF-2022/COM-25947) of the Consejería de Educación, Ciencia
y Universidades de la Comunidad de Madrid, Spain. Arianna
Longo’s work has been supported by aequa-tech. The
authors gratefully acknowledge the Universidad Politécnica de
Madrid (www.upm.es) for providing computing resources on
the IPTC-AI innovation Space AI Supercomputing Cluster.
</p>
        <p>North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2289–2303.
[5] P. Chiril, E. W. Pamungkas, F. Benamara, V. Moriceau, V. Patti, Emotionally informed hate speech detection: A multi-target perspective, Cogn. Comput. 14 (2022) 322–352. URL: https://doi.org/10.1007/s12559-021-09862-5. doi:10.1007/s12559-021-09862-5.
[6] M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, The risk of racial bias in hate speech detection, in: Proceedings of the 57th annual meeting of the association for computational linguistics, 2019, pp. 1668–1678.
[7] P. Sachdeva, R. Barreto, G. Bacon, A. Sahn, C. Von Vacano, C. Kennedy, The measuring hate speech corpus: Leveraging rasch measurement theory for data perspectivism, in: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022, 2022, pp. 83–94.
[8] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, A. Mukherjee, Hatexplain: A benchmark dataset for explainable hate speech detection, in: Proceedings of the AAAI conference on artificial intelligence, volume 35, 2021, pp. 14867–14875.
[9] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: Proceedings of the 15th international workshop on semantic evaluation (SemEval-2021), 2021, pp. 59–69.
[10] K. Büyükdemirci, I. E. Kucukkaya, E. Ölmez, C. Toraman, JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 9543–9553.
[11] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/s10579-020-09502-8.
[12] E. Leonardelli, S. Menini, A. P. Aprosio, M. Guerini, S. Tonelli, Agreeing to disagree: Annotating offensive language datasets with annotators’ disagreement, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 10528–10539.
[13] H. Kirk, W. Yin, B. Vidgen, P. Röttger, SemEval-2023 task 10: Explainable detection of online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023, pp. 2193–2210.
[14] P. Piot, P. Martín-Rodilla, J. Parapar, Metahate: A dataset for unifying efforts on hate speech detection, Proceedings of the International AAAI Conference on Web and Social Media 18 (2024) 2025–2039. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/31445. doi:10.1609/icwsm.v18i1.31445.
[15] D. Nozza, F. Bianchi, G. Attanasio, HATE-ITA: Hate speech detection in Italian social media text, in: K. Narang, A. Mostafazadeh Davani, L. Mathias, B. Vidgen, Z. Talat (Eds.), Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), Association for Computational Linguistics, Seattle, Washington (Hybrid), 2022, pp. 252–260. doi:10.18653/v1/2022.woah-1.24.
[16] M. Madeddu, S. Frenda, M. Lai, V. Patti, V. Basile, DisaggregHate it corpus: A disaggregated italian dataset of hate speech, in: F. Boschetti, G. E. Lebani, B. Magnini, N. Novielli (Eds.), Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), volume 3596, 2023.
[17] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic spans detection, in: A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 59–69. URL: https://aclanthology.org/2021.semeval-1.6. doi:10.18653/v1/2021.semeval-1.6.
[18] P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen, N. L.-T. Nguyen, ViHOS: Hate speech spans detection for Vietnamese, in: A. Vlachos, I. Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 652–669. URL: https://aclanthology.org/2023.eacl-main.47. doi:10.18653/v1/2023.eacl-main.47.
[19] Y. Jeong, J. Oh, J. Lee, J. Ahn, J. Moon, S. Park, A. Oh, KOLD: Korean offensive language dataset, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 10818–10833. URL: https://aclanthology.org/2022.emnlp-main.744. doi:10.18653/v1/2022.emnlp-main.744.
[20] N. Ousidhoum, Z. Lin, H. Zhang, Y. Song, D.-Y. Yeung, Multilingual and multi-aspect hate speech analysis, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 4675–4684. URL: https://aclanthology.org/D19-1474. doi:10.18653/v1/D19-1474.
[21] B. Jehangir, S. Radhakrishnan, R. Agarwal, A survey on named entity recognition - datasets, tools, and methodologies, Natural Language Processing Journal 3 (2023).
[22] E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, R. Weischedel, OntoNotes: the 90% solution, in: Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers, 2006, pp. 57–60.
[23] N. Collier, T. Ohta, Y. Tsuruoka, Y. Tateisi, J.-D. Kim, Introduction to the bio-entity recognition task at JNLPBA, in: N. Collier, P. Ruch, A. Nazarenko (Eds.), Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), COLING, 2004, pp. 73–78.
[24] M. ElSherief, V. Kulkarni, D. Nguyen, W. Y. Wang, E. Belding, Hate lingo: A target-based linguistic analysis of hate speech in social media, in: Proceedings of the international AAAI conference on web and social media, volume 12, 2018.
[25] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, Automatic classification of sexism in social networks: An empirical study on twitter data, IEEE Access 8 (2020) 219563–219576.</p>
        <p>• Dangerous: “¿Qué parte del texto es peligroso (criminaliza, ridiculiza, incita a la violencia, ...) contra identidades vulnerables (mujeres, migrantes, minorías étnicas, ...)?”
• Derogatory: “¿Qué parte del texto es derogativo (comentarios negativos o despectivos sobre la víctima sin incitación explícita al odio, pero la mención en sí es discriminatoria u ofensiva, y a menudo emplea un tono destinado a menospreciar o desacreditar a la víctima)?”
• Entity: “¿Qué entidad nombrada se menciona en la frase?”
• Vulnerable Identity: “¿Qué identidad vulnerable al discurso de odio se menciona en la frase?”</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          ,
          <source>in: Proceedings of the international AAAI conference on web and social media</source>
          , volume
          <volume>11</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <article-title>Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter</article-title>
          ,
          <source>in: Proceedings of the first workshop on NLP and computational social science</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>ElSherief</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ziems</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Muchlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Anupindi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seybolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Latent hatred: A benchmark for understanding implicit hate speech</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vidgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Margetts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rossini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tromble</surname>
          </string-name>
          ,
          <article-title>Introducing cad: the contextual abuse dataset</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <article-title>An italian twitter corpus of hate speech against immigrants</article-title>
          ,
          <source>in: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I.</given-names>
            <surname>Guillén-Pacho</surname>
          </string-name>
          ,
          <source>oeg-upm/telegram-dataset-builder: version 1.0.0</source>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.5281/zenodo.12773159. doi:10.5281/zenodo.12773159.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Benesch</surname>
          </string-name>
          , Dangerous speech,
          <fpage>86272</fpage>
          <lpage>12</lpage>
          (
          <year>2023</year>
          )
          <fpage>185</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>Toward a perspectivist turn in ground truthing for predictive computing</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>37</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>6860</fpage>
          -
          <lpage>6868</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brants</surname>
          </string-name>
          ,
          <article-title>Inter-annotator agreement for a german newspaper corpus</article-title>
          ., in: LREC, Citeseer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of machine learning research 21</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1910.13461. arXiv:1910.13461.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>IT5: Text-to-text pretraining for Italian language understanding and generation</article-title>
          , in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics</source>
          ,
          Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>9422</fpage>
          -
          <lpage>9433</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.823.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>La Quatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <article-title>BART-IT: An efficient sequence-to-sequence model for italian text summarization</article-title>
          ,
          <source>Future Internet</source>
          <volume>15</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/1999-5903/15/1/15. doi:10.3390/fi15010015.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>V.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Trusca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tufiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence spanish pre-trained language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.11259.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>V.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Trusca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tufiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence spanish pre-trained language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.11259.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>