<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Unveiling Stereotypes: Combining Knowledge Graphs and LLMs for Implied Stereotype Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Cuccarini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lia Draetta</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beatrice Fiumanò</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Bistarelli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rossana Damiano</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentina Presutti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, Department of Modern Languages, Literatures, and Cultures</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Naples Federico II, Department of Biology</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Perugia, Department of Mathematics and Computer Science</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Turin, Department of Computer Science</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In recent years, hate speech detection models have achieved significantly improved results, largely due to advances in Large Language Models (LLMs). As a result, research has increasingly focused on more nuanced phenomena, such as the detection of implicit hate and stereotypes. Although the challenge of identifying implicit language has been widely explored, it remains an open issue for state-of-the-art models due to their limited ability to grasp contextual and culturally specific knowledge. In this work, we address the task of identifying stereotypes implicitly encoded in hate speech messages, and propose a method for generating them by leveraging the combined potential of LLMs and Knowledge Graphs (KGs). As a first step, we designed an ontology specifically tailored to represent implicit hate speech. We then populated the ontology using a subset of an Italian-language hate speech dataset, in which targets and implied stereotype statements were manually annotated. The remaining portion of the dataset was reserved as a test set to evaluate the impact of knowledge graph-derived information on LLM-generated stereotypes. For each input sentence, relevant knowledge was extracted from the ontology using SPARQL queries and used to enrich the prompt provided to various LLMs. We compared the results of the knowledge-enhanced approach against those of a baseline few-shot learning approach. Evaluation was conducted using BLEU, BERTScore and ROUGE metrics. Additionally, given the high subjectivity of the task, we performed a manual qualitative analysis on a subset of the model outputs to assess both the quality of the evaluation and the soundness of the generated stereotypes. Warning: This paper contains examples of explicitly offensive content.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate speech detection</kwd>
        <kwd>Stereotype</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Retrieval Augmented Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>resources. We then populate the ontology using a subset of an Italian dataset on implicit stereotypes, which comprises manual annotations on HS targets, hateful chunks and stereotypes. Finally, starting from the target entities in each sentence, we extract relevant knowledge from the KG and integrate it into the prompt of three different LLMs. We task the models with generating the implicit stereotype that underlies each hate speech message. We compare these stereotypes with those generated by a baseline model using a non-KG-enhanced prompt. The main contributions of this work are the following:</p>
      <list list-type="bullet">
        <list-item>
          <p>StereoGraph: a Knowledge Graph grounded in a dedicated ontology designed to represent implicit hate expressed in social media posts.</p>
        </list-item>
        <list-item>
          <p>A graph-based methodology to generate explicit stereotypes encoded in hateful messages.</p>
        </list-item>
        <list-item>
          <p>A fine-grained manual assessment and error analysis to evaluate the suitability of the evaluation metrics used to compare both the baseline and KG-enhanced outputs against the gold standard. This was particularly relevant given the highly subjective and culturally specific nature of the task.</p>
        </list-item>
      </list>
      <p>In the following Section (2) we present relevant related works on detection and analysis of subtle hate speech (2.1), together with graph-based approaches (2.2) to the same tasks. Section 3 describes the adopted methodology, the dataset we used for constructing the KG, and the ontology design process. The experimental setup is detailed in Section 4, while the results, including quantitative evaluation, human assessment, and error analysis, are discussed in Section 5. Finally, the conclusions and limitations are presented in Sections 6 and 7, respectively. All data and code for reproducibility can be found on the following GitHub page<sup>1</sup>.</p>
      <p><sup>1</sup> https://github.com/marcocuccarini/StereoGraphUnveilingStereotypes</p>
      <sec>
        <title>2. Related Works</title>
        <sec>
          <title>2.1. Subtle Hate Speech Explanation</title>
          <p>Unlike explicit hate speech, the interpretation of implicit hate speech often requires inference and integration of background knowledge [
            <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
            ], particularly since hate expressions are usually socio-culturally dependent and rely on contextual knowledge [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. These factors contribute to the challenge of detecting implicit hate speech and highlight the ongoing need for more sophisticated detection systems, as current state-of-the-art models still struggle to efficiently handle this task [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Some studies have attempted to identify subtle hate speech by leveraging different approaches. Several approaches have been explored to identify subtle hate speech, including transformer-based models [
            <xref ref-type="bibr" rid="ref16">16, 17, 18</xref>
            ], neural networks [19] or leveraging semantic information embedded in texts [19, 20]. Other approaches tried to tackle this task by incorporating the potentiality of external sources of knowledge, such as Knowledge Graphs [21].</p>
          <p>In this context, few studies have directly addressed the challenge of unveiling or explaining subtle hate speech. Some researchers [
            <xref ref-type="bibr" rid="ref16">16, 22</xref>
            ] have focused on the role of social stereotypes, aiming to uncover their implicit meanings and to develop benchmarks for explanation-oriented tasks. Other works have specifically addressed the task of implicit hate speech explanation. Kim and colleagues [23] present a pipeline that guides transformer models’ predictive decisions through the identification of key rationales. More recent studies have leveraged the generative capabilities of LLMs. For example, Huang and colleagues [
            <xref ref-type="bibr" rid="ref17">24</xref>
            ] propose a Chain-of-Explanation prompting method to generate stereotypes. Similarly, Yang et al. [
            <xref ref-type="bibr" rid="ref18">25</xref>
            ] introduce a step-by-step approach that combines LLM-based chain-of-thought prompting with a human-annotated benchmark.</p>
          <p>While several studies have focused on creating benchmarks and providing insights into implicit hate speech in English, resources for the Italian language remain limited, with only a few datasets addressing the hate speech phenomenon in depth. Notable studies [
            <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">26, 27, 28, 29</xref>
            ] have provided valuable annotated resources that distinguish between implicit and explicit hate speech and stereotypes, with the goal of detecting the more subtle and less recognizable nuances of hate. Nevertheless, research on stereotype explication remains limited. For example, Muti and colleagues [
            <xref ref-type="bibr" rid="ref23">30</xref>
            ] investigate the ability of LLMs to accurately identify implicit messages in misogynistic contexts, also exploring how prompts can reconstruct subtle meanings to make the messages explicit. However, to our knowledge, no previous work about embedded stereotypes has been carried out in the Italian cultural context. We suggest that the generation of implicit stereotypes can support the development of more comprehensive benchmarks, improving models’ performance in detecting subtle forms of hate speech.</p>
        </sec>
      </sec>
      <sec id="sec-1-1">
        <title>2.2. Knowledge-Enhanced Approaches</title>
        <p>Knowledge-enhanced and Retrieval-Augmented Generation (RAG) methods [
          <xref ref-type="bibr" rid="ref24">31</xref>
          ] have emerged as a powerful paradigm to address key limitations of LLMs. More recently, this line of work has incorporated structured, graph-based knowledge, particularly KGs [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], to enhance retrieval and reasoning capabilities. In the domain of hate speech research, knowledge-enhanced approaches have provided solutions to address the challenges posed by implicit hate speech across various tasks.</p>
        <p>Zhao et al. [21] propose MetaTox, a RAG-based approach that integrates a meta-toxic knowledge graph with LLMs for hate speech detection. First, LLMs are used to construct the KG by combining data from three English datasets. Then Qwen and LLaMA3.1 are prompted to classify tweets as toxic or non-toxic. The authors demonstrate that the MetaTox method reduces false positives, leading to better generalization and reduced hallucinations from LLMs. Lin [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] combines Entity Linking techniques with summarized Wikipedia descriptions to improve performance in the implicit hate speech detection and classification task. Although it does not follow a standard RAG approach, the paper proposes feeding a Multi-Layer Perceptron with embeddings of concatenated tweet and external knowledge representations, training it to perform a multi-label classification of implicit hate speech types. This approach demonstrated significant improvements when entity triggers were mentioned in text, although limitations remained for the classification of tweets requiring pragmatic understanding.</p>
        <p>In the context of implicit hate speech, Yadav et al. [
          <xref ref-type="bibr" rid="ref25">32</xref>
          ] introduce Tox-BART, a BART-based architecture enhanced with toxicity attributes, i.e. structured meta-information on tweets, encompassing target groups, insult types, and hate intensity levels. This approach addresses limitations derived from poor quality of retrieved KG tuples, which can hinder KG-augmented approaches. Using different evaluation metrics, they demonstrate that infusion of toxicity attributes achieves performance comparable to simple KG-infusion. In the Italian context, Di Bonaventura and colleagues [33] implemented a knowledge-enhanced approach for detecting homotransphobic hate speech. The system leverages the O-Dang knowledge graph, which contains information about named entities in the Italian HT context. The approach showed promising results, outperforming baseline scores.</p>
        <p>Compared to the reviewed literature, our approach represents a step forward, particularly in the area of Italian language hate speech detection. While most prior work has focused on the detection of implicit hate speech, our study shifts the emphasis toward the explanatory capabilities of LLMs, specifically investigating how these can be enhanced through the integration of structured knowledge. Furthermore, by focusing on stereotypes and adopting a hybrid evaluation approach (automatic and human-based), our work also provides valuable insights into the ability of LLMs to uncover sound and coherent stereotypes from implicit language, as well as into the reliability of the evaluation metrics used.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <p>In this work, we aim to perform the task of implicit stereotype generation using LLMs, comparing a baseline approach with a KG-enhanced alternative. Given a sentence and its associated hate speech target, the model is prompted to generate the subtle stereotype that contributes to the message’s hateful nature. In the following sections, we briefly present the proposed pipeline (Section 3.1), describe the dataset used (Section 3.2), and outline the construction of the ontology that serves as the foundation for the knowledge graph (Section 3.3).</p>
      <sec>
        <title>3.1. Pipeline Overview</title>
        <p>Our methodology is designed to make subtle stereotypes conveyed in hateful content explicit. This is a particularly challenging task, as it requires nuanced contextual understanding and awareness of culturally specific stereotypes associated with the target. By integrating external knowledge, we investigate whether language models can effectively contextualize such messages and generate more accurate and transparent stereotypes.</p>
        <p>The proposed approach is illustrated in Figure 1. Given an input sentence and its associated HS target, retrieved from the annotated dataset, we use the target to query the KG via a SPARQL query, retrieving all triples in which the target is linked to its stereotypes. We then adopt a few-shot learning approach, integrating into the prompt the external knowledge retrieved from the KG in RDF format. The evaluation phase consists of a comparison between the results (i.e. generated stereotypes) obtained using the knowledge-enhanced and the baseline approach. A hybrid evaluation was performed comparing automatic metrics with human assessment.</p>
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Dataset</title>
        <p>To address the task of subtle stereotype generation, we leveraged the Open Stereotype Corpus<sup>2</sup> [34] containing 3,578 Italian tweets collected between October 2018 and June 2019 from the Contro l’Odio dataset [35]. The dataset was annotated by five different annotators. For each message, the annotators identified the specific chunk (trigger) containing the hate content, the implicit stereotype (if present) and the stereotype cluster (a more general class aiming at creating a stereotype categorization). In the original dataset the authors automatically distinguished between agent and patient by parsing each rationale; we chose to simplify this distinction by aggregating the two columns under a single class named "target". An example of the dataset structure along with a subset of annotations is presented in Figure 3. From the dataset</p>
      </sec>
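      <sec>
        <p>The retrieval-and-prompting pipeline of Section 3.1 can be illustrated with a minimal sketch. It assumes the stereotypes linked to a target have already been retrieved from the KG, caps them at the 20 per target reported in Section 4.1, and folds them into a few-shot prompt. The function names, the English prompt wording, and the fixed seed are hypothetical choices made for illustration only; the actual prompt is written in Italian and given in Appendix A.</p>
        <preformat>
```python
import random

MAX_STEREOTYPES = 20  # sample size per target reported in Section 4.1


def sample_stereotypes(stereotypes, k=MAX_STEREOTYPES, seed=0):
    """Return at most k distinct stereotypes, sampled reproducibly (hypothetical helper)."""
    unique = sorted(set(stereotypes))  # sorting makes the result depend only on the seed
    if k >= len(unique):
        return unique
    return random.Random(seed).sample(unique, k)


def build_prompt(sentence, target, stereotypes, few_shot_examples):
    """Assemble a few-shot prompt; the wording here is illustrative, not the authors' prompt."""
    lines = ["Task: state the implicit stereotype as [subject] [are/do] [predicate]."]
    for example_sentence, example_stereotype in few_shot_examples:
        lines.append(f"Sentence: {example_sentence}")
        lines.append(f"Stereotype: {example_stereotype}")
    if stereotypes:  # an empty list yields the baseline (non-KG) prompt
        lines.append(f"Known stereotypes about '{target}' from the KG:")
        lines.extend(f"- {s}" for s in stereotypes)
    lines.append(f"Sentence: {sentence}")
    lines.append("Stereotype:")
    return "\n".join(lines)


if __name__ == "__main__":
    retrieved = ["sono-irregolari", "non-rispettano-legge", "spacciano"]
    examples = [("example tweet", "example stereotype")]
    print(build_prompt("input tweet", "immigrati", sample_stereotypes(retrieved), examples))
```
        </preformat>
        <p>Passing an empty stereotype list produces the baseline prompt, so both conditions share the same few-shot examples and output-format instruction and differ only in the KG block.</p>
      </sec>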
      <sec id="sec-2-1">
        <title>3.3. Ontology Design</title>
        <p>two subclasses: Group and Person. These subclasses
represent diferent types of targets and are connected
to specific situations via the hasTarget relation, which
links a message to its corresponding target. The class
Type is designed to provide a taxonomy for both
targets (e.g., racial target, religious target) and stereotypes
(e.g., ‘are dangerous’, ‘are unclean’). The ontology was
subsequently populated using SPARQLAnything4 [37]
leveraging the datasets described in the previous section
as data source. After this process we obtained a
knowledge graph containing triples as to the followings:
ster:_803176483174780929
rdf:type dul:Situation ;
rdfs:label "Forza ragazzi, 180mila clandestini all
anno, rom da tutte le parti, illegalita totale,
Coop rosse e bianche che lucrano. ora sapete
cosa votare" ;
dul:hasTarget ster:immigrati ;
ster:hasStereoManifestation ster:180mila-clandestini</p>
        <p>-allanno ;
ster:hasStereotype ster:invadendo-italia .</p>
        <p>For the ontology design process we adopted a fully
manual approach to ensure the quality of the resulting re- ster:invadendo-italia
source through several means: aligning it with foun- rdf:type ster:Stereotype ;
dational ontologies and related semantic resources, en- srtdefrs::hlaasbTeylpe ""SionnvoaIdnevnadsooirtia"l.ia" ;
suring the conceptual correctness of the defined classes,
and minimizing the potential introduction of bias. The ster:immigrati rdf:type foaf:Group .
ontology includes four top-level classes: Situation,
Stereotype, Agent, and Type. The class Situation
is aligned with the homonymous class from the
foundational ontology DOLCE [36]. Its purpose is to link a
given target and its associated stereotype to a specific
occurrence, such as a Twitter post, in order to avoid the
introduction of bias or overly generic statements about
stereotypes. The class Stereotype captures the implicit
assertions conveyed in a given sentence. The class Agent,
aligned with the FOAF (Friend of a Friend) ontology3, has</p>
        <sec id="sec-2-1-1">
          <title>This means that a specific post, identified by the</title>
          <p>ID ster:_803176483174780929, is an instance of
the class Situation. It has a specific content,
expressed trough the relation rdfs:label, and it is
associated to a specific stereotype chunk trough the relation
ster:hasStereoManifestation. The tweet is then
associated with a particular target, ster:immigrati,
as well as a stereotype, ster:invadendoitalia. The
stereotype is then defined as an instance of the class</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3http://xmlns.com/foaf/spec/</title>
        </sec>
        <sec id="sec-2-1-3">
          <title>4https://sparql-anything.cc/</title>
          <p>SELECT ?s ?stereotype</p>
          <p>WHERE {{
?s a dul:Situation ;
dul:hasTarget &lt;{target_uri}&gt; ;
ster:hasStereotype ?stereotype .</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>4.3. Evaluation</title>
        <p>ster:Stereotype and linked to a specific cluster their ability to understand the subtle stereotype
embedSonoInvasori through the relation ster:hasType. ded in the message. We selected these three distinct
LLMs because they are state-of-the-art, multilingual,
open-source models with comparable architecture and
4. Experiment Setting medium scale size.</p>
        <p>The task is conducted in the Italian language. For the
In the next sections, the experimental setting is presented. baseline, we used a few-shot learning approach and for
The following approach consists of three main steps: the prompt construction we adopt a vanilla structure
Knowledge retrieval, where relevant information is re- setup; the prompt is written in Italian. Additionally, it
trieved from the KG (Section 4.1); Prompting, where three includes instructions on how to structure the output
senmodels are prompted using both a few-shot baseline and tence, explicitly asking the models to generate output in
a few-shot KG-enriched approach 4.2; and Evaluation
(4.3), where the results are assessed using both automatic tkhneofworlemdagte[-esnuhbajneccetd]ap[parroea/chdoin]co[rpproerdaitecsaatep]ro. mThpet
metrics and manual evaluation. containing information about the target entity from the
KG. For each target, we associate the relevant retrieved
4.1. Knowledge Extraction stereotypes. The full prompt is presented in Appendix
For every sentence of the test set we extracted relevant A. The output produced by the LLM was preprocessed
knowledge from the Knowledge graph leveraging the before the evaluation, removing generic elements
profollowing SPARQL query: vided by the LLM, such as the usual formulaic closing
statements (e.g., asking if it can assist further).
models</p>
        <p>and
explore</p>
        <sec id="sec-2-2-1">
          <title>For the evaluation phase, we leverage BLEU [41],</title>
          <p>BERTScore [42] and ROUGE [43]. BLEU measures how
}} many n-grams in the generated text appear in the
reference text, focusing on precision and penalizing very short
outputs. ROUGE focuses on recall, checking how much</p>
          <p>Using this query we were able to retrieve all the stereo- of the n-grams or sequences of the reference text
aptype associated with a certain tweet that has the specified pear in the generated text, often used for summarization.
target. For example using "immigrati" as target we are BERTScore compares the generated and reference texts
able to extract triples like the followings, in which the using deep contextual embeddings from BERT, capturing
ifrst element is the ID, the second the gold stereotype semantic similarity beyond exact word matches.
and the third hateful span: Since recent studies [44, 45] have highlighted the
limister:_id sono-irregolari clandestini-musulmani tations of automated evaluation methods and some
scholster:_id non-rispettano-legge nn-amano-subire-le ars [46, 47] are beginning to emphasize the potential
-nostre-leggi-sti-migranti of hybrid approaches and aware of the fact that
stereoster:_id spacciano immigrati-spacciatori-e- types are characterized by high subjectivity and
culturestupratori specific variation, we conducted also a human-based
evalSince our goal is to prove that this integrated informa- uation to better understand the reliability of the metrics
tion could improve implicit stereotype generation, we used in such contexts. To this end, we designed a twofold
rely on the gold-standard targets provided in the dataset. manual assessment.</p>
          <p>This avoids the noise introduced by potential errors in First, aiming at replicating the automatic evaluation,
target prediction. One limitation encountered is the over- we compared the generated (baseline and KG-enhanced)
representation of certain targets, which appear with a outputs with the gold standard, assessing semantic and
high number of samples. To reduce the impact of the “lost syntactic similarity on a scale from 1 to 5. The
annotain the middle” phenomenon [38] and to balance the quan- tors were presented with the following questions: (Q1)
tity of information, we randomly sample 20 stereotypes To what extent, on a scale from 1 to 5, is the baseline
per target. output semantically and syntactically comparable to the
gold standard? (Q2) To what extent, on a scale from 1 to
5, is the KG-enhanced output semantically and
syntacti4.2. Prompt Construction cally comparable to the gold standard? Then, aware that
to stereotypes are not universally shared, and that some are
more prevalent or culturally dependent than others [48],
we aimed to evaluate the extent to which the generated</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>We decided test three diferent</title>
          <p>LLaMA-3.1-8B, gemma-2-9b-it [39]
Mistral-7B-Instruct-v0.2 [40] to</p>
          <p>Gold
I rom sono trufatori</p>
          <p>Baseline
I rom sono falsi
invalidi per commuovere.</p>
          <p>KG
rom_sinti sono ladri
i migranti non sono
profughi
gli avvocati pagano i
migranti.</p>
          <p>gli immigrati sono
criminali
i migranti sono
criminali
gli immigrati sono
violenti
gli immigrati
delinquono
stereotype might be culturally recognizable from our own
perspective as white Italian researchers aged between 25
and 30. The evaluation of generated stereotypes was
conducted only on content produced by the baseline model,
as the KG-enhanced method provides the model with
additional contextually relevant information. Annotators
were asked to assess whether, in their own perspective,
the generated stereotype reflects commonly held beliefs
or societal biases (Q3). For example, the stereotype "gli
avvocati pagano i migranti" ("Lawyers pay the migrants")
was judged unrealistic by all three annotators. In
contrast, "gli immigrati delinquono" ("Immigrants commit
crimes") received two positive evaluations out of three,
suggesting that this stereotype may reflect a commonly
held bias in the Italian context. The human evaluation
was conducted by three annotators on a subset of 50
sentences. An example of the conducted manual evaluation
is presented in Table 1.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>LLaMA3.1, gemma2, and Mistral7B, across BLEU, Rouge,</title>
          <p>and BERT-based scores. Gemma2 benefits the most, with
its BLEU score more than doubling and a big gain in
Rouge. LLaMA3.1 and Mistral7B also show consistent,
though smaller, improvements. The BERT-based scores
5. Results indicate better semantic relevance with KG. Overall, the
KG helps the models produce more accurate and
meanIn the next sections the experiment results are provided. ingful results.</p>
          <p>While automated methods are eficient, they often lack
precision. In contrast, human evaluation ofers greater 5.2. Human-based Analysis
contextual understanding but is time-consuming and
costly. To balance accuracy and eficiency, we applied
an automatic method to the full dataset and selected a
smaller subset for manual evaluation.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>The annotators were provided with answers from both</title>
          <p>the baseline and the KG-enhanced method. Each answer
was evaluated on the basis of its similarity to the gold
standard, the normalized results are presented in Table
3. Furthermore, for the baseline generation only,
annota5.1. Computer-Based Analysis tors were asked to assess whether the stereotypes reflect
In the Table 2 are presented the result of the genera- commonly held beliefs or communal biases. LLaMA 3.1
tion task comparing the three models across the two the highest average scores for both baseline and
KGapproaches, i.e. baseline versus knowledge graph en- enhanced outputs, demonstrating strong overall
perforhanced. The Results shows that adding the information mances. Gemma 2 shows lower results across all metrics,
from KG improves the performance of all three models, while Mistral7B performs the lowest on both baseline
and KG averages. Human evaluation further confirms 5.4. Error Analysis
that incorporating knowledge from the graph improves
model performance across all models and annotators. In To gain deeper insight into the functionality and
limitaaddition, the variation in annotators’ scores highlights tions of our approach, and to identify areas for potential
the subjective nature of the task and the challenge of future improvements, we conducted an error analysis
achieving consistent judgments. Annotator 2, for exam- on the tweets where the KG-enhanced method showed
ple, generally rates outputs higher, particularly for KG- the lowest performance. Overall we observed that errors
enhanced responses, while Annotator 3 is more critical. frequently occurred when the input contained named
Human-evaluated results confirm the trends observed in entities or subjects that difered from the primary target.
computer-based scores (for all the models and the annota- For example, in the tweet:
tors the score are higher in the case of the KG-enhanced Finanzia l’invasione degli immigrati: ecco
approach), demonstrating how our method improves the la prova. La vergogna di George Soros,
model’s ability to explicitly address implicit hate speech "Epnagdlirsohn:"eH" edf’uItnadlisat.he immigrant invasion: here is
and suggesting that automatic measures can be informa- the proof. The shame of George Soros, the ’master’
tive for this type of task. of Italy."</p>
          <p>Regarding the assessment of the generated
stereotype the human evaluation reveals divergence tendency: the KG-enhanced output was: "George Soros finanzia
LLaMA shows the average highest scores across the three l’invasione degli immigrati" (English: "George Soros
annotators, and the value appears to be high especially funds the immigrant invasion"), while not conceptually
according to Annotators 1 and 3. Gemma2 shows a simi- incorrect, this difers from the gold standard:"i migranti
lar tendency, especially regarding the annotators 2 and vogliono invadere l’Italia" (English : "The migrants want
3. Finally, Mistral tends to have an overall lower score to invade Italy."). A similar issue occurred in the tweet:
about the stereotypes soundness, suggesting that it may
produce less biased or not realistic content.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>5.3. Human-based vs Computer-based metric</title>
        <sec id="sec-2-3-1">
          <title>To better understand the relationship between automatic</title>
          <p>metrics and human judgment, we compared the results of
BLEU, ROUGE and BERT Score with human evaluation
over a sample of 50 sentences, as seen in Figure 3. The
three plots help identify which metric aligns more closely
with human evaluation.</p>
          <p>From the plots, it is evident that the BERT Score
metric (shown in the third plot) correlates more consistently
with the annotators’ evaluation, suggesting it is a more
reliable indicator of quality for this task. This is due to
the nature of BERT score, which leverages contextual
embeddings to measure similarity on a semantic level.
Conversely, BLEU and ROUGE metrics (depicted in the
ifrst and second plots, respectively), which operate more
on the lexical-syntactic level, show more variability and
several limitations in accurately matching human
judgment.</p>
          <p>Understanding the relationship between automatic and
manual assessment is crucial for contextualizing the
values obtained from each metric and evaluating model
performance in a meaningful way. The comparison also
helps to understand which metrics are more robust and
reliable, especially for tasks requiring deep contextual
and pragmatic understanding.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>Che senso ha ministro Trenta rispettare chi</title>
          <p>non rispetta noi? Che senso ha difendere la
loro cultura o presunta cultura quando essi
disprezzano la nostra? La ministra Trenta
contro Salvini: sbagliato dire che l’Islam è
terrorismo
English: What’s the point, minister Trenta, of
respecting thos who don’t respect us? What’s the
point of defending their culture or so-called
culture when they despise ours? Minister Trenta
against Salvini: it’s wrong to say that Islam is
terrorism"
The KG-enhanced output was "la ministra Trenta
disprezza la cultura italiana." (English: Minister Trenta
despises Italian culture.) whereas the gold standard was:
"i musulmani vanno contro i valori dell’Occidente"
(English: Muslims go against Western values). In other cases,
when the model encounters a target associated with a
high number of stereotypes, it tends to concatenate many
of them into a generic and incoherent output.</p>
          <p>In some cases, both the baseline and the KG-enhanced
approaches struggle to recognize irony and fail to
produce a reliable underlying stereotype. For example,
consider the following sentence:
"#Dimartedi Stasera indottrinamento pro
Europa. Alla bisogna sono benvenuti anche gli
stranieri. Bravo #Floris, vai a cager"
(English: "#dimartedi tonight: pro-Europe indoctrination. If
needed, even foreigners are welcome. Well done
#Floris, go to hell.")</p>
          <p>Both the baseline and the KG-enhanced approaches
generate "gli stranieri sono benvenuti" (English:
Foreigners are welcome), failing to detect the subtle irony
in the original message.</p>
          <p>[Figure panels: (a) BLEU scores compared to all
annotators; (b) ROUGE-L scores compared to all
annotators; (c) BERTScore compared to all
annotators.]</p>
          <p>Finally, we observed challenges in tweets with complex
hypotactic structures and multiple subjects. In such
cases, models often fail to correctly identify the primary
target and to produce relevant output. Furthermore, the
KG-enhanced method tends to generate overly long responses
in these situations, which can reduce the coherence
and precision of the generated content. In summary,
the worst-performing examples often occur because the
model misidentifies the target of the hate tweet, leading
to reduced accuracy. However, in many cases, the
model still manages to extract a correct implicit message,
which, while different from the gold standard, is present
in the tweet. In such cases, the prediction is valid, but
the reference annotation fails to recognize it as correct.</p>
          <sec>
            <title>6. Conclusion</title>
            <p>In this work, we aim to investigate whether large language
models are able to uncover implicit stereotypes
embedded in hate speech messages. This task is important
as it helps uncover the subtle content of hate speech
messages and supports hate speech detection models in
identifying abusive language. Specifically, we explore
the role that additional information from a knowledge
graph may play in the understanding and generation of
underlying stereotypes. We compare a baseline few-shot
approach with a knowledge-enhanced method, leveraging
different LLMs. We observed that prompts enhanced
with additional information outperformed the baseline
approach. To better assess the reliability of the automatic
evaluation metrics, we also conducted a manual evaluation,
replicating the task performed by the automatic metrics.
The human evaluation confirmed the results, showing
higher scores for the knowledge graph-enhanced
approach. While the manual assessment was aligned
with the automated results, we observed a high degree
of variability in the scores. This suggests that
evaluating such generated content is inherently subjective and
can vary based on the annotators’ culture, age, or beliefs.
These findings highlight the importance of contextualizing
evaluation metrics and recognizing that they may
carry biases or oversimplify complex phenomena. From
the error analysis, we observed that the KG-enhanced
approach occasionally struggles to manage the quantity
of information provided, suggesting that further studies
are needed to better understand the extent to which such
models can effectively integrate additional knowledge.</p>
            <p>To sum up, the findings of this research suggest that
knowledge graph-based approaches are highly
promising, even in the hate speech domain, where they remain
largely underexplored.</p>
          </sec>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>7. Limitations and Future Work</title>
      <p>In this work we focused on the integration of stereotypes,
retrieving targets from the gold standard. This allows us
to concentrate the analysis on the knowledge insertion
process within the LLM, minimizing the introduction of
noise. As future work, we intend to test the approach
using a state-of-the-art target detection model. Although
this may introduce errors due to target misclassifications,
it would make the proposed method fully autonomous
and enhance its applicability in real-world scenarios.
Target detection methods can also return multiple potential
targets in cases of uncertainty, providing a fuller
stereotype context for posts that may involve more than one
target. Since we noticed that several different stereotypes are
associated with the same target, in future work we may
consider an approach based on semantic similarity to
select the most contextually relevant stereotypes. This
approach could offer a more focused context for the prompt
and reduce the likelihood of model misunderstandings.
During the error analysis phase, we identified errors
potentially caused by the ‘lost-in-the-middle’ phenomenon.
Future work should explore in greater depth how models
manage different quantities of input information. Finally,
it is important to highlight that the manual evaluation
we conducted, particularly regarding the cultural
shareability of the generated stereotypes, is inherently biased
and reflects the perspectives of the researchers involved
in this study. As future work, it would be interesting to
carry out a large-scale, perspectivist survey to explore
the diversity of opinions on stereotypes and to
investigate the dominant worldview conveyed by different large
language models.</p>
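      <p>The similarity-based selection proposed above as future work could look roughly like the following sketch. It is an assumption-laden illustration, not the authors' method: a bag-of-words cosine stands in for a real sentence-embedding model, the function names are hypothetical, and the data is a deliberately harmless toy example. Only the [target, hasStereotype, stereotype] triple layout is taken from the prompt description in this paper.</p>
      <preformat>
```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words counts; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_context(post, target, stereotypes, k=2):
    """Keep the k stereotypes most similar to the post, as KG-style triples."""
    post_vec = bow_vector(post)
    ranked = sorted(
        stereotypes,
        key=lambda s: cosine(post_vec, bow_vector(s)),
        reverse=True,
    )
    return [[target, "hasStereotype", s] for s in ranked[:k]]


# Harmless toy data (assumption); real inputs would come from the knowledge graph.
post = "my sofa is ruined again, cats destroy everything"
stereotypes = [
    "cats are lazy",
    "cats destroy furniture",
    "cats ignore their owners",
]
context = select_context(post, "cats", stereotypes, k=2)
print(context)
```
      </preformat>
      <p>Capping the context at k triples is one simple way to limit the amount of information passed to the model; the appropriate value of k would have to be determined empirically.</p>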
    </sec>
    <sec id="sec-4">
      <title>Ethical Considerations</title>
      <p>We acknowledge that when dealing with hate speech,
particularly stereotypes targeting minorities, it is
essential to be mindful of the potential for introducing bias or
unintentionally amplifying hateful content. We made
efforts to control and reduce the presence of bias and
to remain aware of its potential introduction. During
the experimental phase, we prompted LLMs to generate
implied stereotypes, which in some cases resulted in the
generation of hateful or offensive content. The generated
hateful content is intended solely to remain within the
context of this experimental research. Its occurrence also
provides additional insights into how LLMs can produce
harmful language despite safety filters.</p>
      <p>lation, 2002, pp. 311–318.</p>
      <p>[42] T. Zhang*, V. Kishore*, F. Wu*, K. Q. Weinberger,
Y. Artzi, BERTScore: Evaluating text generation with
BERT, in: International Conference on Learning
Representations, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.</p>
      <p>[43] C.-Y. Lin, ROUGE: A package for automatic
evaluation of summaries, in: Text Summarization
Branches Out, Association for Computational
Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL:
https://aclanthology.org/W04-1013/.</p>
      <p>[44] B. Peng, Y. Zhu, Y. Liu, X. Bo, H. Shi, C. Hong,
Y. Zhang, S. Tang, Graph retrieval-augmented
generation: A survey, arXiv preprint arXiv:2408.08921
(2024).</p>
      <p>[45] H. Yu, A. Gan, K. Zhang, S. Tong, Q. Liu, Z. Liu,
Evaluation of retrieval-augmented generation: A
survey, in: CCF Conference on Big Data, Springer,
2024, pp. 102–120.</p>
      <p>[46] S. Tsaneva, D. Dessì, F. Osborne, M. Sabou,
Knowledge graph validation by integrating LLMs and
human-in-the-loop, Information Processing &amp;
Management 62 (2025) 104145.</p>
      <p>[47] G. Faggioli, L. Dietz, C. L. Clarke, G. Demartini,
M. Hagen, C. Hauff, N. Kando, E. Kanoulas, M.
Potthast, B. Stein, et al., Perspectives on large language
models for relevance judgment, in: Proceedings of
the 2023 ACM SIGIR International Conference on
Theory of Information Retrieval, 2023, pp. 39–50.</p>
      <p>[48] S. T. Fiske, Prejudices in cultural contexts: Shared
stereotypes (gender, age) versus variable
stereotypes (race, ethnicity, religion), Perspectives on
Psychological Science 12 (2017) 791–799.</p>
      <p>(baseline: context = ∅, KG-method: context = [[target, hasStereotype, stereotype], ...])
// Examples: post = "a roma nuovi campi nomadi e gli italiani abbandonano le
abitazioni. e noi abbiamo gente come la boldrini che difende tutto questo."
implicit_message = i rom sono pericolosi
post = "Papampero a San Vittore e festa del Pd? Te credo! I detenuti votano
tutti a sinistra, pure rom e islamici."
implicit_message = la sinistra protegge i criminali stranieri
// Task: Extract from the following post</p>
      <p>Declaration on Generative AI</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <article-title>Resources and benchmark corpora for hate speech detection: a systematic review</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>55</volume>
          (
          <year>2021</year>
          )
          <fpage>477</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pang</surname>
          </string-name>
          , A. van den Hengel,
          <article-title>Deep learning for hate speech detection: a comparative study</article-title>
          ,
          <source>International Journal of Data Science and Analytics</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          , G. Attanasio, Hate-ita:
          <article-title>Hate speech detection in italian social media text</article-title>
          ,
          <source>in: Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>252</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N. B.</given-names>
            <surname>Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sviridova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cabrio</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. Villata,</surname>
          </string-name>
          <article-title>An in-depth analysis of implicit and subtle hate speech messages</article-title>
          , in: A.
          <string-name>
            <surname>Vlachos</surname>
          </string-name>
          , I. Augenstein (Eds.),
          <source>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>1997</fpage>
          -
          <lpage>2013</lpage>
          . URL: https://aclanthology.org/2023.eacl-main.147/. doi:10.18653/v1/2023.eacl-main.147.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Allaway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yerukola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vianna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Leslie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <article-title>Beyond denouncing hate: Strategies for countering implied biases and stereotypes in language</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2023</year>
          ,
          Association for Computational Linguistics
          , Singapore,
          <year>2023</year>
          , pp.
          <fpage>9759</fpage>
          -
          <lpage>9777</lpage>
          . URL: https://aclanthology.org/2023.findings-emnlp.653/. doi:10.18653/v1/2023.findings-emnlp.653.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nanduri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , T. Wu, M. Sap, BiasX: “
          <article-title>thinking slow” in toxic content moderation with explanations of implied social biases</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>4920</fpage>
          -
          <lpage>4932</lpage>
          . URL: https://aclanthology.org/2023.emnlp-main.300/. doi:10.18653/v1/2023.emnlp-main.300.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bombieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fiorini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rospocher</surname>
          </string-name>
          ,
          <article-title>Do llms dream of ontologies?</article-title>
          ,
          <source>ACM Trans. Intell. Syst. Technol</source>
          . (
          <year>2025</year>
          ). URL: https://doi.org/10.1145/3725852. doi:10.1145/3725852.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>D.</given-names> <surname>Edge</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Trinh</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Cheng</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Bradley</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Chao</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mody</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Truitt</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Metropolitansky</surname></string-name>,
          <string-name><given-names>R. O.</given-names> <surname>Ness</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Larson</surname></string-name>,
          <article-title>From local to global: A graph rag approach to query-focused summarization</article-title>,
          <source>arXiv preprint arXiv:2404.16130</source>
          (<year>2024</year>).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Z.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>M. J.</given-names> <surname>Cruz</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Guevara</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Deshpande</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name>,
          <article-title>Retrieval-augmented generation with knowledge graphs for customer service question answering</article-title>,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>,
          <year>2024</year>, pp.
          <fpage>2905</fpage>-<lpage>2909</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>T.</given-names> <surname>Vu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Iyyer</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Constant</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Tar</surname></string-name>,
          <string-name><given-names>Y.-H.</given-names> <surname>Sung</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Le</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Luong</surname></string-name>,
          <article-title>FreshLLMs: Refreshing large language models with search engine augmentation</article-title>,
          in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>, pp.
          <fpage>13697</fpage>-<lpage>13720</lpage>.
          URL: https://aclanthology.org/2024.findings-acl.813/. doi:10.18653/v1/2024.findings-acl.813.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Y.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Xiong</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Jia</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Pan</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Bi</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Dai</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>,
          <source>arXiv preprint arXiv:2312.10997</source>
          2 (<year>2023</year>).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>M.</given-names> <surname>Dadvar</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Trieschnigg</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Ordelman</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>De Jong</surname></string-name>,
          <article-title>Improving cyberbullying detection with user context</article-title>,
          <source>in: European conference on information retrieval</source>,
          Springer,
          <year>2013</year>, pp.
          <fpage>693</fpage>-<lpage>696</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>J.</given-names> <surname>Lin</surname></string-name>,
          <article-title>Leveraging world knowledge in implicit hate speech detection</article-title>,
          in: L. Biester, D. Demszky, Z. Jin, M. Sachan, J. Tetreault, S. Wilson, L. Xiao, J. Zhao (Eds.),
          <source>Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)</source>,
          Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid),
          <year>2022</year>, pp.
          <fpage>31</fpage>-<lpage>39</lpage>.
          URL: https://aclanthology.org/2022.nlp4pi-1.4/. doi:10.18653/v1/2022.nlp4pi-1.4.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [17]
          <string-name><given-names>M. S.</given-names> <surname>Jahan</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Oussalah</surname></string-name>,
          <string-name><given-names>D. R.</given-names> <surname>Beddia</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Arhab</surname></string-name>,
          et al.,
          <article-title>A comprehensive study on nlp data augmentation for hate speech detection: Legacy methods, bert, and llms</article-title>,
          <source>arXiv preprint arXiv:2404.00303</source>
          (<year>2024</year>).
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [18]
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Ji</surname></string-name>,
          <string-name><given-names>C.-T.</given-names> <surname>Lu</surname></string-name>,
          <article-title>Don't go to extremes: Revealing the excessive sensitivity and calibration limitations of LLMs in implicit hate speech detection</article-title>,
          in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
          <source>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>, pp.
          <fpage>12073</fpage>-<lpage>12086</lpage>.
          URL: https://aclanthology.org/2024.acl-long.652/. doi:10.18653/v1/2024.acl-long.652.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [19]
          <string-name><given-names>S.</given-names> <surname>Ghosh</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Suri</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Chiniya</surname></string-name>,
          <string-name><given-names>U.</given-names> <surname>Tyagi</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Manocha</surname></string-name>,
          <article-title>Cosyn: Detecting implicit hate speech in online conversations using a context synergized hyperbolic network</article-title>,
          <source>in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>,
          <year>2023</year>, pp.
          <fpage>6159</fpage>-<lpage>6173</lpage>.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [20]
          <string-name><given-names>H.</given-names> <surname>Ahn</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>Y.-S.</given-names> <surname>Han</surname></string-name>,
          <article-title>SharedCon: Implicit hate speech detection using shared semantics</article-title>,
          in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>, pp.
          <fpage>10444</fpage>-<lpage>10455</lpage>.
          URL: https://aclanthology.org/2024.findings-acl.622/. doi:10.18653/v1/2024.findings-acl.622.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [21]
          <string-name><given-names>Y.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>,
          <article-title>Enhancing llm-based hatred and toxicity detection with meta-toxic knowledge graph</article-title>,
          <year>2024</year>.
          URL: https://arxiv.org/abs/2412.15268. arXiv:2412.15268.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [22]
          <string-name><given-names>M.</given-names> <surname>Sap</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gabriel</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Qin</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Jurafsky</surname></string-name>,
          <string-name><given-names>N. A.</given-names> <surname>Smith</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Choi</surname></string-name>,
          Social bias frames: Reasoning about social
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>N.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Jung</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Myung</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Jin</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Camacho-Collados</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Oh</surname></string-name>,
          <article-title>Exploring cross-cultural differences in english hate speech annotations: From dataset construction to analysis</article-title>,
          <year>2024</year>. URL: https://arxiv.org/abs/2308.16705. arXiv:2308.16705.
          <article-title>and power implications of language</article-title>,
          in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>,
          Association for Computational Linguistics, Online,
          <year>2020</year>, pp.
          <fpage>5477</fpage>-<lpage>5490</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>A.</given-names> <surname>Albladi</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Islam</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Das</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bigonah</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Jamshidi</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Rahgouy</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Raychawdhary</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Marghitu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Seals</surname></string-name>,
          <article-title>Hate speech detection using large language models: A comprehensive review</article-title>,
          <source>IEEE Access</source>
          (<year>2025</year>).
          URL: https://aclanthology.org/2020.acl-main.486/. doi:10.18653/v1/2020.acl-main.486.
          [23]
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>K.-A.</given-names> <surname>Sohn</surname></string-name>,
          <article-title>Why is it hate speech? masked rationale prediction for explainable hate speech detection</article-title>,
          in: N. Calzolari, C.-R. Huang,
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>M.</given-names> <surname>ElSherief</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Ziems</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Muchlinski</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Anupindi</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Seybolt</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>De Choudhury</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Yang</surname></string-name>,
          <article-title>Latent hatred: A benchmark for understanding implicit hate speech</article-title>,
          in:
          <source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>,
          Association for Computational Linguistics, Online and Punta Cana, Dominican Republic,
          <year>2021</year>, pp.
          <fpage>345</fpage>-<lpage>363</lpage>.
          URL: https://aclanthology.org/2021.emnlp-main.29.
          H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, S.-H. Na (Eds.),
          <source>Proceedings of the 29th International Conference on Computational Linguistics</source>,
          International Committee on Computational Linguistics, Gyeongju, Republic of Korea,
          <year>2022</year>, pp.
          <fpage>6644</fpage>-<lpage>6655</lpage>.
          URL: https://aclanthology.org/2022.coling-1.577/.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [24]
          <string-name><given-names>F.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Kwak</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>An</surname></string-name>,
          <article-title>Chain of explanation: New prompting method to generate quality natural language explanation for implicit hate speech</article-title>,
          in:
          <source>Companion Proceedings of the ACM Web Conference 2023</source>,
          WWW '23, ACM,
          <year>2023</year>, p.
          <fpage>90</fpage>-<lpage>93</lpage>.
          URL: http://dx.doi.org/10.1145/3543873.3587320. doi:10.1145/3543873.3587320.
          T. Chakraborty,
          <article-title>Tox-BART: Leveraging toxicity attributes for explanation generation of implicit hate speech</article-title>,
          in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>, pp.
          <fpage>13967</fpage>-<lpage>13983</lpage>.
          URL: https://aclanthology.org/2024.findings-acl.831/. doi:10.18653/v1/2024.findings-acl.831.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [25]
          <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ho</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Thorne</surname></string-name>,
          <string-name><given-names>S.-Y.</given-names> <surname>Yun</surname></string-name>,
          <article-title>HARE: Explainable hate speech detection with step-by-step reasoning</article-title>,
          in: H. Bouamor, J. Pino, K. Bali (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2023</source>,
          Association for Computational Linguistics, Singapore,
          <year>2023</year>, pp.
          <fpage>5490</fpage>-<lpage>5505</lpage>.
          URL: https://aclanthology.org/2023.findings-emnlp.365/. doi:10.18653/v1/2023.findings-emnlp.365.
          [33]
          <string-name><given-names>C.</given-names> <surname>Di Bonaventura</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Muti</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Stranisci</surname></string-name>,
          <article-title>O-dang at hodi and haspeede3: A knowledge-enhanced approach to homotransphobia and hate speech detection in italian</article-title>,
          in:
          <source>CEUR Workshop Proceedings</source>,
          volume
          <volume>3473</volume>,
          CEUR-WS,
          <year>2023</year>.
          [34]
          <string-name><given-names>S. M.</given-names> <surname>Lo</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Stranisci</surname></string-name>,
          <string-name><given-names>A. T.</given-names> <surname>Cignarella</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Frenda</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Jezek</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          Subjectivity
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [26]
          <string-name><given-names>V.</given-names> <surname>Tonini</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Frenda</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Stranisci</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <article-title>How do we counter dangerous speech in italy?</article-title>,
          in:
          <source>CEUR Workshop Proceedings</source>,
          volume
          <volume>3878</volume>,
          CEUR-WS,
          <year>2024</year>, p.
          <fpage>103</fpage>.
          <article-title>in stereotypes against migrants in italian: An experimental annotation procedure</article-title>,
          in:
          <source>Proceedings of the 11th Italian Conference on Computational Linguistics (CLiC-it 2025)</source>,
          CEUR Workshop Proceedings, Cagliari, Italy,
          <year>2025</year>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [27]
          <string-name><given-names>W. W.</given-names> <surname>Schmeisser-Nieto</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Ricci</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Frenda</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Taulé</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <article-title>Implicit stereotypes: A corpus-based study for italian</article-title>,
          in:
          <source>Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</source>,
          <year>2024</year>, pp.
          <fpage>997</fpage>-<lpage>1004</lpage>.
          [35]
          <string-name><given-names>A.</given-names> <surname>Capozzi</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lai</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Poletto</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sanguinetti</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>G. F.</given-names> <surname>Ruffo</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Musto</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>,
          et al.,
          <article-title>Computational linguistics against hate: Hate speech detection and visualization on social media in the "contro l'odio" project</article-title>,
          in:
          <source>6th Italian Conference on Computational Linguistics, CLiC-it 2019</source>,
          <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [28]
          <string-name><given-names>F.</given-names> <surname>Poletto</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Stranisci</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sanguinetti</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Patti</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bosco</surname></string-name>,
          et al.,
          <article-title>Hate speech annotation: Analysis of an italian twitter corpus</article-title>,
          in:
          <source>Ceur workshop proceedings</source>,
          volume
          <volume>2006</volume>,
          CEUR-WS,
          <year>2017</year>, pp.
          <fpage>1</fpage>-<lpage>6</lpage>.
          [36]
          <string-name><given-names>S.</given-names> <surname>Borgo</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Ferrario</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Gangemi</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Guarino</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [29]
          <string-name><given-names>B.</given-names> <surname>Cristina</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Marinella</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Benamara</surname></string-name>,
          <string-name><given-names>C. P.</given-names> <surname>Giovanni</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Viviana</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Véronique</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Mariona</surname></string-name>,
          et al.,
          <article-title>Sterheotypes project. detecting and countering ethnic stereotypes emerging from italian, spanish and french racial hoaxes</article-title>,
          in:
          <source>Proceedings of the Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations (SEPLN-CEDI-PD 2024)</source>,
          <year>2024</year>.
          <string-name><given-names>C.</given-names> <surname>Masolo</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Porello</surname></string-name>,
          <string-name><given-names>E. M.</given-names> <surname>Sanfilippo</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Vieu</surname></string-name>,
          <article-title>Dolce: A descriptive ontology for linguistic and cognitive engineering</article-title>,
          <source>Applied ontology</source>
          <volume>17</volume>
          (<year>2022</year>)
          <fpage>45</fpage>-<lpage>69</lpage>.
          [37]
          <string-name><given-names>L.</given-names> <surname>Asprino</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Daga</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Gangemi</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Mulholland</surname></string-name>,
          <article-title>Knowledge graph construction with a façade: a unified method to access heterogeneous data sources on the web</article-title>,
          <source>ACM Transactions on Internet Technology</source>
          <volume>23</volume>
          (<year>2023</year>)
          <fpage>1</fpage>-<lpage>31</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [30]
          <string-name><given-names>A.</given-names> <surname>Muti</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Ruggeri</surname></string-name>,
          <string-name><given-names>K. A.</given-names> <surname>Khatib</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barrón-Cedeño</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <article-title>Language is scary when over-analyzed: Unpacking implied misogynistic reasoning with argumentation theory-driven prompts</article-title>,
          in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>,
          Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>, pp.
          <fpage>21091</fpage>-<lpage>21107</lpage>.
          URL: https://aclanthology.org/2024.emnlp-main.1174/. doi:10.18653/v1/2024.emnlp-main.1174.
          [38]
          <string-name><given-names>N. F.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Hewitt</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Paranjape</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bevilacqua</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Petroni</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Liang</surname></string-name>,
          <article-title>Lost in the middle: How language models use long contexts</article-title>,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>12</volume>
          (<year>2024</year>)
          <fpage>157</fpage>-<lpage>173</lpage>.
          URL: https://aclanthology.org/2024.tacl-1.9/. doi:10.1162/tacl_a_00638.
          [39]
          Gemma Team,
          <article-title>Gemma</article-title>
          (<year>2024</year>).
          URL: https://www.kaggle.com/m/3301. doi:10.34740/KAGGLE/M/3301.
          [40]
          <string-name><given-names>A. Q.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Sablayrolles</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mensch</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Bamford</surname></string-name>,
          <string-name><given-names>D. S.</given-names> <surname>Chaplot</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>de las Casas</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Bressand</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Lengyel</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Lample</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Saulnier</surname></string-name>,
          <string-name><given-names>L. R.</given-names> <surname>Lavaud</surname></string-name>,
          <string-name><given-names>M.-A.</given-names> <surname>Lachaux</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Stock</surname></string-name>,
          <string-name><given-names>T. L.</given-names> <surname>Scao</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Lavril</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Lacroix</surname></string-name>,
          <string-name><given-names>W. E.</given-names> <surname>Sayed</surname></string-name>,
          <article-title>Mistral 7b</article-title>,
          <year>2023</year>. URL: https://arxiv.org/abs/2310.06825. arXiv:2310.06825.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [31]
          <string-name><given-names>P.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Perez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Piktus</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Petroni</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Karpukhin</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Küttler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>W.-t.</given-names> <surname>Yih</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Rocktäschel</surname></string-name>,
          et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (<year>2020</year>)
          <fpage>9459</fpage>-<lpage>9474</lpage>.
          [41]
          <string-name><given-names>K.</given-names> <surname>Papineni</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Roukos</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Ward</surname></string-name>,
          W. jing Zhu, Bleu: a
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>N.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Masud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <article-title>method for automatic evaluation of machine trans-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>