<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>WorthIt: Check-worthiness Estimation of Italian Social Media Posts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agnese Dafara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Ramponi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Tonelli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Humanities, University of Pavia - Pavia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Digital Humanities group</institution>
          ,
          <addr-line>Fondazione Bruno Kessler - Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Natural Language Processing, University of Stuttgart - Stuttgart</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Check-worthiness estimation is the first and a paramount task in the automated fact-checking pipeline. It allows professional fact-checkers to cope with the increasing amount of mis/disinformative textual content being published online by prioritizing claims that are factual/verifiable and worthy of verification. Despite the long tradition of check-worthiness estimation in NLP, there is currently a lack of annotated resources and associated methods for Italian. Moreover, current datasets typically cover a single topic and focus on a limited time frame, affecting models' generalizability on out-of-distribution data. To fill these gaps, in this paper we introduce WorthIt, the first annotated dataset for factuality/verifiability and check-worthiness estimation of Italian social media posts that covers public discourse on migration, climate change, and public health issues across a large time period of six years. We describe the dataset creation in detail and conduct thorough experimentation with the WorthIt dataset using a wide array of encoder- and decoder-based models. Our results show that fine-tuning monolingual encoder-based models in a multi-task setting provides the best overall performance and that decoder-based models in a few-shot setup still struggle to capture the relation between factuality/verifiability and check-worthiness. We release our dataset, code, and associated materials to the research community.§</p>
      </abstract>
      <kwd-group>
        <kwd>Automated fact-checking</kwd>
        <kwd>check-worthiness estimation</kwd>
        <kwd>factual/verifiable claim detection</kwd>
        <kwd>resources and evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Given the unprecedented amount of mis/disinformation
spreading online, assisting fact-checkers in their
everyday work by automating some of their tasks is becoming
of paramount importance. The identification of content
that is worthy of verification – i.e., check-worthiness
estimation, also referred to as check-worthy claim detection –
represents the first stage in the fact-checking pipeline [1]
insofar as it allows professional fact-checkers to reduce
the effort spent screening content that is not worthy of
attention, and thus to focus on the verification of potentially
false or misleading information.</p>
      <p>According to Nakov et al. [2], a claim is deemed
check-worthy and calls for the attention of a fact-checker if
it “is likely to be false, is of public interest, and/or
appears to be harmful”, and is not “easy to fact-check
by a layperson” (unlike, e.g., “The capital of Italy is Rome”). A
check-worthy claim is both factual and verifiable [2, 3],
i.e., it presents an “assertion about the world that is
checkable” [4], namely it “state[s] a definition, mention[s] a
quantity in the present or in the past, make[s] a verifiable
prediction of the future, reference[s] laws, procedures,
and rules of operation, discuss[es] images or videos, [or]
state[s] correlation or causation” [2]. In other words, if a
claim is factual and verifiable, it is possible to determine
its check-worthiness based on whether it is relevant and
may potentially have a broader impact on the general
public [5] (see examples in Figure 1).</p>
      <p>§ The repository is publicly available on GitHub at: https://github.com/dhfbk/worthit.</p>
      <p>Check-worthy claim detection<sup>1</sup> has become a
well-established task in NLP since the introduction of the
first CheckThat! evaluation campaign [6]. However,
despite the progress and the coverage of multiple languages
in the past CheckThat! editions, no dataset or task for
check-worthiness estimation specifically for the Italian
language has been considered so far. Moreover, current
datasets for check-worthiness estimation in other languages
mostly focus on COVID-19 issues and consist
of posts that were drawn from a relatively small time
period (e.g., one year and three months [2]), affecting
out-of-distribution generalization of models [7].</p>
      <p><sup>1</sup> In this paper, we refer to the task as “check-worthy claim detection”
or “check-worthiness estimation” interchangeably.</p>
      <p>Contributions In this paper, we address the aforementioned
gaps by developing WorthIt, the first annotated
dataset for factuality/verifiability and check-worthiness
estimation for Italian, and by conducting extensive experiments
with encoder- and decoder-based models. WorthIt
covers public discourse from Twitter on migration,
climate change, and public health issues over a large
time frame of six years. The full dataset was annotated
by two expert annotators who discussed the cases of
disagreement to resolve annotation errors (e.g., due to
attention drops) while keeping genuine annotation divergences
(e.g., due to different interpretations), in line with
recent work advocating the importance of considering
human label variation in subjective tasks [8, 9, 10, 11, 12,
inter alia]. We fine-tune a wide array of monolingual
and multilingual encoder-based models in single- and
multi-task learning settings, and experiment with four
decoder-based models that include Italian in pretraining
data in a few-shot setup after a careful selection of
representative examples. Results show that multi-task
fine-tuning of encoder-based models provides the best
performance, and that decoder-based models – with or
without annotation guidelines in the prompt, either in
Italian or English – still struggle to tackle the task effectively,
even when provided with information about
the factuality/verifiability of the post.</p>
      <p>2. Related Work</p>
      <p>
        only for areas in which a given language is spoken. In this
respect, Italian represents an exception because, to our
knowledge, no dataset for check-worthiness estimation
in this language has been developed so far. Recently the
Check-IT! dataset [20] has been created, which however
contains only fact-checked (i.e., check-worthy) claims.
Likewise, the FEVER-IT dataset [21] is a translation into
Italian of the widely-used FEVER dataset [22], and
contains only claims to be verified against textual sources.
In this work, we address this gap by presenting the novel
WorthIt dataset, which covers a previously overlooked
language for the task of check-worthiness estimation.
The dataset has been carefully sampled across topics
and time for better models’ generalizability, since past
works have shown that the performance of automated
fact-checking drops under domain shift [23]. The
WorthIt dataset has also been fully annotated by two raters
to value human label variation [8].
Concerning the methods for check-worthiness
estimation, state-of-the-art results in the CheckThat! evaluation
campaigns are mostly based on fine-tuned encoder-based
models such as BERT, RoBERTa, and DistilBERT [24, 25,
26] and language-specific variants [
        <xref ref-type="bibr" rid="ref111">16</xref>
        ], often combined
with data augmentation [27] and ensembling strategies.
Recently, large language models (LLMs) have started
to be used for the task, showing promising performance.
For instance, the best performing system on English at
CheckThat! 2024 [17] fine-tuned Llama-2-7B on the
provided training data and then leveraged prompts
generated by ChatGPT for check-worthy claim detection [28].
However, previous works do not leverage the synergies
between factuality/verifiability and check-worthiness,
although these are closely related tasks. Our work takes a step
towards this goal by fine-tuning encoder-based models
in a multi-task learning setting and experimenting with
sequential prompting using decoder-based models.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. WorthIt Dataset</title>
      <sec id="sec-2-1">
        <p>In this section, we describe the dataset creation process, from data collection (Section 3.1) to data annotation (Section 3.2). We then present data statistics (Section 3.3).</p>
        <sec id="sec-2-1-1">
          <title>3.1. Data Collection</title>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>We collect social media posts pertaining to migration, climate
change, and public health issues using the Twitter
APIs.<sup>3</sup> To mitigate temporal bias in the dataset, we focus
on a large time frame of six full years (from 2017-01-01
to 2022-12-31) and retain messages in Italian about the
aforementioned topics by using a manually curated list
of over 400 keywords derived from reliable glossaries and</p>
      </sec>
      <sec id="sec-2-3">
        <p><sup>3</sup> Tweets were retrieved in 02/2023, when the APIs for research purposes were still available for free.</p>
      </sec>
      <sec id="sec-2-4">
        <p>
          Check-worthy claim detection is a popular task within
the NLP community mostly thanks to the series of
CheckThat! shared tasks organized by the CLEF initiative.<sup>2</sup>
Indeed, check-worthy claim detection is the only task
that has been proposed at all seven CheckThat!
editions [
          <xref ref-type="bibr" rid="ref111">6, 13, 14, 15, 2, 16, 17</xref>
          ]. Several datasets for
training check-worthiness estimation models in different
languages have been created and released, starting from
English and Arabic at CheckThat! 2018 [6] to Arabic,
Bulgarian, Dutch, English, Spanish, and Turkish in later
editions [
          <xref ref-type="bibr" rid="ref111">2, 16, 17</xref>
          ]. Besides CheckThat! datasets,
additional resources have been developed over the years,
mainly focused on specific events like COVID-19 [
          <xref ref-type="bibr" rid="ref60">18</xref>
          ]
or political news [19]. English is the most represented
language for check-worthiness estimation, but the
scientific community has recently started to focus on the
development of resources for other languages too, since
check-worthy claims can refer to events that are relevant
        </p>
        <p><sup>2</sup> https://www.clef-initiative.eu/.</p>
        <p>scientific manuals (see Appendix A). Following Nakov
et al. [2], we further filter out posts containing ≤ 5
tokens<sup>4</sup> and sort the remaining messages by their sum of
likes and retweets. We then select the top-k (k = 10)
posts exhibiting the highest number of likes and retweets
for each month and topic subset, therefore focusing on
the messages with the highest impact on society while
simultaneously mitigating topic and temporal biases. We
further account for the potential presence of authors’
writing style biases that can occur when many posts
authored by the same users are included in the dataset: we
therefore retain only the most impactful post authored
by the same user in each data subset. Overall, we collect
2,160 posts evenly distributed across topics (i.e., 720 for
each topic) and time periods (i.e., 360 for each year) for
factuality/verifiability and check-worthiness annotation.
All posts were then anonymized by replacing user
mentions, URLs, email addresses, and phone numbers
with placeholders (i.e., [USER], [URL], [EMAIL], and
[PHONE], respectively), and newline characters (i.e., \n
and \r) were replaced with single spaces.</p>
        <p>the cases in which their annotations diverged, and
consolidated the guidelines by specifying how to deal with
special cases (e.g., in the presence of reported speech; see
Appendix B). Then, they both labeled the full set of posts
in four rounds of annotation. Each round involved a
discussion phase aimed at resolving annotation errors (e.g.,
due to attention slips) while keeping instances
exhibiting genuine disagreement (e.g., different interpretations).
This makes WorthIt the first check-worthiness
estimation dataset that goes beyond the “single ground truth”
assumption in subjective annotation.</p>
        <p>Inter-annotator agreement We computed the
inter-annotator agreement (IAA) on the full dataset for both
factuality/verifiability and check-worthiness using
Krippendorff’s alpha (α) [29]. We obtain 0.8322 for
factuality/verifiability and 0.6909 for check-worthiness. As
expected, albeit substantial, the IAA for check-worthiness
is lower than that for factuality/verifiability due to the
genuine disagreement that we retain on purpose.
        </p>
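        <p>The data collection steps described in this section (length filter, top-k selection per month and topic with one post per author, and anonymization) can be illustrated with a minimal sketch. It is not the authors' code: the regular expressions, the dictionary keys (user, month, topic, likes, retweets, n_tokens), and the reading of “data subset” as a month/topic pair are assumptions made for illustration, and token counts are assumed to be precomputed rather than obtained with spaCy.</p>
        <preformat><![CDATA[
```python
import re
from collections import defaultdict

# Hypothetical patterns: the paper does not specify the expressions it used.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
USER_RE = re.compile(r"@\w+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def anonymize(text):
    """Replace newlines with spaces and mask URLs, emails, mentions, phones."""
    text = text.replace("\r", " ").replace("\n", " ")
    text = URL_RE.sub("[URL]", text)
    # Emails are masked before mentions so that "@domain" is not read as a user.
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = USER_RE.sub("[USER]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def select_top_k(posts, k=10, min_tokens=6):
    """Keep the k most liked/retweeted posts per (month, topic), one per user."""
    groups = defaultdict(list)
    for post in posts:
        if post["n_tokens"] >= min_tokens:  # drops posts with <= 5 tokens
            groups[(post["month"], post["topic"])].append(post)

    selected = []
    for key in sorted(groups):
        ranked = sorted(
            groups[key],
            key=lambda p: p["likes"] + p["retweets"],
            reverse=True,
        )
        seen_users = set()
        kept = []
        for post in ranked:
            if post["user"] in seen_users:
                continue  # only the most impactful post of each author
            seen_users.add(post["user"])
            kept.append(post)
            if len(kept) == k:
                break
        selected.extend(kept)
    return selected
```
]]></preformat>
        <p>With k = 10, six years, twelve months per year, and three topics, such a procedure yields at most 10 × 72 × 3 = 2,160 posts, which matches the dataset size reported above.</p>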
        <sec id="sec-2-4-1">
          <title>3.3. Data Analysis and Statistics</title>
        </sec>
        <sec id="sec-2-4-2">
          <title>3.2. Data Annotation</title>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <p>WorthIt comprises 2,160 posts distributed across topics
and time periods as shown in Figure 2, in which we also
highlight the overlap in posts with Faina [30], a
previously released dataset for fine-grained fallacy detection.
Specifically, WorthIt includes the same posts from 2019
to 2022 that are in Faina and further includes messages
from the 2017 and 2018 time periods. This opens
opportunities for studying the interplay between check-worthiness
and fallacious argumentation in future work as well as
investigations on human label variation, especially because
annotators are the same for both datasets.</p>
        <p>Each post has been annotated with two labels, namely
i) one denoting whether the content of the post is
factual/verifiable – either yes or no – and ii) one indicating
its check-worthiness – with labels in a 5-point Likert
scale: definitely yes, probably yes, neither yes nor
no, probably no, or definitely no. It is worth noting
that, as opposed to determining factuality/verifiability,
estimating check-worthiness is a partly subjective task.
This motivates us to create WorthIt with parallel labels
by the annotators on all posts so that future studies on
human label variation can be conducted. The
annotation guidelines closely follow the ones used in
CheckThat! shared tasks and are provided in Appendix B.</p>
        <p>Annotators Annotation was conducted by two expert
annotators. Both annotators are native speakers of Italian
and have naturally been exposed to public discourse on
migration, climate change, and public health in the
Italian context. They identify themselves as a woman and a
man, with age ranges 20–30 and 30–40. They have a
background in linguistics and natural language processing
and conducted annotation as part of their work.</p>
        <p>Annotation process Annotators were provided with
annotation guidelines for determining the factuality/verifiability
and check-worthiness of social media posts (Appendix B).
After conducting a pilot annotation phase
on a small subset of the messages, annotators discussed</p>
        <p><sup>4</sup> Computed using the it_core_news_sm spaCy model (v3.5).</p>
        <p>Overall, social media posts in WorthIt have an
average token length of 38.6 and the full dataset comprises
83,315 tokens, of which 28,562, 26,667, and 28,086 are part
of migration, climate change, and public health posts,
respectively. In Table 1 we summarize the annotation
statistics for factuality/verifiability and check-worthiness
for both annotators (annotators 1 and 2). While 1,413–1,432
posts (65.4%–66.3%) are considered as factual/verifiable</p>
        <p>by annotators, we stress that the check-worthiness of a
post can be estimated only if the post itself is deemed
as factual/verifiable. Indeed, only 1,011 (46.8%) and 784
(36.3%) posts over the total are classified as check-worthy
(i.e., either with the label probably yes or definitely
yes) by annotators 1 and 2, respectively. We observe that the
overall statistics for factuality/verifiability are similar
among annotators, while those for check-worthiness, as
expected, vary more. Specifically, annotator 1 appears to have
been more inclined to assign clear-cut check-worthiness
scores, whereas annotator 2 distributed ratings more across
the scale. While in our experiments (Section 4) we do not
directly leverage this information, our dataset is released
to the community with disaggregated labels using the full
5-point Likert scale to encourage work on fine-grained
check-worthiness estimation and human label variation.</p>
        <p>4. Experiments</p>
        <p>We conduct experiments on check-worthiness estimation
with both encoder- and decoder-based models using the
newly-introduced WorthIt dataset. In this section, we
thoroughly detail our experimental setup (Section 4.1)
and the model variant and prompt selection process
(Section 4.2). Then, we present test set results (Section 4.3).</p>
        <p>4.1. Experimental Setup</p>
        <p>Task setup We cast the check-worthiness task as a
binary classification problem and consider factuality/verifiability
as auxiliary information that can be leveraged
by models to improve performance on the task. Given
that each post has annotations provided by all annotators
(i.e., annotators 1 and 2) for both factuality/verifiability and
check-worthiness, for the purpose of the experiments</p>
        <p>Data splits We divide WorthIt into k training and test
sets using k-fold cross-validation (k = 5), preserving the
label distribution across splits. For development, we rely
on the training portions only and further divide them into
training and development sets for the purpose of model
variant and prompt selection (Section 4.2). Specifically,
for encoder-based models we split them into five 80%/20%
training/development sets, while for decoder-based
models we divide them into two equal parts: the first half
is used for selecting examples for few-shot prompting,
while the second half serves as development set. All texts
were lowercased for the purpose of the experiments.</p>
        <p>Models For the experiments with encoder-based
models, we use four monolingual models specifically trained
on Italian data, namely AlBERTo [31],<sup>5</sup> UmBERTo [32],<sup>6</sup>
and dbmdz’s Italian BERT models [33] in their base<sup>7</sup> and
xxl<sup>8</sup> versions (henceforth referred to as BERT-it base and
BERT-it xxl). Moreover, we employ widespread
multilingual models that include Italian in pretraining data,
namely mBERT [34]<sup>9</sup> and XLM-RoBERTa [35].<sup>10</sup> For
fine-tuning, we use the MaChAmp toolkit (v0.4.2) [36] and
select the best hyperparameter configuration based on
average Pos F1 score on the development sets (Appendix C).
As regards decoder-based models, we choose two
Italian and two multilingual models, all instruction-tuned.
Specifically, we select LlaMAntino-3-ANITA-8B [37]<sup>11</sup>
and Minerva-7B [38]<sup>12</sup> as monolingual models, while we
use Qwen2.5-7B [39]<sup>13</sup> and Llama3.1-8B [40]<sup>14</sup> as
multilingual models. We choose these models because they
are widely used, freely available, and do not require very
large computational resources that could be impractical
in real-world scenarios. Predicted labels are extracted
from models’ outputs using regular expressions. If no
matching label is found in the output,<sup>15</sup> the response is
recorded as “unknown”. Hyperparameter details are in
Appendix C. Overall, we employ six encoder-based and
four decoder-based models, for a total of ten models.</p>
        <p><sup>5</sup> Version: m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0</p>
        <p><sup>6</sup> Version: Musixmatch/umberto-commoncrawl-cased-v1</p>
        <p><sup>7</sup> Version: dbmdz/bert-base-italian-uncased</p>
        <p><sup>8</sup> Version: dbmdz/bert-base-italian-xxl-uncased</p>
        <p><sup>9</sup> Version: google-bert/bert-base-multilingual-cased</p>
        <p><sup>10</sup> Version: FacebookAI/xlm-roberta-base</p>
        <p><sup>11</sup> Version: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA</p>
        <p><sup>12</sup> Version: sapienzanlp/Minerva-7B-instruct-v1.0</p>
        <p><sup>13</sup> Version: Qwen/Qwen2.5-7B-Instruct</p>
        <p><sup>14</sup> Version: meta-llama/Meta-Llama-3.1-8B-Instruct</p>
        <p>Prompts and example sets For decoder-based models,
we design prompts in two languages (Italian and
English) with or without annotation guidelines, leading to
four different prompt configurations: Italian with
guidelines (it_g), Italian without guidelines (it_ng), English
with guidelines (en_g), and English without guidelines
(en_ng). All models are prompted in a few-shot setup
with five carefully-selected examples of posts and
associated labels (Section 4.2).<sup>16</sup> All prompts are in Appendix D.</p>
        <p>average precision (mAP) scores for them to get
additional insights on performance when ranking posts by
check-worthiness. Moreover, for decoder-based models
we include the number of “unknown” outputs (i.e., those
not matching a label in the label set) to assess their ability
to follow the instructions.</p>
        <p>4.2. Model Variant and Prompt Selection</p>
      </sec>
      <sec id="sec-2-6">
        <p>We select the most promising setting (i.e., model variant,
set of few-shot examples, and prompt configuration)
based on average Pos F1 score on the development sets.</p>
        <p>While for encoder-based approaches the model selection
was mainly a matter of tuning hyperparameter values
(see Section 4.1 and additional details in Appendix C), for
decoder-based models this involved the selection of the
most promising set of examples as well as the prompt
configuration (i.e., language and guidelines).</p>
        <p>Multi-task fine-tuning and sequential prompting
We hypothesize that factuality/verifiability information
can help to predict the check-worthiness of a post. We
thus design different fine-tuning and prompting settings
for encoder- and decoder-based models, respectively, to
test this hypothesis. Specifically, for encoder-based
models we compare a standard single-task approach (i.e.,
fine-tuning a model with check-worthiness labels only)
with an approach that leverages both
factuality/verifiability and check-worthiness information in a multi-task
learning framework (i.e., using check-worthiness as a
main task and factuality/verifiability as an auxiliary task
with different task loss weights <inline-formula><tex-math><![CDATA[\lambda_{\mathrm{fv}}]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[\lambda_{\mathrm{cw}}]]></tex-math></inline-formula>; see
Appendix C). We compute the multi-task learning loss as
<inline-formula><tex-math><![CDATA[\mathcal{L} = \sum_{t} \lambda_t \mathcal{L}_t]]></tex-math></inline-formula>, where <inline-formula><tex-math><![CDATA[\mathcal{L}_t]]></tex-math></inline-formula> is the loss for the task <inline-formula><tex-math><![CDATA[t]]></tex-math></inline-formula>, i.e.,
either factuality/verifiability (fv) or check-worthiness
(cw), and <inline-formula><tex-math><![CDATA[\lambda_t]]></tex-math></inline-formula> is the weight given to the task. For
decoder-based models, we instead test a standard setting in which
the models are prompted directly for check-worthiness
(not seq) and a two-step sequential prompting approach
(seq) (prompts are in Appendix D). In the latter case, the
model is first instructed to classify the post based on its
factuality/verifiability, then the output label is
incorporated into a prompt which instructs the model to assess
the check-worthiness of the same post.</p>
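        <p>The two settings described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the prompt wording is invented (the actual prompts are in Appendix D), generate stands for any hypothetical callable that maps a prompt string to a model response, and fake_model is a stub used only to make the example runnable. The first function is the weighted sum of task losses used in the multi-task setting; the second is the two-step seq procedure.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, Tuple


def multitask_loss(task_losses: Dict[str, float],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses: sum over tasks of weight * loss."""
    return sum(task_weights[task] * loss for task, loss in task_losses.items())


def sequential_check_worthiness(post: str,
                                generate: Callable[[str], str]) -> Tuple[str, str]:
    """Two-step prompting: ask for factuality first, then feed that label
    into the check-worthiness prompt. Prompt texts are hypothetical."""
    fv_prompt = (
        "Is the following post factual/verifiable? "
        "Answer 'factual' or 'not-factual'.\n"
        f"Post: {post}"
    )
    fv_label = generate(fv_prompt).strip().lower()

    cw_prompt = (
        f"The post was classified as: {fv_label}.\n"
        "Is it check-worthy? Answer 'check-worthy' or 'not-check-worthy'.\n"
        f"Post: {post}"
    )
    cw_label = generate(cw_prompt).strip().lower()
    return fv_label, cw_label


def fake_model(prompt: str) -> str:
    """Stub standing in for a decoder-based model (illustration only)."""
    if "check-worthy" in prompt:
        if "classified as: factual" in prompt:
            return "check-worthy"
        return "not-check-worthy"
    return "factual"
```
]]></preformat>
        <p>In the not seq setting, only the second prompt would be issued, without the line carrying the factuality/verifiability label.</p>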
        <p>Few-shot example set selection We create five
different sets of few-shot examples (i.e., post texts and
associated labels) by diversifying them across topics and
annotation combinations for factuality/verifiability and
check-worthiness, focusing on examples that are similar
to those that are discussed in the annotation guidelines.</p>
        <p>Each set is drawn from one of the five training splits
used during development and contains five examples.
Table 2 reports the composition of each set with respect
to topics and annotations. To select the most promising
example set to be used in the test phase, we prompt all
decoder-based models with these example sets. In Table 2
we also report the Pos F1 obtained by using each
example set, averaged on all models, development sets, and
prompt configurations across seq and not seq settings
(calculated over a total of 138,400 data points).<sup>17</sup>
Example set #1 leads to the highest average Pos F1 score and
also exhibits the smallest standard deviation (Table 2);
therefore, we select this set for the test phase (refer to
Appendix D for post texts and labels included in the
example set). It is worth noting that this is the only set that
does not include any post annotated as factual/verifiable
but not check-worthy (+-), suggesting that models may
learn more effectively from examples that are either both
factual/verifiable and check-worthy or neither. In Table 3,
we report the percentages of factuality/verifiability and
check-worthiness label combinations output by
models when prompted using each example set over all the
possible configurations in the seq setting (69,200 data
points). We observe that even if the sets have different
distributions of label combinations, this does not
significantly influence the distribution of the labels generated
by models: in all cases, models frequently produce an
invalid pair -fv +cw, while they tend to avoid the opposite
one (i.e., +fv -cw).</p>
        <p>Evaluation metrics We use the F1 score for the positive
check-worthy class (Pos F1) as our main metric,
in line with previous work on check-worthy claim
detection [2, 16, 17, inter alia]. For completeness, we also
report positive precision and recall scores (Pos Prec and
Pos Rec, respectively), as well as accuracy (Acc) for test
set results. Since encoder-based models provide
confidence scores for the output labels, we also compute mean</p>
        <p><sup>15</sup> Allowed labels for factuality/verifiability: {factual, fattuale, not[-_ ]factual,
non[-_ ]fattuale}; allowed labels for check-worthiness:
{check[-_ ]worthy, not[-_ ]check-worthy, non[-_ ]check-worthy}.</p>
        <p><sup>16</sup> Testing a smaller/larger number of examples is left for future work.</p>
        <p><sup>17</sup> Each development split for decoder-based models consists of 865
examples (i.e., 50% of the training portion; see Section 4.1). Therefore,
we have 865 outputs per development set (×5) → 4,325 outputs
per model’s configuration (×4) → 17,300 outputs per model (×4)
→ 69,200 outputs per setting (×2) → 138,400 outputs in total.</p>
        <p><sup>18</sup> 865 outputs per development set (×5) → 4,325 outputs per example
set (×5) → 21,625 outputs in total.</p>
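        <p>The label extraction step mentioned in Section 4.1 (regular expressions over raw model outputs, with “unknown” as fallback) can be illustrated as follows. The patterns are built from the allowed labels listed in footnote 15, but the exact expressions, the case-insensitive matching, and the choice to test the negative pattern before the positive one are assumptions of this sketch, not details given in the paper.</p>
        <preformat><![CDATA[
```python
import re

# Patterns derived from the allowed label sets; the actual expressions
# used in the experiments are not given, so these are assumptions.
FV_NEG = re.compile(r"\b(?:not[-_ ]factual|non[-_ ]fattuale)\b", re.IGNORECASE)
FV_POS = re.compile(r"\b(?:factual|fattuale)\b", re.IGNORECASE)
CW_NEG = re.compile(r"\b(?:not|non)[-_ ]check-worthy\b", re.IGNORECASE)
CW_POS = re.compile(r"\bcheck[-_ ]worthy\b", re.IGNORECASE)


def extract_label(output, negative_pattern, positive_pattern):
    """Map a raw model output to 'negative', 'positive', or 'unknown'.

    The negative pattern is tested first because every negative label
    contains the corresponding positive label as a substring.
    """
    if negative_pattern.search(output):
        return "negative"
    if positive_pattern.search(output):
        return "positive"
    return "unknown"
```
]]></preformat>
        <p>Counting how often the function returns “unknown” for a model gives the instruction-following statistic reported alongside the decoder-based results.</p>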
        <p>Best prompt selection To select the prompts for the
test phase, we compare average Pos F1 scores on the
development splits obtained by all decoder-based models
when prompted with it_g, it_ng, en_g, and en_ng
configurations (21,625 data points for each
configuration)<sup>18</sup> in both seq and not seq settings. Results are in
Table 4. All the best performing models do not use
guidelines; therefore, we decide not to include guidelines in the
prompts in further experiments. We keep both English
and Italian prompt versions for the test phase, as some
models perform better with Italian (particularly Minerva).</p>
        <p>We also observe that the best results in the seq setting
are overall higher than in the direct check-worthiness
task (i.e., not seq). We keep both settings for testing to
better highlight performance differences.</p>
        <sec id="sec-2-6-1">
          <title>4.3. Results</title>
          <p>We compute the results for the selected configurations of
encoder- and decoder-based models across the k = 5 test
splits, presenting average scores and standard deviations
across the applicable metrics as detailed in Section 4.1.</p>
        </sec>
      </sec>
      <sec id="sec-2-7">
        <p>Encoder-based models Results for encoder-based
models are shown in Table 5. We observe that using
factuality/verifiability as an auxiliary task in a multi-task
learning framework helps to improve the Pos F1
performance across all models. The best scores are obtained by
BERT-it xxl, followed by UmBERTo and BERT-it base, all
fine-tuned in a multi-task setting. Specifically,
BERT-it xxl fine-tuned using both factuality/verifiability and
check-worthiness information achieves a Pos F1 score
of 0.7473 (an increase of 1.41 points over the single-task
version) and a mAP score of 0.8095 on the
check-worthiness estimation task. Notably, XLM-RoBERTa in a
multi-task setting scores only 3.35 points below the best
BERT-it xxl configuration in terms of Pos F1 score,
despite being pretrained on a mixture of languages. It also
outperforms AlBERTo in the multi-task setup and
obtains comparable results in the single-task setting. This
suggests that XLM-RoBERTa can be a viable approach
for multilingual check-worthiness estimation.</p>
        <p>Decoder-based models Results are presented in
Table 6. Decoder-based models in a few-shot setup perform
slightly worse on average than fine-tuned encoder-based
models, but still achieve competitive results. Moreover,
three models perform better when prompted in Italian.</p>
      </sec>
      <sec id="sec-2-8">
        <p>Notably, LlaMAntino-3-ANITA-8B – despite being
pretrained on Italian data – performs better with English
prompts and achieves the highest score in the seq
setting (i.e., 0.6771 Pos F1 score). The two Italian models,
LlaMAntino-3-ANITA-8B and Minerva-7B, reach the best
results in the seq setup, while the multilingual models
Qwen2.5-7B and Llama3.1-8B perform better when
directly prompted for check-worthiness (i.e., in the not seq
setup). Overall, factuality/verifiability information
does not seem to significantly aid decoder-based models
in predicting check-worthiness, as they are unable to
leverage this information effectively (see Section 5 for an
in-depth analysis). The lowest performance is observed
with Minerva-7B, which is also the only model to produce
“unknown” outputs – up to an average of 127 “unknown”
labels when prompted in English in the seq setting.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Analysis and Discussion</title>
      <p>Ranking of posts by check-worthiness Aggregate
check-worthiness estimation scores (e.g., Pos F1) give a
useful picture of models’ performance; however, knowing
how the models rank the posts by check-worthiness is
paramount for fact-checkers since they can only screen
a limited number of posts in their daily work (say, k). In
Figure 3, we report the ratio of posts correctly classified
as check-worthy within the top-k recommended
check-worthy posts (P@k) by all encoder-based models,<sup>19</sup> with
k ∈ {5, 10, 25, 50, 100}. We observe that P@k is in the
range of 0.90–0.95 and 0.80–0.85 points on average when
the posts’ screening budget is set to k = 25 and k = 100,
respectively. This indicates that these models can help
fact-checkers in their daily routine.</p>
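      <p>As a concrete reading of the P@k measure used above, the following sketch ranks posts by a model's confidence for the check-worthy class and computes the share of truly check-worthy posts among the first k. The function name and the input format (a list of scores and a parallel list of 0/1 gold labels) are assumptions for illustration.</p>
      <preformat><![CDATA[
```python
def precision_at_k(scores, gold, k):
    """Share of gold check-worthy posts among the k highest-scored posts.

    scores: confidence for the check-worthy class, one value per post.
    gold:   1 if the post is annotated as check-worthy, 0 otherwise.
    """
    if k <= 0 or not scores:
        raise ValueError("k must be positive and scores must be non-empty")
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    ranked = sorted(zip(scores, gold), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]  # fewer than k items if the list is shorter than k
    return sum(label for _, label in top) / len(top)
```
]]></preformat>
      <p>For example, a screening budget of k = 25 with P@k = 0.92 would mean that 23 of the 25 posts shown to a fact-checker are indeed check-worthy.</p>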
      <p>[Footnote 19] In this analysis, we report P@k scores for encoder-based models only, since with decoder-based models it is not possible to get confidence scores for labels generated as part of raw outputs.</p>
      <p>Relationship between fv and cw. To assess whether
decoder-based models capture the relationship between
factuality/verifiability and check-worthiness, we analyzed
their outputs in the seq setup. Figure 4 shows
the frequencies of the four possible combinations of labels
both in the models’ outputs (i.e., +fv +cw, +fv -cw,
-fv +cw, and -fv -cw; calculated over 8,650 data points)
and in the manual annotations (2,160 data points). The
most frequent label combination in the models’ outputs
is +fv +cw, accounting for more than half of the predictions
for Minerva-7B and Llama3.1-8B, reaching 66.2%
for the latter. Interestingly, the second most frequent
combination is -fv +cw: we consider this problematic,
because non-factual or non-verifiable posts should not be
classified as check-worthy. This suggests that decoder-based
models do not grasp this correlation and instead
classify check-worthiness independently. This is a
particularly important limitation, as it can potentially lead
to fact verification efforts being wasted on content that
is not factual. In contrast, all models except
LlaMAntino-3-ANITA-8B rarely assign the opposite combination, +fv
-cw, which is instead valid within our framework and
represents a substantial portion of annotated posts (27.9%).
LlaMAntino-3-ANITA-8B favors either two negative
labels (-fv -cw) or two positive labels (+fv +cw), while
assigning mixed label combinations significantly less
often. A side effect of this is that it produces the -fv +cw
combination less frequently than the other models.
Overall, our analysis shows that models i) tend to avoid the
combination +fv -cw, preferring to align the two labels
rather than diversifying them, especially when they rely
on positive factuality/verifiability, and ii) tend to produce
the invalid label combination -fv +cw. We stress that
this tendency is not due to the examples given in the
prompts (cf. Table 3), but is rather a general preference of
those models, which seem to ignore the relation between
factuality/verifiability and check-worthiness.</p>
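      <p>The label-combination analysis can be sketched as follows. This is only an illustration under the assumption that, for every post, the factuality/verifiability and check-worthiness predictions are available as booleans; the function name combination_shares and the toy predictions are hypothetical.</p>

```python
from collections import Counter

# The four label combinations discussed in the analysis.
COMBINATIONS = ("+fv +cw", "+fv -cw", "-fv +cw", "-fv -cw")


def combination_shares(fv_labels, cw_labels):
    """Relative frequency of each fv/cw label combination."""
    if len(fv_labels) != len(cw_labels):
        raise ValueError("both label lists must have the same length")
    if not fv_labels:
        raise ValueError("at least one prediction is required")
    counts = Counter(
        f"{'+' if fv else '-'}fv {'+' if cw else '-'}cw"
        for fv, cw in zip(fv_labels, cw_labels)
    )
    total = len(fv_labels)
    return {combo: counts[combo] / total for combo in COMBINATIONS}


# Toy predictions (hypothetical) for five posts.
fv = [True, True, False, False, True]
cw = [True, False, True, False, True]

shares = combination_shares(fv, cw)
for combo, share in shares.items():
    print(f"{combo}: {share:.1%}")
```

      <p>In the terms used above, a high share for -fv +cw signals that check-worthiness is being predicted without regard to the factuality/verifiability label.</p>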
      <p>Correlation between models’ outputs. To assess if
there is a pairwise correlation between encoder- and
decoder-based models’ outputs, we calculate the Pearson
correlation coefficient (r) between all models’ predictions.
The heatmap in Figure 5 summarizes the results across
the 5 test splits. We consider the best-performing
setup for each model, namely the multi-task setting
for encoder-based models (see Table 5) and the setup
that led to the best performance for each decoder-based
model (i.e., language and setting; see Table 6).
Encoder-based models exhibit strong positive mutual correlation
(r ≥ 0.65; top-left section in Figure 5), indicating high
consistency in the predictions. In contrast, decoder-based
models display low inter-model correlation, indicating
greater output variability. Among them,
LlaMAntino-3-ANITA-8B shows the highest alignment with
encoder-based models, reaching r = 0.54 with UmBERTo and
BERT-it xxl. Conversely, Minerva-7B consistently shows
no or very weak correlation with other models, with r
ranging from 0.00 to 0.06, revealing that its outputs are
largely unrelated to those of all other models.</p>
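      <p>A pairwise correlation of this kind can be sketched in a few lines. The example assumes binary check-worthiness predictions (1 for check-worthy, 0 otherwise) on a single test split; how results are aggregated across splits is not specified here, and the model names and values below are placeholders rather than actual outputs.</p>

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when a model always predicts the same label.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Placeholder predictions on the same six posts (1 = check-worthy).
predictions = {
    "model_a": [1, 0, 1, 1, 0, 0],
    "model_b": [1, 0, 1, 0, 0, 0],
    "model_c": [0, 1, 0, 0, 1, 1],
}

for (name_a, pred_a), (name_b, pred_b) in combinations(predictions.items(), 2):
    print(f"{name_a} vs {name_b}: r = {pearson_r(pred_a, pred_b):.2f}")
```

      <p>For binary labels this coefficient coincides with the phi coefficient, so values near zero indicate that two models agree no more often than their label frequencies alone would suggest.</p>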
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>We introduce WorthIt, the first dataset of Italian
social media posts annotated for factuality/verifiability
and check-worthiness that spans multiple years and
topics and includes human label variation. We conduct
thorough check-worthiness estimation experiments with
encoder- and decoder-based models. Results show that
the former models in a multi-task setting reach the best
results, while the latter models systematically classify
non-factual/verifiable posts as check-worthy, failing to
capture the relation between the two concepts.</p>
      <p>WorthIt’s partial overlap with a dataset for fallacy
detection, faina [30], opens new research avenues for
combining the two tasks. Further opportunities include
modeling human label variation for the check-worthiness
task using the released parallel annotations and
experimenting with additional models, training setups, and
prompting strategies. Finally, the wide temporal
coverage and the diverse set of topics represented in
WorthIt open the field to studies on out-of-distribution
generalization of check-worthiness estimation models.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>This work has been funded by the European Union’s
Horizon Europe research and innovation program
under grant agreement No. 101070190 (AI4Trust). We also
gratefully acknowledge funding from the German Federal
Ministry of Research, Technology and Space (BMFTR)
under the grant 01IS23072 for the Software Campus project
MULTIVIEW.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Appendix</title>
    </sec>
    <sec id="sec-7">
      <title>A. Search Keywords</title>
      <sec id="sec-7-1">
        <p>We report the full list of search keywords, divided by
topic, in Table 7. Within square brackets are the grammatical
gender and number variants (if any) that we included
for each keyword.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>B. Annotation Guidelines</title>
      <p>For factuality/verifiability annotation, a post can be either
factual/verifiable (i.e., yes label) or non-factual/verifiable
(i.e., no). For posts that are factual/verifiable, a
check-worthiness label on a 5-point Likert scale must also be
assigned. Possible labels are: definitely yes, probably
yes, neither yes nor no, probably no, and definitely
no. For both annotation tasks, we strictly follow the
guidelines by Nakov et al. [2] and translate them to Italian.
The annotation guidelines are presented below.</p>
      <p>Factuality/verifiability.
Il post contiene un’affermazione fattuale che può
essere verificata? A titolo di esempio, sono
fattuali/verificabili i post che riportano una definizione,
menzionano una quantità nel presente o nel
passato, fanno una previsione verificabile del futuro,
fanno riferimento a leggi, procedure e norme
operative, discutono di immagini o video, e indicano
correlazioni o causalità.</p>
      <p>Check-worthiness.
Credi che l’affermazione contenuta nel post
dovrebbe essere verificata da un fact-checker
professionista? Questa domanda richiede un giudizio
soggettivo basato sulle seguenti domande:</p>
      <p>1. L’affermazione espressa nel post potrebbe
essere falsa?</p>
      <p>2. L’affermazione espressa nel post potrebbe
essere di interesse pubblico e/o avere
impatto sulla collettività?</p>
      <p>3. L’affermazione espressa nel post potrebbe
danneggiare la società, un gruppo, un
singolo o un’entità?</p>
      <p>L’annotazione è necessaria solo se il post è stato
classificato come fattuale/verificabile. Nota:
affermazioni facilmente verificabili dagli utenti
(es. “Gli abitanti della Cina sono la metà di quelli
dell’Italia”) non sono da ritenere check-worthy.</p>
      <p>In the guidelines, we further include information on
how to deal with special cases to minimize ambiguity.
All the cases provided to annotators are outlined below.</p>
      <p>☞ Reported speech, including quotations and references
to newspaper and TV, is always factual/verifiable. E.g.:
“‘What is done to migrants is criminal’ #PopeFrancis on
#CTCF #Rai3” is factual/verifiable.</p>
      <p>☞ Generic sentences are not factual/verifiable because
they contain imprecise information (e.g., frequent use of
indefinite quantifiers such as various, some, many). E.g.:
“Three months after the collapse of the #MorandiBridge.
From the government only many promises, zero facts and
a totally insufficient decree.” is not factual/verifiable.</p>
      <p>☞ If the claim is in a subordinate clause, the post is not
factual/verifiable. However, it is factual/verifiable if the
claim is salient and conveys the main information. E.g.:
“Dear #novax who appeals to art.32 of the Constitution, you
should know that the Constitutional Court with ruling no.
307/1990 has decided that a treatment can become
mandatory if it serves to protect oneself and the health of others. So,
if needed, you vaccinate or leave.” is factual/verifiable.</p>
      <p>☞ Personal opinions are not factual/verifiable, as there
is no clear evidence to support them. E.g.: “Put Salvini
back at the Interior Ministry, he is the only one who can
handle migrant arrivals.” is not factual/verifiable.</p>
      <p>☞ When the implicit subject can be reconstructed, the
sentence can be factual/verifiable. E.g.: “When he was
minister and closed the ports he said go ahead and prosecute
me. Then he was investigated and hid behind
parliamentary immunity. When he was minister he insulted Carola
Rackete. Then they propose him a TV debate with her and
he declines the invitation. And they call him Captain.” is
factual/verifiable.</p>
      <p>☞ Descriptions of images/videos with URLs are
factual/verifiable when they contain an externally verifiable
fact. E.g.: “I receive directly from a Sudanese boy these
images. The migrants are leaving the UNHCR center 15 km
from #Agadez and marching towards the city.” is
factual/verifiable.</p>
      <p>☞ Posts about weather conditions or temperatures
are considered factual/verifiable when the information
is precise and they specify the type of event described, the
exact location, and the time. Posts about temperature are not
check-worthy. E.g.: “The situation now in #Catania. I
think there is a small problem with climate change. [URL]”
is not factual/verifiable.</p>
      <p>☞ Posts describing events (demonstrations, marches,
strikes, rallies, initiatives, assemblies, meetings,
presentations) are always factual/verifiable. They can include
the expressions everyone for, see you on, together with.
They are generally not check-worthy. E.g.:
“#StopFalsePromises! In the streets of Rome with [USER] for global
climate strike! #ClimateStrike” is factual/verifiable.</p>
      <p>
Table 7: Search keywords used for collecting posts in WorthIt, with grammatical gender and number variants (if any) indicated using
square brackets. Note that these exactly match the keywords that have been used to collect the faina dataset [30].
Migration: apolid[e,i]; apolidia; centr[o,i] di accoglienza; centr[o,i] di identificazione ed espulsione; centr[o,i] di permanenza per il rimpatrio; centri di permanenza
per i rimpatri; centr[o,i] di permanenza temporanea; centr[o,i] per il rimpatrio; centri per i rimpatri; corridio[io,i] umanitar[io,i]; domand[a,e] d’asilo; domand[a,e] di
asilo; emigrant[e,i]; emigrat[o,i,a,e]; emigrazion[e,i]; espatr[io,i]; fattor[e,i] di spinta; immigrant[e,i]; immigrat[o,i,a,e]; immigrazion[e,i]; ius sanguinis; migrant[e,i];
migrator[io,i,ia,ie]; migrazion[e,i]; minor[e,i] stranier[o,i] non accompagnat[o,i]; minor[e,i] stranier[a,e] non accompagnat[a,e]; non-refoulemen[t,ts]; permess[o,i]
di soggiorno; procedur[a,e] d’asilo; procedur[a,e] di asilo; protezion[e,i] sussidiari[a,e]; protezion[e,i] umanitari[a,e]; push facto[r,rs]; refoulemen[t,ts];
reinsediament[o,i]; respingiment[o,i]; richiedent[e,i] asilo; rifugiat[o,i,a,e]; rimpatr[io,i]; rimpatriat[o,i,a,e]; sfollat[o,i,a,e]; vittim[a,e] della tratta; vittim[a,e] di tratta
Climate change: acidificazione dell’oceano; acidificazione degli oceani; aerosol atmosferic[o,i]; allagament[o,i]; alluvion[e,i]; alluvional[e,i]; ambientalismo di
facciata; anidride carbonica; antropocene; aridità; bilanc[io,i] climatic[o,i]; bilanc[io,i] energetic[o,i]; bilanc[io,i] idrologic[o,i]; biocombustibil[e,i]; biodegradabil[e,i];
biodegradabilità; biodiversità; biossido di carbonio; cambiament[o,i] climatic[o,i]; cambiament[o,i] del clima; carbon cost; carbon footprint; carbon pricing; carbon tax;
cost[o,i] del carbonio; climate; climate change; climate cris[is,es]; climatic[o,a,i,he]; climatologia; co2; combustibil[e,i] fossil[e,i]; confin[e,i] planetar[io,i]; consum[o,i]
di suolo; crisi climatic[a,he]; deforestazion[e,i]; desalinizzazion[e,i]; desertificazion[e,i]; diossido di carbonio; disboscament[o,i]; dissalazion[e,i]; ecological footprint;
ecologismo di facciata; economi[a,e] circolar[e,i]; effetto serra; emission[e,i]; energi[a,e] rinnovabil[e,i]; esondazion[e,i]; event[o,i] meteorologic[o,i] estrem[o,i];
fenomen[o,i] meteorologic[o,i] estrem[o,i]; finanza sostenibile; fonte di energia rinnovabile; fonti di energia rinnovabil[e,i]; forzant[e,i] radiativ[o,i]; gas serra; gas
silvestre; glacialism[o,i]; glaciazion[e,i]; greenwashing; impronta carbonica; impronta di carbonio; impronta ecologica; innalzamento de[l,i] mar[e,i]; innalzamento
del livello de[l,i] mar[e,i]; innalzamento dei livelli de[l,i] mar[e,i]; inondazion[e,i]; inquinamento atmosferico; inquinamento dell’atmosfera; isol[a,e] di calore; isol[a,e]
urban[a,e] di calore; limit[e,i] planetar[io,i]; meteorologia; microclima; mobilità sostenibile; mutament[o,i] climatic[o,i]; olocene; ondat[a,e] di caldo; ondat[a,e]
di calore; paleoclima; particellato; particolato; pedoclima; permafrost; permagelo; prezz[o,i] del carbonio; proiezion[e,i] climatic[a,he]; report di sostenibilità;
riscaldamento climatico; riscaldamento globale; risch[io,i] climatic[o,i]; scenar[io,i] climatic[o,i]; sciogliment[o,i] dei ghiacciai; siccità; sistem[a,i] climatic[o,i];
sostenibilità ambientale; surriscaldamento climatico; surriscaldamento globale; svilupp[o,i] sostenibil[e,i]; tass[a,e] sul carbonio; transizion[e,i] ecologic[a,he];
transizion[e,i] energetic[a,he]; uso d[el,i] suolo; utilizzazion[e,i] del suolo; utilizzo d[el,i] suolo; variabilità climatic[a,he]
Public health: agend[a,e] di prenotazione; alfabetizzazione alla salute; alfabetizzazione sanitaria; assistenz[a,e] domiciliar[e,i]; assistenz[a,e] ospedalier[a,e];
assistenz[a,e] sanitari[a,e]; assistenza universale; aziend[a,e] ospedalier[a,e]; aziend[a,e] sanitari[a,e]; bisogn[o,i] sanitar[io,i]; calendar[io,i] di prenotazione;
caric[o,hi] di malattia; centro unificato di prenotazione; città san[a,e]; class[e,i] di priorità; comportament[o,i] a rischio; comportament[o,i] di salute; copertur[a,e]
sanitari[a,e]; copertur[a,e] universal[e,i]; cur[a,e] medic[a,he]; cur[a,e] sanitari[a,e]; degent[e,i]; degenz[a,e]; determinant[e,i] della salute; determinant[e,i] di salute;
dimission[e,i] ospedalier[a,e]; dispositiv[o,i] medic[o,i]; disuguaglianz[a,e] di salute; disuguaglianz[a,e] nella salute; disuguaglianz[a,e] sanitari[a,e]; educazione
alla salute; educazione sanitaria; epidemi[a,e]; epidemic[o,a,i,he]; epidemiologia; epidemiologic[o,a,i,he]; equità di salute; equità nella salute; equità sanitari[a,e];
esenzion[e,i] dal ticket; esenzion[e,i] ticket; fattor[e,i] di rischio; indicator[e,i] di salute; investiment[o,i] nella sanità; investiment[o,i] per la salute; investiment[o,i]
per la sanità; isol[a,e] san[a,e]; istitut[o,i] di cura; istituto di sanità pubblica; istituto superiore di sanità; list[a,e] di attesa; malatti[a,e] infettiv[a,e]; ministero
della salute; ministero della sanità; misur[a,e] sanitari[a,e]; ospedali; ospedalier[o,i,a,e]; ospedalizzazion[e,i]; ospitalizzazion[e,i]; pandemi[a,e]; politic[a,he]
sanitari[a,e]; post[o,i] letto; prestazion[e,i] ambulatorial[e,i]; prestazion[e,i] sanitari[a,e]; prestazion[e,i] specialistic[a,he] ambulatorial[e,i]; prevenzione delle
malattie; prevenzione di malattie; prevenzione primaria; prevenzione sanitaria; prevenzione secondaria; prevenzione terziaria; programmazion[e,i] sanitari[a,e];
promozione della salute; promozione di salute; pronto soccorso; ricover[o,i]; salute globale; salute per tutti; salute pubblica; sanità; sanità pubblica; sanitar[io,i,ia,ie];
serviz[io,i] infermieristic[o,i]; serviz[io,i] medic[o,i]; serviz[io,i] sanitar[io,i]; settor[e,i] sanitar[io,i]; sicurezza dell[a,e] cur[a,e]; struttur[a,e] di ricovero; struttur[a,e]
ospedalier[a,e]; struttur[a,e] sanitari[a,e]; terapi[a,e] intensiv[a,e]; trattament[o,i] di salute; trattament[o,i] medic[o,i]; trattament[o,i] sanitar[io,i]; uguaglianz[a,e]
di salute; uguaglianz[a,e] nella salute; uguaglianz[a,e] sanitari[a,e]; vaccin[o,i]; vaccinazion[e,i]</p>
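      <p>The bracket notation used for the search keywords can be expanded programmatically. The sketch below rests on an assumption that is not stated in the paper: when a keyword contains several bracketed groups, the variants are combined position by position (first option with first option, and so on), so that gender and number agree. The function name expand_keyword is hypothetical.</p>

```python
import re

# Matches one bracketed group of comma-separated endings, e.g. "[o,i]".
BRACKET = re.compile(r"\[([^\]]*)\]")


def expand_keyword(pattern):
    """Expand a keyword pattern into its surface variants.

    Assumption: all bracketed groups in one pattern list the same number
    of options and are aligned by position.
    """
    groups = [
        [option.strip() for option in match.group(1).split(",")]
        for match in BRACKET.finditer(pattern)
    ]
    if not groups:
        return [pattern]
    sizes = {len(group) for group in groups}
    if len(sizes) != 1:
        raise ValueError(f"groups of different sizes in: {pattern}")
    variants = []
    for i in range(sizes.pop()):
        # Take the i-th option of every group, in order of appearance.
        options = iter([group[i] for group in groups])
        variants.append(BRACKET.sub(lambda _m, it=options: next(it), pattern))
    return variants


for keyword in ("centr[o,i] di accoglienza", "emigrat[o,i,a,e]", "ius sanguinis"):
    print(keyword, "becomes", expand_keyword(keyword))
```

      <p>A Cartesian product over the groups would be the alternative reading of the notation, but it would also generate forms that do not agree in gender or number.</p>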
    </sec>
    <sec id="sec-9">
      <title>C. Hyperparameters</title>
      <sec id="sec-9-1">
        <p>For encoder-based models, we use default MaChAmp
(v0.4.2) [36] hyperparameter values and tune the most
crucial ones during development. The search space for
them is indicated within brackets in Table 8, with best
values underlined. The best loss weight value for the
auxiliary factuality/verifiability task is set to 0.50 for
UmBERTo, BERT-it base, and mBERT, to 0.75 for
XLM-RoBERTa, and to 1.00 for AlBERTo and BERT-it xxl.</p>
        <p>For decoder-based models, we use the Hugging Face
Transformers library with default hyperparameter
values, setting the max_new_tokens parameter to 30.</p>
      </sec>
      <sec id="sec-9-2">
        <p>Since all models are instruction-tuned, we structure our
inputs as conversational prompts using the following
format: {"role": "user", "content": "prompt"}.</p>
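        <p>As an illustration of this input format, the following sketch fills the English factuality/verifiability template and wraps it in a single user message. The line layout of the template, the helper name build_messages, and the empty default values are assumptions; only the instruction text, the placeholder names, the message format, and the max_new_tokens value come from the paper.</p>

```python
from string import Template

# English factuality/verifiability template; the line layout is an assumption.
FV_PROMPT_EN = Template(
    "Classify the post as “factual” or “not factual”. "
    "Answer only with “factual” or “not factual”.\n"
    "$FV_GUIDELINES\n"
    "$FV_EXAMPLES\n"
    "$POST_TEXT ="
)

# Generation setting reported for decoder-based models.
GENERATION_KWARGS = {"max_new_tokens": 30}


def build_messages(post_text, guidelines="", examples=""):
    """Return a one-turn conversation in the {"role", "content"} format."""
    prompt = FV_PROMPT_EN.substitute(
        FV_GUIDELINES=guidelines,
        FV_EXAMPLES=examples,
        POST_TEXT=post_text,
    )
    return [{"role": "user", "content": prompt}]


messages = build_messages("lei pensa ai fratelli migranti in serbia [URL]")
print(messages[0]["content"])

# With Hugging Face Transformers, a list like this is typically rendered with
# tokenizer.apply_chat_template(messages, add_generation_prompt=True) and the
# result is passed to model.generate(..., **GENERATION_KWARGS).
```

        <p>Keeping the template and the message construction separate makes it straightforward to switch between configurations with and without guidelines or examples by changing only the substituted values.</p>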
      </sec>
    </sec>
    <sec id="sec-10">
      <title>D. Prompts and Examples</title>
      <sec id="sec-10-1">
        <p>We present the prompt templates used for factuality/verifiability
and check-worthiness tasks. For prompts using
guidelines, $[FV|CW]_GUIDELINES placeholders are
replaced with text in the desired language from Table 9.
$[FV|CW]_EXAMPLES placeholders are replaced with […]</p>
      </sec>
      <sec id="sec-10-2">
        <p>[…] Ora”) is included only in the seq setting, with $FV_LABEL
representing the factuality/verifiability label obtained for
the same post using the factuality/verifiability prompt.</p>
        <p>Table 9: Guidelines for both tasks in Italian and English used for prompting decoder-based models in configurations with guidelines.</p>
        <p>$FV_GUIDELINES (Italian): “Linee guida:\\Un post è fattuale quando contiene informazioni salienti che possono essere verificate esternamente. Tali informazioni
possono essere trovate ovunque, comprese subordinate, sostantivi e hashtag. I discorsi riportati e le citazioni sono sempre fattuali. Anche i post che descrivono
eventi e attività sono sempre fattuali. I post sul meteo o sulla temperatura e le descrizioni di foto e video sono fattuali solo quando le informazioni sono precise e la
località è nota. Al contrario, le affermazioni generiche o vaghe e le opinioni personali non sono fattuali perché non esistono prove chiare a sostegno.”</p>
        <p>$FV_GUIDELINES (English): “Guidelines:\\A post is factual when it contains salient information that can be externally verified. Such information can be found everywhere, including subordinates
clauses, nouns and hashtags. Reported discourses and references are always factual. Similarly, posts describing events and activities are always factual. Posts about
weather or temperature, as well as photo and video descriptions, are factual only when the information is precise and the location is known. On the other hand,
generic or vague statements and personal opinions are not factual because there is no clear evidence to support them.”</p>
        <p>$CW_GUIDELINES (Italian): “Linee guida:\\Un post può essere check-worthy solo se è fattuale. Un post è considerato check-worthy se è rilevante per la società
e può causare danno o modificare le opinioni delle persone. Le affermazioni generiche e le opinioni non sono check-worthy. I post che descrivono eventi climatici e
meteorologici di solito non sono check-worthy perché non contengono informazioni sensibili. Allo stesso modo, i post che menzionano che una specifica attività è
in corso di svolgimento di solito non sono check-worthy.”</p>
        <p>$CW_GUIDELINES (English): “Guidelines:\\A post can be check-worthy only if it is factual. A post is check-worthy if it is
relevant to society and can cause harm or modify people’s opinions. Generic statements and opinions are not check-worthy. Posts describing climate and weather
events are usually not check-worthy because they do not contain sensitive information. Similarly, posts mentioning that a specific activity is taking place are
usually not check-worthy.”</p>
        <p>Examples used for few-shot decoder-based models’ prompting on the test set. Examples refer to set #1 (see Table 2).</p>
        <p>it: è solo maggio. e questo #caldo mi terrorizza. ecco. l’ho detto. #crisiclimatica cosa diamine stiamo aspettando???
en: it’s only May. and this #heat terrifies me. there. I said it. #climatecrisis what the hell are we waiting for???</p>
        <p>it: ma è tipo la seconda volta che i rifugiati recuperati in mare sono 49. mi è preso il sospetto che la libia stia trollando salvini.
en: but it’s like the second time that the refugees rescued at sea are 49. I got the suspicion that Libya is trolling Salvini.</p>
        <p>it: ho scritto e riscritto che #inceneritore è proposta anti-europea: ue avrebbe eliminato esenzione dell’incenerimento dal pagamento co2 non
più tardi del 2028 perché dannoso e rendendolo ancora meno conveniente. sono stato smentito: oggi hanno votato. dal 2026! [URL]
en: I’ve written and rewritten that the #incinerator is an anti-European proposal: the EU would have removed the exemption of incineration from CO2
payments no later than 2028 because it’s harmful, making it even less cost-effective. I was contradicted: they voted today. from 2026! [URL]</p>
        <p>it: il fatto che zaia rivoglia il personale “novax” sospeso è la certificazione del danno procurato alla salute pubblica per scelte politiche scellerate
e criminali. semplice.
en: the fact that Zaia wants the suspended “novax” staff back is proof of the damage caused to public health by reckless and criminal political decisions.
simple.</p>
        <p>it: lei pensa ai fratelli migranti in serbia [URL]
en: she thinks of the migrant brothers in Serbia [URL]</p>
        <p>Prompts for factuality/verifiability, first in English (en) and then in Italian (it):</p>
      </sec>
      <sec id="sec-10-3">
        <p>Classify the post as “factual” or “not factual”.
Answer only with “factual” or “not factual”.
$FV_GUIDELINES
$FV_EXAMPLES
$POST_TEXT =</p>
      </sec>
      <sec id="sec-10-4">
        <p>Classifica il post come “fattuale” o “non fattuale”.
Rispondi solo con “fattuale” o “non fattuale”.
$FV_GUIDELINES
$FV_EXAMPLES
$POST_TEXT =</p>
        <p>[Check-worthiness prompt templates: the instruction text is not recoverable; the surviving placeholders are $CW_GUIDELINES, $CW_EXAMPLES, and $POST_TEXT =.]</p>
        <p>During the preparation of this work, the author(s) did not use any generative AI tools or services.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <year>2022</year>
          .emnlp-main.
          <volume>731</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poesio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Artstein</surname>
          </string-name>
          , The reliability of anaphoric [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <article-title>A survey annotation, reconsidered: Taking ambiguity into ac-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <article-title>shop on Frontiers in Corpus Annotations II: Pie in</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          178-
          <fpage>206</fpage>
          . doi:
          <volume>10</volume>
          .1162/tacl_a_
          <fpage>00454</fpage>
          . the Sky, Association for Computational Linguis[2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          , G. Da San Martino, tics, Ann Arbor, Michigan,
          <year>2005</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          . URL:
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          , W. Za- https://aclanthology.org/W05-0311/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>ghouani</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Shaar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mubarak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Nikolov</surname>
            , [10]
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Welty</surname>
          </string-name>
          ,
          <article-title>Truth is a lie: Crowd truth</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2022 Check- and the seven myths of human annotation</article-title>
          ,
          <source>AI</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>That! lab task 1 on identifying relevant claims in Magazine 36 (</article-title>
          <year>2015</year>
          )
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          . doi:
          <volume>10</volume>
          .1609/aimag.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          tweets,
          <source>in: Proceedings of the Working Notes of v36i1.2564.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>CLEF 2022 - Conference</surname>
            and Labs of the Evaluation [11]
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Cabitza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Campagner</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Basile</surname>
          </string-name>
          , Toward a
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Forum</surname>
          </string-name>
          , CEUR-WS.org, Bologna, Italy,
          <year>2022</year>
          .
          <article-title>URL: perspectivist turn in ground truthing for predic-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3180</volume>
          /paper-28.pdf. tive computing,
          <source>Proceedings of the AAAI Confer</source>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Panchendrarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <source>Claim detection ence on Artificial Intelligence</source>
          <volume>37</volume>
          (
          <year>2023</year>
          )
          <fpage>6860</fpage>
          -
          <lpage>6868</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>for automated fact-checking: A survey on monolin- doi:10</source>
          .1609/aaai.v37i6.
          <fpage>25840</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>gual, multilingual and cross-lingual research</article-title>
          , Nat- [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          , What can we learn
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>ural Language Processing Journal</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          100066. doi:10.1016/j.nlp.2024.100066.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>from collective human opinions on natural language inference data?</article-title>, in:
          <string-name><given-names>B.</given-names> <surname>Webber</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>,
          Association for Computational Linguistics, Online,
          <year>2020</year>, pp. <fpage>9131</fpage>-<lpage>9143</lpage>.
          URL: https://aclanthology.org/2020.emnlp-main.734/. doi:10.18653/v1/2020.emnlp-main.734.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [4]
          <string-name><given-names>L.</given-names> <surname>Konstantinovskiy</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Price</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Babakar</surname></string-name>,
          A. Zubi-
          tent automated claim detection,
          <source>Digital Threats</source>
          <volume>2</volume>
          (<year>2021</year>). doi:10.1145/3412869.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [5]
          <string-name><given-names>A.</given-names> <surname>Das</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Kovatchev</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lease</surname></string-name>,
          <article-title>The state of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>human-centered NLP technology for fact-checking</article-title>,
          <source>Information Processing &amp; Management</source>
          <volume>60</volume>
          (<year>2023</year>) 103219. doi:10.1016/j.ipm.2022.103219.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [13]
          <string-name><given-names>P.</given-names> <surname>Atanasova</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Karadzhov</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Mohtarami</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <article-title>Overview of the CLEF-2019 CheckThat! lab: Automatic identification and verification of claims. Task 1: Check-worthiness</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          in:
          <source>Working Notes of CLEF 2019</source>,
          CEUR-WS.org, Lugano, Switzerland,
          <year>2019</year>.
          URL: https://ceur-ws.org/Vol-2380/paper_269.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [6]
          <string-name><given-names>P.</given-names> <surname>Atanasova</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Màrquez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barrón-Cedeño</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Elsayed</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Suwaileh</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Zaghouani</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Kyuchukov</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <article-title>CLEF-2018 CheckThat! lab on automatic identification and verification of political claims. Task 1:</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <article-title>Check-worthiness</article-title>, in:
          <source>Working Notes of CLEF 2018</source>,
          CEUR-WS.org, Avignon, France,
          <year>2018</year>.
          URL: https://ceur-ws.org/Vol-2125/invited_paper_13.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [14]
          <string-name><given-names>S.</given-names> <surname>Shaar</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Nikolov</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Babulkov</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Alam</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Suwaileh</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Haouari</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <article-title>Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          in:
          <source>Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum</source>,
          CEUR-WS.org, Thessaloniki, Greece,
          <year>2020</year>.
          URL: https://ceur-ws.org/Vol-2696/paper_265.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [7]
          <string-name><given-names>A.</given-names> <surname>Ramponi</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Plank</surname></string-name>,
          <article-title>Neural unsupervised domain adaptation in NLP-A survey</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          in:
          <string-name><given-names>D.</given-names> <surname>Scott</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Bel</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Zong</surname></string-name>
          (Eds.),
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>,
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          International Committee on Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>, pp. <fpage>6838</fpage>-<lpage>6855</lpage>.
          URL: https://aclanthology.org/2020.coling-main.603/. doi:10.18653/v1/2020.coling-main.603.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [15]
          <string-name><given-names>S.</given-names> <surname>Shaar</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hasanain</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Hamdan</surname></string-name>,
          <string-name><given-names>Z. S.</given-names> <surname>Ali</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Haouari</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Nikolov</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Kutlu</surname></string-name>,
          <string-name><given-names>Y. S.</given-names> <surname>Kartal</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Alam</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barrón-Cedeño</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Miguez</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Beltrán</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Elsayed</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <article-title>Overview of the CLEF-2021 CheckThat! lab task 1</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>of CLEF 2021 - Conference and Labs of the Evalu-</source>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <year>2021</year>.
          URL: https://ceur-ws.org/Vol-2936/paper-28.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [8]
          <string-name><given-names>B.</given-names> <surname>Plank</surname></string-name>,
          <article-title>The “problem” of human label variation:</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <source>ceedings of the 2022 Conference on Empirical Meth-</source>
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>, pp. <fpage>10671</fpage>-<lpage>10682</lpage>.
          URL: https://aclanthology.org/2022.emnlp-main.731/. doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [16]
          <string-name><given-names>F.</given-names> <surname>Alam</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barrón-Cedeño</surname></string-name>,
          <string-name><given-names>G. S.</given-names> <surname>Cheema</surname></string-name>,
          <string-name><given-names>G. K.</given-names></string-name>
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name><surname>Shahi</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hakimov</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hasanain</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Míguez</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <article-title>of the CLEF-2023 CheckThat! lab task 1 on check-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          in:
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</source>,
          CEUR-WS.org, Thessaloniki, Greece,
          <year>2023</year>.
          URL: https://ceur-ws.org/Vol-3497/paper-019.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [17]
          <string-name><given-names>M.</given-names> <surname>Hasanain</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Suwaileh</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Weering</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Zaghouani</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barrón-Cedeño</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Alam</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <article-title>Overview of the CLEF-2024 CheckThat! lab task 1 on check-worthiness estima-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <source>the Conference and Labs of the Evaluation Forum (CLEF 2024)</source>,
          CEUR-WS.org, Grenoble, France,
          <year>2024</year>.
          URL: https://ceur-ws.org/Vol-3740/paper-24.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [18]
          <string-name><given-names>N. Salek</given-names> <surname>Faramarzi</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Hashemi Chaleshtori</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Shirazi</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Ray</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Banerjee</surname></string-name>,
          <article-title>Claim extraction and dynamic stance detection in COVID-19 tweets</article-title>, in:
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <source>ference 2023</source>,
          Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>, pp. <fpage>1059</fpage>-<lpage>1068</lpage>.
          doi:10.1145/3543873.3587643.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [23]
          <string-name><given-names>G.</given-names> <surname>Valer</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ramponi</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Tonelli</surname></string-name>,
          <article-title>When you doubt,</article-title>
          <article-title>Italian under domain shift</article-title>, in:
          <string-name><given-names>F.</given-names> <surname>Boschetti</surname></string-name>,
          G. E.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <source>of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</source>,
          CEUR Workshop Proceedings, Venice, Italy,
          <year>2023</year>, pp. <fpage>433</fpage>-<lpage>440</lpage>.
          URL: https://aclanthology.org/2023.clicit-1.52/.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [24]
          <string-name><given-names>E. M.</given-names> <surname>Williams</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Rodrigues</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Tran</surname></string-name>,
          <article-title>Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation</article-title>, in:
          <string-name><given-names>G.</given-names> <surname>Faggioli</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <source>ceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>,
          CEUR-WS.org, Bucharest, Romania,
          <year>2021</year>, pp. <fpage>659</fpage>-<lpage>669</lpage>.
          URL: https://ceur-ws.org/Vol-2936/paper-55.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [25]
          <string-name><given-names>R. A.</given-names> <surname>Frick</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Vogel</surname></string-name>,
          <string-name><given-names>J.-E.</given-names> <surname>Choi</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <article-title>Fraunhofer SIT at CheckThat! 2023: Enhancing the detection of</article-title>
          <article-title>optical character recognition and model souping</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          in:
          <string-name><given-names>M.</given-names> <surname>Aliannejadi</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Faggioli</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ferro</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Vlachos</surname></string-name>
          (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</source>,
          CEUR-WS.org, Thessaloniki, Greece,
          <year>2023</year>, pp. <fpage>337</fpage>-<lpage>350</lpage>.
          URL: https://ceur-ws.org/Vol-3497/paper-029.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [19]
          <string-name><given-names>R.</given-names> <surname>Dhar</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Das</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          <article-title>Leveraging expectation maximization for identifying claims in low resource Indian languages</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          in:
          <string-name><given-names>S.</given-names> <surname>Bandyopadhyay</surname></string-name>,
          <string-name><given-names>S. L.</given-names></string-name>
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <string-name><surname>Devi</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Bhattacharyya</surname></string-name>
          (Eds.),
          <source>Proceedings of the 18th International Conference on Natural Language Processing (ICON)</source>,
          NLP Association of India (NLPAI), Silchar, India,
          <year>2021</year>, pp. <fpage>307</fpage>-<lpage>312</lpage>.
          URL: https://aclanthology.org/2021.icon-main.37.
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [20]
          <string-name><given-names>J.</given-names> <surname>Gili</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Passaro</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <article-title>Check-IT!: A cor-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <string-name><given-names>F.</given-names> <surname>Boschetti</surname></string-name>,
          <string-name><given-names>G. E.</given-names> <surname>Lebani</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Magnini</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Novielli</surname></string-name>
          (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <source>Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</source>,
          CEUR Workshop Proceedings, Venice, Italy,
          <year>2023</year>, pp. <fpage>227</fpage>-<lpage>235</lpage>.
          URL: https://aclanthology.org/2023.clicit-1.29/.
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [26]
          <string-name><given-names>M.</given-names> <surname>Sawinski</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Wecel</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Ksiezniak</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Strózyna</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Lewoniewski</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Stolarski</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Abramowicz</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          <article-title>OpenFact at CheckThat! 2023: Head-to-head GPT vs. BERT - A comparative study of transformers language models for the detection of check-worthy claims</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          in:
          <string-name><given-names>M.</given-names> <surname>Aliannejadi</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Faggioli</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ferro</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          <source>ence and Labs of the Evaluation Forum (CLEF 2023)</source>,
          CEUR-WS.org, Thessaloniki, Greece,
          <year>2023</year>, pp. <fpage>453</fpage>-<lpage>472</lpage>.
          URL: https://ceur-ws.org/Vol-3497/paper-040.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [27]
          <string-name><given-names>A.</given-names> <surname>Savchev</surname></string-name>,
          <article-title>AI Rational at CheckThat! 2022: Using transformer models for tweet classification</article-title>, in:
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          <source>Proceedings of the Working Notes of CLEF 2022</source>
          CEUR-WS.org, Bologna, Italy,
          <year>2022</year>, pp. <fpage>656</fpage>-<lpage>659</lpage>.
          URL: https://ceur-ws.org/Vol-3180/paper-52.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [21]
          <string-name><given-names>A.</given-names> <surname>Scaiella</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Costanzo</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Passone</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Croce</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          <article-title>for fact verification in Italian</article-title>, in:
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          <source>ceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</source>,
          CEUR Workshop
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          Proceedings, Pisa, Italy,
          <year>2024</year>, pp. <fpage>898</fpage>-<lpage>908</lpage>.
          URL: https://aclanthology.org/2024.clicit-1.97/.
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          [22]
          <string-name><given-names>P.</given-names> <surname>Atanasova</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Wright</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Augenstein</surname></string-name>,
          <article-title>Gener-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>,
          Association for Computational Linguistics, Online,
          <year>2020</year>, pp. <fpage>3168</fpage>-<lpage>3177</lpage>.
          URL: https://aclanthology.org/2020.emnlp-main.256/. doi:10.18653/v1/2020.emnlp-main.256.
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          [28]
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Panchendrarajan</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Zubiaga</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          <article-title>FactFinders at CheckThat! 2024: Refining check-worthy statement detection with LLMs through data prun-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          <source>the Conference and Labs of the Evaluation Forum (CLEF 2024)</source>,
          CEUR-WS.org, Grenoble, France,
          <year>2024</year>, pp. <fpage>520</fpage>-<lpage>537</lpage>.
          URL: https://ceur-ws.org/Vol-3740/paper-47.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          [29]
          <string-name><given-names>A. F.</given-names> <surname>Hayes</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Krippendorff</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          <article-title>Answering the call for a standard reliability measure for coding data</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          <source>Communication Methods and Measures</source>
          <volume>1</volume>
          (<year>2007</year>)
          <fpage>77</fpage>-<lpage>89</lpage>.
          doi:10.1080/19312450709336664.
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          <source>the Association for Computational Linguistics: System Demonstrations</source>,
          Association for Computational Linguistics, Online,
          <year>2021</year>, pp. <fpage>176</fpage>-<lpage>197</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          URL: https://aclanthology.org/2021.eacl-demos.22/. doi:10.18653/v1/2021.eacl-demos.22.
        </mixed-citation>
      </ref>
      <ref id="ref83">
        <mixed-citation>
          [37]
          <string-name><given-names>M.</given-names> <surname>Polignano</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Semeraro</surname></string-name>,
          LLaMAntino-3-ANITA-8B-Inst-DPO-ITA model,
          <year>2024</year>.
        </mixed-citation>
      </ref>
      <ref id="ref84">
        <mixed-citation>
          [30]
          <string-name><given-names>A.</given-names> <surname>Ramponi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Dafara</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Tonelli</surname></string-name>,
          <article-title>Fine-grained fallacy detection with human label variation</article-title>, in:
        </mixed-citation>
      </ref>
      <ref id="ref85">
        <mixed-citation>
          <string-name><given-names>L.</given-names> <surname>Chiruzzo</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ritter</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Wang</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Compu-</source>
        </mixed-citation>
      </ref>
      <ref id="ref86">
        <mixed-citation>
          <source>gies (Volume</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , Association for Com- LLaMAntino-3
          <string-name>
            <surname>-ANITA-</surname>
          </string-name>
          8B
          <article-title>-</article-title>
          <string-name>
            <surname>Inst-</surname>
          </string-name>
          DPO-ITA,
        </mixed-citation>
      </ref>
      <ref id="ref87">
        <mixed-citation>
          <string-name>
            <surname>putational Linguistics</surname>
          </string-name>
          , Albuquerque, New Mex- accessed:
          <fpage>2025</fpage>
          -05-01.
        </mixed-citation>
      </ref>
      <ref id="ref88">
        <mixed-citation>
          <string-name>
            <surname>ico</surname>
          </string-name>
          ,
          <year>2025</year>
          , pp.
          <fpage>762</fpage>
          -
          <lpage>784</lpage>
          . URL: https://aclanthology. [38]
          <string-name>
            <given-names>R.</given-names>
            <surname>Orlando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Moroni</surname>
          </string-name>
          , P.-L. Huguet
          <string-name>
            <surname>Cabot</surname>
          </string-name>
          , S. Co-
        </mixed-citation>
      </ref>
      <ref id="ref89">
        <mixed-citation>
          org/
          <year>2025</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>34</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2025</year>
          . nia, E. Barba,
          <string-name>
            <given-names>S.</given-names>
            <surname>Orlandini</surname>
          </string-name>
          , G. Fiameni, R. Nav-
        </mixed-citation>
      </ref>
      <ref id="ref90">
        <mixed-citation>
          <article-title>naacl-long.34. igli, Minerva LLMs: The first</article-title>
          family of large [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , M. de Gemmis,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Semeraro, language models trained from scratch on Italian</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref91">
        <mixed-citation>
          <article-title>standing model for NLP challenging tasks based R</article-title>
          . Sprugnoli (Eds.),
          <source>Proceedings of the 10th Italian</source>
        </mixed-citation>
      </ref>
      <ref id="ref92">
        <mixed-citation>
          aro (Eds.),
          <source>Proceedings of the Sixth Italian</source>
          Confer- it
          <year>2024</year>
          ), CEUR Workshop Proceedings, Pisa, Italy,
        </mixed-citation>
      </ref>
      <ref id="ref93">
        <mixed-citation>
          <article-title>ence on Computational Linguistics, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          , pp.
          <fpage>707</fpage>
          -
          <lpage>719</lpage>
          . URL: https://aclanthology.org/
        </mixed-citation>
      </ref>
      <ref id="ref94">
        <mixed-citation>
          <string-name>
            <surname>Bari</surname>
          </string-name>
          , Italy,
          <year>2019</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2481</volume>
          /
          <year>2024</year>
          .clicit-
          <volume>1</volume>
          .77/.
        </mixed-citation>
      </ref>
      <ref id="ref95">
        <mixed-citation>
          paper57.pdf . [39]
          <string-name>
            <surname>Qwen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Hui</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Zheng</surname>
            , [32]
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Parisi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Francia</surname>
          </string-name>
          , P. Magnani, UmBERTo: B.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref96">
        <mixed-citation>
          <source>word masking</source>
          ,
          <year>2020</year>
          . URL: https://github.com/ J. Lin,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dang</surname>
          </string-name>
          , et al.,
          <source>Qwen2.5 technical report,</source>
        </mixed-citation>
      </ref>
      <ref id="ref97">
        <mixed-citation>
          musixmatchresearch/umberto, accessed:
          <fpage>2025</fpage>
          -
          <lpage>05</lpage>
          - arXiv preprint arXiv:
          <volume>2412</volume>
          .15115 (
          <year>2025</year>
          ). URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref98">
        <mixed-citation>
          01. //arxiv.org/abs/2412.15115. [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <surname>Italian</surname>
            <given-names>BERT</given-names>
          </string-name>
          and ELECTRA models, [40]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Pandey,
        </mixed-citation>
      </ref>
      <ref id="ref99">
        <mixed-citation>
          2020. doi:
          <volume>10</volume>
          .5281/zenodo.4263142,
          <string-name>
            <surname>accessed</surname>
            <given-names>: A.</given-names>
          </string-name>
          <string-name>
            <surname>Kadian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Al-Dahle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Letman</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Mathur,
        </mixed-citation>
      </ref>
      <ref id="ref100">
        <mixed-citation>
          2025-05-
          <fpage>01</fpage>
          . A.
          <string-name>
            <surname>Schelten</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Vaughan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            , [34]
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Devlin</surname>
            , M.-
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Toutanova</surname>
          </string-name>
          , BERT: A.
          <string-name>
            <surname>Hartshorn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Sravankumar,
        </mixed-citation>
      </ref>
      <ref id="ref101">
        <mixed-citation>
          <article-title>language understanding</article-title>
          , in: J.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Do- 3 herd of models</article-title>
          ,
          <source>arXiv preprint arXiv:2407.21783</source>
        </mixed-citation>
      </ref>
      <ref id="ref102">
        <mixed-citation>
          <string-name>
            <surname>ran</surname>
          </string-name>
          , T. Solorio (Eds.),
          <source>Proceedings of the 2019</source>
          Con- (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref103">
        <mixed-citation>
          <string-name>
            <given-names>Language</given-names>
            <surname>Technologies</surname>
          </string-name>
          , Volume
          <volume>1</volume>
          (Long and Short
        </mixed-citation>
      </ref>
      <ref id="ref104">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref105">
        <mixed-citation>URL: https://aclanthology.org/N19-1423/. doi:10.</mixed-citation>
      </ref>
      <ref id="ref106">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <fpage>N19</fpage>
          -1423. [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          , V. Chaud-
        </mixed-citation>
      </ref>
      <ref id="ref107">
        <mixed-citation>
          <article-title>ceedings of the 58th Annual Meeting of the Associa-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref108">
        <mixed-citation>
          <string-name>
            <given-names>Computational</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref109">
        <mixed-citation>
          8451. URL: https://aclanthology.org/
          <year>2020</year>
          .acl-main.
        </mixed-citation>
      </ref>
      <ref id="ref110">
        <mixed-citation>
          <volume>747</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .acl-main.
          <volume>747</volume>
          . [36]
          <string-name>
            <surname>R. van der Goot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Üstün</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramponi</surname>
          </string-name>
          , I. Sharaf,
        </mixed-citation>
      </ref>
      <ref id="ref111">
        <mixed-citation>
          <source>16th Conference of the European Chapter of</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>