<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF-2025 CheckThat! Lab Task 3 on Fact-Checking Numerical Claims</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Venktesh V</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vinay Setty</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avishek Anand</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boushra Bendou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maram Hasanain</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Houda Bouamor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriel Iturra-Bocaz</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petra Galuščáková</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Firoj Alam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carnegie Mellon University in Qatar</institution>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Stavanger</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We present an overview of the CheckThat! Lab 2025 Task 3, part of CLEF 2025. The task focuses on verifying claims with numerical quantities and temporal expressions. Numerical claims are defined as those requiring validation of explicit or implicit quantitative or temporal details. The task is conducted in three languages: Arabic, Spanish, and English. A total of 258 valid runs were submitted by 13 unique teams across the three languages: 10 teams participated in fact-checking English numerical claims, while 4 teams participated in Spanish and Arabic. Among these teams, the use of pre-trained transformer language models (PLMs) was the most frequent. A few teams also employed Large Language Models (LLMs). We provide a description of the dataset and the task setup, including the evaluation settings, and a brief overview of the participating systems. As is customary in the CheckThat! Lab, we release all the datasets as well as the evaluation scripts to the research community. This will enable further research on identifying the challenges of fact-checking numerical claims, which can assist various stakeholders, such as fact-checkers, financial research analysts, and policymakers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        There has been growing interest in developing tools [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], methods [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and benchmarks [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] to enhance
the fact-checking process. Automating fact-checking is challenging, as many claims are complex and
require sophisticated reasoning for accurate validation, especially those involving numerical data.
Numerical claims often appear more credible due to the Numeric-Truth effect [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], leading to their uncritical
acceptance. Recent studies show that verifying numerical claims is more difficult than verifying non-numerical
ones [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. For example, the social media claim that “CDC quietly deletes 6,000 COVID vaccine deaths
from its website” exaggerates a clerical correction, causing unnecessary panic. This demonstrates the
need for automated verification of such misleading claims.
      </p>
      <p>
        This task focuses on verifying claims with numerical quantities and temporal expressions. Numerical
claims are defined as those requiring validation of explicit or implicit quantitative or temporal details.
Participants must classify each claim as True, False, or Conflicting based on a short list of evidence.
Each claim is accompanied by the top-100 pieces of evidence retrieved using BM25 from our collection.
The collection was carefully curated by pooling evidence retrieved using different query
understanding mechanisms [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. After re-ranking, this evidence can be used to perform claim verification with a classification
or generative model that performs Natural Language Inference (NLI). A further objective is to evaluate
the numerical reasoning capabilities of the claim verification model. The task is available in English,
Spanish, and Arabic.
      </p>
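<p>As an illustration of the verification step described above, the sketch below shows one simple way per-evidence NLI outputs could be aggregated into a claim-level verdict. The label names and the aggregation rule are illustrative assumptions, not the official baseline.</p>

```python
from collections import Counter

def aggregate_verdict(evidence_labels):
    """Map per-evidence NLI labels (SUPPORTS / REFUTES / NEUTRAL)
    to a claim-level verdict (True / False / Conflicting)."""
    counts = Counter(evidence_labels)
    supports, refutes = counts["SUPPORTS"], counts["REFUTES"]
    if supports and refutes:
        return "Conflicting"   # mixed signals across the evidence
    if supports:
        return "True"
    if refutes:
        return "False"
    return "Conflicting"       # no decisive evidence either way

print(aggregate_verdict(["SUPPORTS", "NEUTRAL", "SUPPORTS"]))  # True
print(aggregate_verdict(["SUPPORTS", "REFUTES"]))              # Conflicting
```

<p>In practice, participants combined evidence and claim in a single model input rather than voting, but the sketch captures the three-way output space of the task.</p>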
      <p>
        The CheckThat! 2025 lab was held in the framework of CLEF 2025 [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].1 Figure 1 shows the full CheckThat! identification and verification pipeline, highlighting
the four tasks targeted in this seventh edition of the lab: Task 1 on subjectivity, Task 2 on claim
extraction &amp; normalization, Task 3 on numerical claims (this paper), and Task 4 on scientific web
discourse.
      </p>
      <p>The remainder of the paper is organized as follows: Section 3 describes the datasets released with the
task. We present the evaluation setup in Section 4. Section 5 discusses the system submissions and the
official results. Section 2 presents related work. Finally, we provide some concluding remarks in
Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated claim-verification is key to mitigating growing misinformation [
        <xref ref-type="bibr" rid="ref12 ref13 ref2">12, 13, 2</xref>
        ]. Existing works
on automated fact-checking primarily focus on synthetic claims collected from Wikipedia [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 14, 13</xref>
        ]
that are not representative of real-world claims. More efforts have been made to build systems for
real-world claims in domains like politics [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref8">8, 15, 16, 17</xref>
        ], science [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">18, 19, 20</xref>
        ], health [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and climate
change [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        To combat real-world misinformation, verifying claims containing numerical information is especially
important. Such claims, which cite statistics, figures, or time spans, are often perceived as more credible,
a phenomenon known as the numeric truth effect [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While real-world fact-checking benchmarks
have been proposed, they do not particularly focus on numerical claims [
        <xref ref-type="bibr" rid="ref23 ref3 ref8">23, 8, 3</xref>
        ]. There are synthetic
datasets that require tabular data to verify the claims [
        <xref ref-type="bibr" rid="ref24 ref25 ref26">24, 25, 26</xref>
        ], but these claims and tables do not
necessarily contain numerical quantities. Recent efforts by [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] aim to create more realistic claims from
Wikipedia by identifying cited statements, but these do not reflect the typical distribution of claims
verified by fact-checkers.
      </p>
      <p>
        Among the works that focus on simple statistical claims [
        <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
        ], the authors propose a weak-supervision
approach using Freebase. These claims are not only synthetic in nature, but they can also be answered
with simple knowledge-base facts and do not require numerical understanding or reasoning. Similarly,
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] explore the extraction of formulae for checking numerical consistency in financial statements,
also relying on Wikidata.
      </p>
      <p>
        QuanTemp [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] was the first real-world open-domain benchmark for fact-checking numerical claims.
It comprises real-world claims containing quantitative or temporal information whose verification
requires reasoning over this numerical content, which in turn requires numerical reasoning. The goal
of the benchmark was to also test numerical contextualization and numerical reasoning capabilities
of transformer based NLI models [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] and LLMs [
        <xref ref-type="bibr" rid="ref33 ref34">33, 34</xref>
        ]. The benchmark was also constructed in an
open-domain setting to improve retrieval and ranking capabilities for fact-checking. In the original
work, we demonstrate that LLMs fall short and are ill-suited for verifying such numerical claims when
compared to focused fine-tuning of smaller NLI models pre-trained to interpret numerical quantities.
This is further extended to other languages like Arabic and Spanish for Task 3 in this iteration of
CheckThat!.
      </p>
      <p>
        Claim verification has been a focus of previous editions of the CLEF CheckThat! lab in
2018-2023 [
        <xref ref-type="bibr" rid="ref35 ref36 ref37 ref38">35, 36, 37, 38</xref>
        ]. The initial edition [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] proposed Task 2, which dealt with verifying claims
made by politicians in debates or speeches and was offered in English and Arabic. The data was
collected from the 2016 US presidential campaign, and participants were asked to classify each claim
as true, false, or half-true. Subsequent iterations offered a task (Task 3) on the verification of
claims in news articles with associated topics [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] with veracity prediction on a four-point scale:
true, false, partially true, or other. CLEF CheckThat! 2022 particularly focused on verifying the central
claims in news articles [40].
      </p>
      <p>Following this tradition, we offer a claim veracity prediction task focused particularly on numerical
claims across three languages: English, Spanish, and Arabic.</p>
      <p>Example: claim decomposition.
Claim: “Discretionary spending has increased over 20-some percent in two years if you don’t include
the stimulus. If you put in the stimulus, it’s over 80 percent.”
Decomposition: [Q1]: Has discretionary spending increased in the past two years?
[Q2]: Does the increase in discretionary spending exclude the stimulus?
[Q3]: Is there evidence to support the claim that
Definition: A claim is designated as numerical if the numeric aspect is one of the crucial aspects to be
verified. While there could be other important aspects that determine whether the claim is correct or
not, as long as the numerical aspect is one of them, it is still considered a numerical claim.</p>
      <p>Examples: For instance, consider Example 1: “A man wearing justice for breonna taylor shot and
killed 3 men in a retired cops bar.”
Here there are several aspects to be verified: that the man was wearing the said t-shirt linking him to
BLM protests, that he killed 3 men, and that the act was carried out at a retired cop’s bar.</p>
      <p>While there are three aspects, one crucial aspect is verifying the fact that he killed 3 men, because if
this number were misrepresented, it would cause more panic through the spread of such misinformation.
Example 2: “The Chhattisgarh police conveyed that naxal terrorists were involved in a blast on January 22,
1794.” Here, while the important aspect to be verified seems to be the terrorist-group part, it is also a
temporal claim, as it is crucial to verify whether the blasts occurred on the said date.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>The dataset contains multigenre content in Arabic, English, and Spanish. It was collected
from various fact-checking domains through the Google Fact Check Explorer API2, complete with detailed
metadata and an evidence corpus sourced from the web. Our pipeline then filters the collected claims to
retain the numerical ones for the task. An overview of the dataset statistics is shown in Table 1.
2https://toolbox.google.com/factcheck/apis
Extraction of quantitative segments: To extract numerical claims and filter out non-numerical ones,
we employ the constituency parse of the original claim, as shown in Figure 4. Figure 4 demonstrates
how quantitative segments are identified using the constituency parse of an English claim.</p>
      <p>The nodes with the cardinal number POS tag “CD” are identified from the constituency parse. To
avoid false positives (for example: “The one and only”), we then parse these nodes’ ancestors and
extract noun phrases from their least common ancestors. Using these noun phrases as root nodes, we
perform a prefix traversal of their subtrees. From the set of all claims, we then retain those with at
least one quantitative segment as numerical claims for our dataset.</p>
      <p>This approach still has one limitation, as it may include claims with non-quantitative terms like
“Covid-19” or “F-35”. To remedy this, we require more than one quantitative segment, excluding any
nouns like “Covid-19” mentions, to qualify as a numerical claim. Our manual assessment of 1,000 sample
English claims from the dataset indicates that our approach is 95% accurate.</p>
      <p>We follow the same process for Arabic. For the extraction of Spanish numerical claims, the process
is similar, with the exception of the POS tag used for identifying numerical spans: “NUM” instead of
“CD”. We follow a similar manual evaluation process for Spanish and observe an accuracy of 97%,
demonstrating that our approach to identifying quantitative segments and extracting numerical claims
is robust across languages.</p>
      <p>
        Data characteristics: We use the train and validation sets from the English dataset released in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and
also curate Arabic and Spanish claims.
      </p>
      <p>
        English: For the English test set, we collect new real-world English numerical claims in addition to
the evaluation set released in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to avoid label leakage. We also analyze the distribution of the different
categories of numerical claims, namely statistical, comparative, interval, and temporal, with examples
shown in Table 2.
      </p>
      <p>Arabic: The Arabic dataset consists only of claims belonging to the categories True and False, as
the real-world distribution of conflicting claims in Arabic is too low. The claims in the Arabic dataset
were collected from diverse online sources, including news outlets, social media, and professional
fact-checking websites. Our goal was to curate a dataset that captures a wide range of topics, such as
politics, health, and economics, where numerical misinformation is common. We also incorporated
verified claims from AraFacts [41], a large dataset of professionally fact-checked Arabic claims. The
distribution of the different categories of numerical claims, along with examples, is shown in Table 4.</p>
      <p>Spanish: We employ an approach similar to that used for English to filter numerical claims, with
the exception of the POS tag used. The guidelines shown in Figure 3 were employed by an expert in
the language to verify the correctness of the curated numerical claims. The distribution of the different
categories of numerical claims, along with examples, is shown in Table 3.</p>
      <p>Evidence Collection: Evidence for claims in all languages was obtained from search engines,
excluding fact-checking websites to avoid leakage of fact-checker justifications and verdicts. For each
claim, we decompose it into yes/no-type sub-questions, as shown in Figure 2, and issue the original
claim and the generated sub-questions as queries to the search engines. Additionally, for English,
evidence is also obtained through other decomposition approaches, such as subclaim generation, to
increase the diversity of the evidence pool. All evidence is pooled to form the collection.</p>
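<p>The pooling step can be sketched as follows: evidence rankings retrieved for the original claim and for each sub-question are merged into one collection, interleaving by rank and de-duplicating. The round-robin interleaving strategy below is an illustrative assumption; the actual pooling procedure may differ.</p>

```python
def pool_evidence(ranked_lists):
    """Merge evidence rankings from the claim and its sub-questions into
    one pooled collection, interleaving by rank and de-duplicating."""
    pooled, seen = [], set()
    for rank in range(max(map(len, ranked_lists), default=0)):
        for ranking in ranked_lists:          # round-robin over the queries
            if rank < len(ranking) and ranking[rank] not in seen:
                seen.add(ranking[rank])
                pooled.append(ranking[rank])
    return pooled

claim_hits = ["doc1", "doc2", "doc3"]   # hits for the original claim
q1_hits    = ["doc2", "doc4"]           # hits for sub-question 1
q2_hits    = ["doc5", "doc1"]           # hits for sub-question 2
print(pool_evidence([claim_hits, q1_hits, q2_hits]))
# ['doc1', 'doc2', 'doc5', 'doc4', 'doc3']
```
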
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Settings</title>
      <p>The numerical claim verification task primarily involved classifying a claim, given the evidence pool,
into one of three classes (True, False, or Conflicting) by verifying aspects of the claim against the
evidence. While it was posed as a three-way classification task for English and Spanish, it was a
two-way (True or False) classification task for Arabic.</p>
      <sec id="sec-4-1">
        <title>4.1. Metrics</title>
        <p>We ranked the participants based on Macro-F1 scores rather than accuracy, to account for class
imbalance. Additionally, participants were also asked to report per-class F1 scores to measure whether
the systems performed reasonably well across the different veracity classes.</p>
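<p>For reference, macro-F1 averages the per-class F1 scores with equal weight, so a system that ignores a minority class is penalized even if its overall accuracy is high. A minimal sketch:</p>

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight,
    so minority classes count as much as the majority class."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

gold = ["True", "False", "False", "Conflicting"]
pred = ["True", "False", "True", "False"]
print(round(macro_f1(gold, pred), 3))  # 0.389
```

<p>Here the system never predicts Conflicting, so that class contributes an F1 of zero and drags the macro average down, exactly the behavior the ranking metric is meant to capture.</p>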
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evidence and Retrieval Setup</title>
        <p>Apart from the evidence collection described in Section 3, we also provide the evidence from
first-stage retrieval. For each claim, we retrieve the top-100 evidence passages using the BM25
implementation in Elasticsearch. The top-100 evidence passages were provided independently for each
decomposition approach, to offer participants the flexibility to use any decomposition approach.</p>
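<p>The organizers used the BM25 implementation in Elasticsearch; the self-contained sketch below approximates the standard BM25 scoring function (with conventional parameters k1 = 1.2, b = 0.75, which are assumptions here) to illustrate how a top-k evidence list is produced per claim.</p>

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.2, b=0.75, top_k=100):
    """Minimal BM25 over whitespace-tokenized documents; returns the
    indices of the top_k documents ranked by score."""
    toks = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in toks) / N
    df = Counter(w for t in toks for w in set(t))   # document frequencies

    def score(q_terms, t):
        tf = Counter(t)
        s = 0.0
        for w in q_terms:
            if w not in tf:
                continue
            idf = math.log(1 + (N - df[w] + 0.5) / (df[w] + 0.5))
            s += idf * tf[w] * (k1 + 1) / (
                tf[w] + k1 * (1 - b + b * len(t) / avgdl))
        return s

    q = query.lower().split()
    ranked = sorted(range(N), key=lambda i: score(q, toks[i]), reverse=True)
    return ranked[:top_k]

docs = ["inflation rose 7 percent in 2022",
        "the cat sat on the mat",
        "gdp grew by 2 percent last year"]
print(bm25_rank("inflation percent 2022", docs))  # [0, 2, 1]
```

<p>In the task setup the query would be the claim or one of its sub-questions, and the corpus would be the pooled web evidence rather than this toy list.</p>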
        <p>Participants were encouraged to apply any re-ranking method to the provided evidence. They were
also allowed to use their own retrieval approaches over the full document collection, as the BM25 results
were offered solely for convenience. For claim verification, participants were free to use any model of
their choice.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Overview of the Systems</title>
      <sec id="sec-5-1">
        <title>5.1. Results</title>
        <p>Table 5 shows the performance of the official submissions on the test set. The official run was the last
valid blind submission by each team. The table shows the runs ranked on the basis of macro-F1 and
includes all three languages.</p>
        <p>A total of 258 valid runs were submitted by 13 unique teams across all languages. Among them,
10 teams participated in fact-checking English numerical claims, while 4 teams submitted runs for
Spanish and Arabic. The final leaderboard in Table 5 includes 12 teams across languages, as one of the
submissions was withdrawn from the leaderboard.</p>
        <p>
          English: A total of 10 teams participated in fact-checking numerical claims in English. Most teams
used the provided BM25 evidence, followed by re-ranking with a cross-encoder. Several teams employed
creative approaches, such as data augmentation by translating claims from Arabic and Spanish. At least
six teams employed LLM-based approaches to reason over the evidence and produce claim veracity
predictions. Only the system from LIS outperformed the strongest baseline from the original English
benchmark [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], though this result does not appear to be statistically significant.
        </p>
        <p>Arabic: Four teams participated in Arabic, with LIS being the top-performing system with a
surprisingly high macro-F1 of 96.15%. This system is the same as the one they employed for English
claims. Since they employ LLMs with the ability to handle multiple languages, their solution generalizes
beyond English. The high gains in Arabic can also be explained by the absence of the Conflicting class,
which reduces the task to binary classification of True or False, in contrast to English and Spanish. The
main advantage of the LIS system over the other systems, which underperform in Arabic, seems to be
improved retrieval using dense retrieval approaches such as Linq-Embed-Mistral, together with
fine-tuning LLMs instead of prompting them for veracity prediction / NLI.</p>
        <p>Spanish: We observe a trend in Spanish similar to that of English. Spanish proves to be much harder
due to the three-way classification task, where Conflicting is hard to detect owing to the multi-aspect
nature of the claims and the granularity of reasoning required to identify half-true aspects of the original
claim. While the LIS team outperforms the other teams, they only manage to attain a macro-F1 of 50.34.
This could also be explained by the lack of MTEB evaluations in Spanish for Linq-Embed-Mistral; better
alternatives to this retrieval model could therefore exist. It also highlights the need to improve the
numerical reasoning and parsing capabilities of LLMs when employed for claim verification. As very
few benchmarks exist for Spanish fact-checking, and almost none for numerical fact-checking, the drop
in performance could also be explained by the lack of sufficient training data for fine-tuning the LLM
employed for the veracity prediction step.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Overview of the Systems</title>
        <p>Among all participating teams, LIS was the top performer across all languages. TIFIN, NGU_Research,
and DS@GT-CheckThat! performed well in the respective languages in which they participated. Most
teams employed generative models, such as gpt-4o-mini or Qwen LLMs, to decompose claims, followed
by BM25-based retrieval for retrieving evidence and transformer-based cross-encoder models for
re-ranking the evidence. For claim verification, some teams employed fine-tuned transformer-based NLI
models, where the transformers were trained as discriminative models on the provided training sets.
Other teams employed prompting-based approaches to leverage LLMs like gpt-4o-mini, or reasoning
models like deepseek-r1, to perform claim verification.</p>
        <p>
          Team LIS [42] used QwQ-32B to generate questions, followed by Linq-Embed-Mistral to retrieve
evidence from the corpus by combining the questions and claims. Mistral-Small-24B-Instruct-2501 was
fine-tuned to obtain the final veracity labels. The Qwen model seems to overcome certain limitations
associated with the GPT-3.5 and GPT-4 series models used in the baselines [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In particular, LIS demonstrated
that LLMs can be improved on the task of claim verification through fine-tuning, and that employing
reasoning-based LLMs can improve claim decomposition.
        </p>
        <p>Team DS@GT-CheckThat! [43] performed pre-processing to normalize the numbers and dates in
the claims and decomposed the claims into questions. They employed GPT-4o-mini to decompose the
claims. BM25 was employed for first-stage retrieval to prioritize documents relevant to the claim and
sub-questions. This was followed by re-ranking the documents using
cross-encoder/ms-marco-MiniLM-L-12-v2 or mixedbread-ai/mxbai-rerank-large-v1. The main workhorse model for veracity
classification was ModernBERT, an optimized model based on the BERT architecture that natively
supports longer sequence lengths.</p>
        <p>Team TIFIN [44] employed inverse class weighting in the claim verification step to address class
imbalance and give greater importance to minority classes. They also used strategies such as
oversampling to balance training examples and label smoothing to prevent the model from becoming
overconfident in its predictions. Additionally, they incorporated Focal Loss to fine-tune the verification
model microsoft/deberta-large-mnli using LoRA, allowing the model to focus on harder examples. To
further enhance performance, they used the ibm-granite/granite-3.3-8b-instruct model to summarize
contexts before feeding them to the verification model. An interesting insight presented by the authors
is that performance does not scale linearly with model size – smaller LLMs outperformed 70B-scale
LLaMA models, challenging assumptions based on scaling laws. Their approach also demonstrated the
benefits of multilingual data augmentation in improving claim verification performance.</p>
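<p>Two of the ingredients above can be sketched compactly: focal loss, which down-weights easy examples so training focuses on hard ones, and inverse class weighting, which boosts minority classes. The formulas are standard; the gamma and alpha values below are illustrative, not the hyperparameters used by the team.</p>

```python
import math
from collections import Counter

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for a single example: the (1 - p)^gamma factor scales
    down the cross-entropy of well-classified examples (p_true near 1)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

def inverse_class_weights(labels):
    """Weight each class by the inverse of its frequency, normalized by
    the number of classes, so minority classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * v) for c, v in counts.items()}

print(focal_loss(0.9) < focal_loss(0.3))         # True: easy example is down-weighted
print(inverse_class_weights(["True", "True", "False"]))
```

<p>In TIFIN's setup, the alpha factor would carry the inverse class weight, so the two mechanisms compose in one loss term.</p>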
        <p>Team NGU_Research [47] employed a hybrid retrieval approach, experimenting with various
pretrained encoder-based models – including BGE, E5, and Gemini – as embedding models. They
ultimately selected pretrained embeddings from OpenAI’s text-embedding-3-large model, combined
with BM25 filtering using Qdrant database collections for each language. For claim verification, they
used DeepSeek and GPT-4o-mini on the retrieved evidence.</p>
        <p>Team ClaimIQ [45] presented their core approach of fine-tuning LLMs using Low-Rank Adaptation
(LoRA) for the task of claim verification, combined with existing retrieval and ranking strategies to
procure evidence. The authors observed that this approach outperformed prompting-based methods
and other NLI models on the validation set. However, these performance gains did not generalize
to the evaluation set, highlighting that fine-tuning larger LLMs can lead to overfitting. This further
underscores the challenging nature of the task, which requires models not only to classify but also to
perform numerical contextualization and reasoning.</p>
        <p>Team Fraunhofer_SIT [46] follows a three-stage architecture: (1) evidence candidate retrieval
using dense vectors, (2) re-ranking using a fine-tuned cross-encoder, and (3) final claim classification
using a large NLI model (RoBERTa-large-MNLI). The authors first pre-compute document embeddings
using sentence-transformers/all-MiniLM-L6-v2 and store them in a FAISS index to enable efficient
nearest-neighbor retrieval. At inference time, the top-100 candidate evidence snippets are retrieved for
each claim. These are then re-ranked using a custom cross-encoder based on
cross-encoder/ms-marco-MiniLM-L-6-v2, fine-tuned in a contrastive learning setup detailed as follows.</p>
        <p>To improve re-ranking, the authors opted not to use the vanilla MS MARCO model employed in
the baseline. Instead, they leveraged weak supervision by using gold-labeled evidence snippets for
training and validation claims. These gold snippets were summarized using LLaMA 3.1-8B to remove
irrelevant or noisy content, resulting in cleaner positive examples. The cross-encoder was fine-tuned
using a contrastive approach, where the claim served as the anchor, the summarized gold evidence
as the positive, and the top-100 BM25-retrieved documents as negatives. Although they ranked fifth
on the leaderboard, their approach demonstrates that enhancing retrieval can significantly boost
downstream NLI performance without relying on expensive LLM-based methods for claim verification.</p>
        <p>The KSU team employed BM25 evidence with cross-encoder-based re-ranking. Additionally, since
the evidence snippets are long, they employed an unsloth/Qwen3-8B-bnb-4bit model to select the most
important snippets. This was followed by a fine-tuned NLI model over the snippets for veracity
prediction.</p>
        <p>
          However, even the top performing system from LIS did not reach the best possible performance
attainable using the provided evidence collection, as outlined in [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. This reinforces the observation
in [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] that LLMs struggle to contextualize and accurately interpret numerical information in claims
and evidence. It highlights the challenging nature of the task, which requires reasoning over mixed
modalities of numerical and textual data, contextualizing and comparing numerical values, and
performing numerical reasoning for claim verification. These findings also demonstrate that the
task is far from being solved.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>
        We presented an overview of Task 3 of the CLEF 2025 CheckThat! lab, which focused on fact-checking
numerical claims. Approaches ranged from prompting Large Language Models (LLMs) to fine-tuning
open-source LLMs of smaller parameter scales. Some participants focused on improving evidence
re-ranking and demonstrated that this can improve claim verification performance with smaller
transformer models. Some participants also employed creative data enrichment techniques, translating
claims from other languages to train the NLI model on a larger, well-balanced augmented dataset. While
some effort was made to improve the loss functions used to train the NLI model, this did not yield any
statistically significant improvements. Only the top-1 result in the English leaderboard outperformed the
strong baseline reported by the organizers in [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. However, the gains of this system over the baseline
were not statistically significant, and the system also falls short, by a large margin, of the upper bound reported in
[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. This demonstrates that numerical fact-checking is not yet solved and may require exploring ways to
improve the capabilities of LLMs or NLI models to perform numerical contextualization and reasoning. It
also demonstrates a need to scale and improve the reasoning capabilities of LLMs, which could be the
focus of future iterations of CLEF CheckThat!.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checking
and for rewording some text. After using this tool, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content. No other generative AI tool or model
was used in the preparation of this paper.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work of F. Alam and M. Hasanain is partially supported by NPRP 14C-0916-210015 from the Qatar
National Research Fund, part of Qatar Research Development and Innovation Council (QRDI).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] V. Setty, Factcheck editor: Multilingual text editor with end-to-end fact-checking, in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 2744–2748.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Z. Guo, M. Schlichtkrull, A. Vlachos, A survey on automated fact-checking, Transactions of the Association for Computational Linguistics 10 (2022) 178–206.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] I. Augenstein, C. Lioma, D. Wang, L. Chaves Lima, C. Hansen, C. Hansen, J. G. Simonsen, MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 4685–4697. URL: https://aclanthology.org/D19-1475. doi:10.18653/v1/D19-1475.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Schlichtkrull, Z. Guo, A. Vlachos, AVeriTeC: A dataset for real-world claim verification with evidence from the web, in: A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine (Eds.), Advances in Neural Information Processing Systems, volume 36, Curran Associates, Inc., 2023, pp. 65128–65167. URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/cd86a30526cd1af61d6f89f107634e4-Paper-Datasets_and_Benchmarks.pdf.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] N. Sagara, Consumer understanding and use of numeric information in product claims, University of Oregon, 2009.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] V. Venktesh, A. Anand, A. Anand, V. Setty, QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims, in: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Association for Computing Machinery (ACM), 2024, pp. 650–660.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] R. Aly, Z. Guo, M. S. Schlichtkrull, J. Thorne, A. Vlachos, C. Christodoulopoulos, O. Cocarascu, A. Mittal, FEVEROUS: Fact extraction and verification over unstructured and structured information, in: J. Vanschoren, S. Yeung (Eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, 2021.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] J. Chen, A. Sriram, E. Choi, G. Durrett, Generating literal and implied subquestions to fact-check complex claims, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 3495–3516. URL: https://aclanthology.org/2022.emnlp-main.229/. doi:10.18653/v1/2022.emnlp-main.229.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] L. Pan, X. Wu, X. Lu, A. T. Luu, W. Y. Wang, M.-Y. Kan, P. Nakov, Fact-checking complex claims with program-guided reasoning, arXiv preprint arXiv:2305.12744 (2023). URL: https://arxiv.org/abs/2305.12744. arXiv:2305.12744.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] F. Alam, J. M. Struß, T. Chakraborty, S. Dietze, S. Hafid, K. Korre, A. Muti, P. Nakov, F. Ruggeri, S. Schellhammer, V. Setty, M. Sundriyal, K. Todorov, V. V., The CLEF-2025 CheckThat! lab: Subjectivity, fact-checking, claim normalization, and retrieval, in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini, F. Pinelli, F. Silvestri, N. Tonellotto (Eds.), Advances in Information Retrieval, Springer Nature Switzerland, Cham, 2025, pp. 467–478.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] F. Alam, J. M. Struß, T. Chakraborty, S. Dietze, S. Hafid, K. Korre, A. Muti, P. Nakov, F. Ruggeri, S. Schellhammer, V. Setty, M. Sundriyal, K. Todorov, V. Venktesh, Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, fact-checking, claim normalization, and retrieval, in: J. Carrillo-de Albornoz, J. Gonzalo, L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025), 2025.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a large-scale dataset for fact extraction and VERification, in: M. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 809–819. URL: https://aclanthology.org/N18-1074. doi:10.18653/v1/N18-1074.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] R. Aly, Z. Guo, M. S. Schlichtkrull, J. Thorne, A. Vlachos, C. Christodoulopoulos, O. Cocarascu, A. Mittal, The fact extraction and VERification over unstructured and structured information (FEVEROUS) shared task, in: R. Aly, C. Christodoulopoulos, O. Cocarascu, Z. Guo, A. Mittal, M. Schlichtkrull, J. Thorne, A. Vlachos (Eds.), Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics, Dominican Republic, 2021, pp. 1–13. URL: https://aclanthology.org/2021.fever-1.1/. doi:10.18653/v1/2021.fever-1.1.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Y. Jiang, S. Bordia, Z. Zhong, C. Dognin, M. Singh, M. Bansal, HoVer: A dataset for many-hop fact extraction and claim verification, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 3441–3460. URL: https://aclanthology.org/2020.findings-emnlp.309. doi:10.18653/v1/2020.findings-emnlp.309.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] W. Y. Wang, “Liar, liar pants on fire”: A new benchmark dataset for fake news detection, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 422–426. URL: https://aclanthology.org/P17-2067. doi:10.18653/v1/P17-2067.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] T. Alhindi, S. Petridis, S. Muresan, Where is your evidence: Improving fact-checking by justification modeling, in: Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 85–90. URL: https://aclanthology.org/W18-5513. doi:10.18653/v1/W18-5513.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] W. Ostrowski, A. Arora, P. Atanasova, I. Augenstein, Multi-hop fact checking of political claims, in: Z.-H. Zhou (Ed.), Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, International Joint Conferences on Artificial Intelligence Organization, 2021, pp. 3892–3898. URL: https://doi.org/10.24963/ijcai.2021/536. doi:10.24963/ijcai.2021/536, main track.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wadden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Zuylen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Fact or fiction: Verifying scientific claims</article-title>
          ,
          in:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>7534</fpage>
          -
          <lpage>7550</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.609. doi: 10.18653/v1/2020.emnlp-main.609.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vladika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Scientific fact-checking: A survey of resources and approaches</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.16859.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wadden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Generating scientific claims for zero-shot scientific fact checking</article-title>
          ,
          <year>2022</year>
          . arXiv:2203.12990.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kotonya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Explainable automated fact-checking for public health claims</article-title>
          ,
          <year>2020</year>
          . arXiv:2010.09926.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Diggelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bulian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciaramita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          ,
          <article-title>Climate-fever: A dataset for verification of real-world climate claims</article-title>
          ,
          <year>2021</year>
          . arXiv:2012.00614.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <article-title>Averitec: A dataset for real-world claim verification with evidence from the web</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.13117. arXiv:2305.13117.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Tabfact: A large-scale dataset for table-based fact verification</article-title>
          ,
          <year>2020</year>
          . arXiv:1909.02164.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>FEVEROUS: Fact extraction and verification over unstructured and structured information</article-title>
          ,
          <year>2021</year>
          . arXiv:2106.05707.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>SCITAB: A challenging benchmark for compositional reasoning and claim verification on scientific tables</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouamor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>7787</fpage>
          -
          <lpage>7813</lpage>
          . URL: https://aclanthology.org/2023.emnlp-main.483. doi: 10.18653/v1/2023.emnlp-main.483.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kamoi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Durrett</surname>
          </string-name>
          ,
          <article-title>WiCE: Real-world entailment for claims in Wikipedia</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.01432.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Identification and verification of simple claims about statistical properties</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Lisbon, Portugal,
          <year>2015</year>
          , pp.
          <fpage>2596</fpage>
          -
          <lpage>2601</lpage>
          . URL: https://aclanthology.org/D15-1312. doi: 10.18653/v1/D15-1312.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <article-title>An extensible framework for verification of numerical claims</article-title>
          , in:
          <source>Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics
          , Valencia, Spain,
          <year>2017</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>40</lpage>
          . URL: https://aclanthology.org/E17-3010.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>Towards automatic numerical cross-checking: Extracting formulas from text</article-title>
          ,
          <source>in: Proceedings of the 2018 World Wide Web Conference</source>
          , WWW '18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE,
          <year>2018</year>
          , pp.
          <fpage>1795</fpage>
          -
          <lpage>1804</lpage>
          . URL: https://doi.org/10.1145/3178876.3186166. doi: 10.1145/3178876.3186166.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>V.</given-names>
            <surname>V</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <article-title>QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>650</fpage>
          -
          <lpage>660</lpage>
          . URL: https://doi.org/10.1145/3626772.3657874. doi: 10.1145/3626772.3657874.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moshfeghi</surname>
          </string-name>
          ,
          <article-title>ELASTIC: Numerical reasoning with adaptive symbolic compiler</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2210.10105. arXiv:2210.10105.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shankarampeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Exploring the numerical reasoning capabilities of language models: A comprehensive analysis on tabular data</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2311.02216. arXiv:2311.02216.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <article-title>The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</article-title>
          , in:
          <source>Advances in Information Retrieval - 43rd European Conference on IR Research</source>
          , volume
          <volume>12657</volume>
          of ECIR '21,
          <year>2021</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2022 CheckThat! lab task 2 on detecting previously fact-checked claims</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , CLEF 2022, Bologna, Italy,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Nandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Azizov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Panayotov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 CheckThat! lab task 4 on factuality of reporting of news media</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          , CLEF 2023, Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kyuchukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims</article-title>
          ,
          <source>9th International Conference of the CLEF Association</source>
          , CLEF
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>