<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MMA at CheckThat! 2025: Multilingual Claim Normalization of Social-Media Posts⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariam Saeed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mazen Yasser</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marwan Torki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nagwa ElMakky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Systems Engineering, Alexandria University</institution>
          ,
          <country country="EG">Egypt</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents the work of the MMA team for Task 2 - Claim Normalization of the CheckThat! 2025 shared task. The task aims to convert noisy and informal social media posts into concise, check-worthy claims suitable for fact-checking. Our experiments focused on the monolingual setup of the task. While we submitted runs for all languages provided in the task, we conducted additional experiments tailored for Arabic. We explored a range of approaches, including T5-based sequence-to-sequence models, zero-shot prompting, fine-tuning LLMs, and data augmentation techniques. The best-performing configurations were selected for each language. In particular, our Arabic model, using umt5 with data augmentation, achieved a strong score of 0.4584, placing third among all submissions. Spanish achieved the highest score of 0.5094 with the base umt5 model, while languages such as Polish and German showed lower performance. These results demonstrate the effectiveness of our multilingual strategies and the impact of data augmentation in improving performance, particularly for low-resource languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Claim Normalization</kwd>
        <kwd>Misinformation</kwd>
        <kwd>Social Media</kwd>
        <kwd>Fact-Checking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The digital age has transformed social media platforms into primary channels for information. However,
this rapid and unregulated flow of content has facilitated the widespread circulation of false or misleading
information shared without malicious intent [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Unlike traditional news articles, social media posts
often lack formal structure and editorial oversight, frequently characterized by personal opinions,
emotional language, and an absence of standardized formatting. This unstructured nature complicates
the assessment of content veracity. In addition, the design of social media platforms, which often
rewards engagement over accuracy, can amplify the spread of such content [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Traditional methods
are slow and labor-intensive, making them insufficient for keeping up with the fast pace of online
content [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To mitigate previous problems, researchers are developing automated fact-checking systems and
collaborative verification methods to address the challenges posed by the diverse formats, varying
source credibility, and complex user interactions in online environments [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A key step in enhancing
the efficiency and effectiveness of fact-checking is extracting the main claims from complex social media
posts. This process involves distilling lengthy or ambiguous content into concise, clear, and verifiable
statements, enabling fact-checkers to focus on specific points rather than extraneous information [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
This task is particularly crucial during critical events such as elections, public health crises, or conflicts,
where misinformation can significantly influence public opinion and behavior [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In this study, we present our approach for Task 2 of the CLEF2025 CheckThat! Lab [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which
focuses on claim extraction and normalization from unstructured social media content. We developed
multiple transformer-based models to extract the main claims from posts, aiming to transform informal
content into concise, standardized statements. To enhance the robustness of our models, we employed
data augmentation techniques, including the collection of relevant posts from fact-checking resources
and the utilization of large language models (LLMs) to refine the collected data into suitable training
examples. Our experiments demonstrate that our contributions can enhance this process across 13
languages provided in the shared task’s monolingual setup, thereby supporting the broader goal of
effective misinformation detection and fact-checking.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>To better understand how to extract and normalize social media claims, we look at related work in four
main areas. First, we explain how claims are defined and show the importance of normalization. Second,
we explore how sequence-to-sequence models help generate claims from longer posts. Then, we discuss
how data augmentation can improve model performance when training data is limited. Finally, we
investigate the use of large language models (LLMs) to perform claim extraction and normalization
more effectively, even with little or no fine-tuning.</p>
      <sec id="sec-2-1">
        <title>2.1. Claim Definition and Normalization</title>
        <p>
          A claim in the context of fact-checking is typically defined as a declarative statement asserting a piece
of information as true, which can be subsequently verified for its accuracy [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. On social media, claims
are often embedded within informal language, opinions, and multimedia content, making their precise
identification challenging. The unstructured and often ambiguous nature of these platforms necessitates
a clear definition of what constitutes a verifiable claim to enable effective automated processing [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Claim normalization refers to the process of transforming an extracted claim into a standardized,
concise, and context-independent statement that is suitable for verification [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This often involves
removing redundancies and converting informal or colloquial language into a more formal and
unambiguous representation. Normalization is crucial because the same underlying claim can be expressed in
various ways across different posts or platforms. Standardizing these expressions allows fact-checkers to
consolidate efforts, identify duplicate claims, and efficiently retrieve relevant evidence from knowledge
bases or previously checked claims [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Research in this area has explored rule-based systems, semantic
similarity measures, and more recently, generative models to achieve effective normalization [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Sequence-to-Sequence Transformer Models for Claim Generation</title>
        <p>
          Sequence-to-sequence (Seq2Seq) models, particularly those based on the Transformer architecture
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], have become the standard for many natural language generation tasks, including abstractive
summarization, machine translation, and dialogue generation. Their ability to capture long-range
dependencies and contextual information through attention mechanisms makes them well-suited for
transforming longer, unstructured social media posts into concise claims. In the context of claim
extraction, these models can be trained to transform a social media post into a shorter, claim-like
sentence that encapsulates its core verifiable assertion.
        </p>
        <p>
Several studies have demonstrated the effectiveness of Transformer-based models like BART [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
and T5 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] for abstractive summarization, a task closely related to claim generation. These models are
pre-trained on large text corpora and can be fine-tuned on specific datasets to generate coherent and
relevant summaries. Adapting these architectures for claim extraction involves fine-tuning them on
datasets where social media posts are paired with their corresponding main claims [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The goal is
to generate a statement that is not an extraction of a sentence from the original post but a potentially
novel sentence that accurately represents the core claim.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Augmentation</title>
        <p>
          The performance of deep learning models, including Seq2Seq Transformers, is heavily dependent on
the availability of large, high-quality training datasets. For specialized tasks like claim extraction from
social media, particularly in multiple languages, such datasets are often scarce or expensive to create.
Data augmentation techniques are employed to artificially expand the training set, thereby improving
model generalization and robustness [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
Common NLP data augmentation methods include back-translation (translating a sentence into a
target language and then back to the original), synonym replacement, and paraphrasing [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. For
Seq2Seq tasks, techniques like noising, sentence shuffling, and synthetic data generation using
pretrained language models have also been explored. Studies have shown that data augmentation can
significantly boost the performance of Transformer models in low-resource scenarios or for tasks
requiring nuanced understanding, such as claim generation, by exposing the model to a wider variety
of linguistic expressions of similar underlying claims.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Leveraging LLMs for Claim Extraction and Normalization</title>
        <p>
          Recent advancements in Large Language Models (LLMs), such as GPT-3.5, GPT-4 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], and
instruction-tuned models such as Qwen [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ], have significantly enhanced their ability to perform complex natural
language processing tasks without the need for task-specific fine-tuning. This zero-shot capability
is particularly beneficial for tasks like claim extraction and normalization, which often suffer from a
scarcity of annotated data.
        </p>
        <p>
          Moreover, the inherent reasoning abilities of LLMs have been harnessed to improve the accuracy of
claim verification. Kojima et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] showed that prompting LLMs with phrases such as "Let’s think
step by step" can significantly enhance their zero-shot reasoning performance across various tasks,
including fact verification.
        </p>
        <p>
          In the realm of claim extraction, LLMs have demonstrated proficiency in identifying and structuring
claims from unstructured text. For instance, Liu et al. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] introduced a self-prompting framework
that enables LLMs to perform zero-shot relation extraction effectively by generating synthetic samples
that encapsulate specific relations, thereby guiding the model without explicit training data. Similarly,
Sundriyal et al. [24] proposed the CACN framework, leveraging chain-of-thought prompting and
in-context learning with large language models to enhance the claim normalization process.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>We investigate multilingual fine-tuning with T5-based architectures, zero-shot and fine-tuned large
language models, and data augmentation techniques. Each approach is described in detail in the
following subsections. The quantitative analysis and comparison between different approaches are
given in Section 4.</p>
      <sec id="sec-3-1">
        <title>3.1. Multilingual T5-Based Training</title>
<p>We adopted the google/umt5 model [25], a multilingual version of T5 designed for cross-lingual
generation tasks. We fine-tuned it for post-to-claim generation and explored two training configurations:
• Unified Multilingual Training: We trained a single umt5 model on the entire multilingual
dataset. This approach leverages shared cross-lingual representations, enabling low-resource
languages to benefit from richer ones through transfer learning.
• Language-Specific Training: We trained a separate umt5 model per language. While this setup
forgoes cross-lingual transfer, it allows each model to specialize in the linguistic
nuances of its respective language, potentially improving claim generation fidelity.</p>
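<p>As a minimal sketch, post-claim pairs for either configuration could be serialized into input/target strings for seq2seq fine-tuning. The task prefix and language tag below are our illustrative assumptions, not the paper's exact preprocessing:</p>

```python
# Sketch: preparing post -> claim pairs for seq2seq fine-tuning of umt5.
# The task prefix and the language tag are illustrative assumptions,
# not the exact configuration used in the paper.

def build_example(post: str, claim: str, lang: str) -> dict:
    """Pair a noisy post (model input) with its normalized claim (target)."""
    return {
        # A language tag lets a single multilingual model route examples.
        "input_text": f"normalize claim ({lang}): {post.strip()}",
        "target_text": claim.strip(),
    }

def build_dataset(rows: list[tuple[str, str, str]]) -> list[dict]:
    """rows: (post, claim, language) triples from the task's data files."""
    return [build_example(p, c, l) for p, c, l in rows]

examples = build_dataset([
    ("BREAKING!!! they said 5G towers spread the virus, share now!!",
     "5G towers spread the virus.", "en"),
])
```

<p>The same serialized examples serve both the unified multilingual run (all languages mixed) and the language-specific runs (one subset per model).</p>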
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Zero-Shot Prompting</title>
        <p>
          We evaluated the generative capability of Qwen 2.5 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] (Qwen2.5-7B and Qwen2.5-32B) in a zero-shot
setting. This setting assesses the out-of-the-box generalization performance of the pre-trained model
for claim normalization. The models were prompted directly with the following instruction:
        </p>
        <sec id="sec-3-2-1">
          <title>System Prompt</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Given a social media post. Your task is to extract the post claim.</title>
          <p>You must filter the noise from the post.</p>
          <p>You must respond with the same language of the post.</p>
          <p>Ignore repetition, rhetorical flourishes, and background anecdotes.</p>
          <p>Keep only the essential elements:
• Who is involved (names, age, nationality).
• What is being claimed will happen or has happened.
• Where/When only if they are integral to the claim.</p>
          <p>The claim must only include words from the post and do not use any external words.
The claim must be in the style of title for an article.</p>
          <p>The claim must be Declarative Short Sentence with preserving names and places.</p>
          <p>DO NOT add, omit, or translate information. Preserve names and wording where possible.</p>
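<p>The instruction above can be packaged for a chat-style model such as Qwen2.5 using the standard role/content message schema; the abbreviated prompt text and the surrounding inference code are our sketch, not the authors' released implementation:</p>

```python
# Sketch: packaging the zero-shot system prompt for a chat-style model
# such as Qwen2.5. The message schema is the common role/content format
# consumed by chat templates; the prompt here is abbreviated.

SYSTEM_PROMPT = (
    "Given a social media post. Your task is to extract the post claim. "
    "You must filter the noise from the post. "
    "You must respond with the same language of the post."
    # ... remaining constraints from the full prompt above ...
)

def make_messages(post: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": post},
    ]

msgs = make_messages("omg did u hear, the mayor CANCELED all buses forever!!")
# These messages would then be rendered via the model's chat template
# before generation.
```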
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Instruction Fine-Tuning with Key Point Intermediate Step</title>
<p>To improve the performance of the Qwen2.5-7B model, we applied parameter-efficient fine-tuning using
Low-Rank Adaptation (LoRA) [26]. The training pipeline involved a two-stage strategy:
1. Key Point Extraction: Using Qwen2.5-32B in a zero-shot setting, we extracted key points from
each training example (post and normalized claim) with the following system instruction:</p>
        <sec id="sec-3-3-1">
          <title>Training System Prompt</title>
        </sec>
        <sec id="sec-3-3-2">
          <title>Given a social media post, your task is to:</title>
          <p>- Identify and extract the important key points presented in this post.
- Then, you have to use all these key points to extract the main claim of that post.
You must answer in the same language as the given post. The output must be in the
following format:
&lt;keypoints&gt;
[’key1’,’key2’,....]
&lt;/keypoints&gt;
&lt;claim&gt;
claim of the post
&lt;/claim&gt;</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>This produced a curated intermediate dataset of post-key point pairs.</title>
          <p>2. Claim Generation from Key Points: We fine-tuned Qwen2.5-7B model on this augmented
dataset to map input posts to final claims via the extracted key points. The objective was to focus
the model’s attention on important information and reduce the noise from irrelevant content.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Arabic-Specific Data Augmentation and Modeling</title>
        <p>While previous sections focused on multilingual and monolingual experiments across 13 languages, we
dedicated additional effort to improving performance on Arabic, given its linguistic complexity. This
involved both model selection and targeted data augmentation tailored to Arabic content.</p>
        <p>We experimented with the following variants of T5 models, specifically ara-t5 models [ 27], specialized
for Arabic:
• ara-t5-v2, a general-purpose Arabic T5 model,
• ara-t5, trained for title generation tasks,
• ara-t5-tweet, fine-tuned on Arabic tweet data.</p>
        <p>In addition to modeling variations, we enriched the Arabic training data using the Google Fact Check
Tools API [28]. This API provides access to a curated set of verified claims and related content. We
scraped Arabic post-claim pairs from this resource and incorporated them into our training pipeline.
The goal was to increase the amount of high-quality, claim-specific data for Arabic, which is typically
underrepresented.</p>
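<p>A sketch of this collection step, using the public claims:search endpoint of the Fact Check Tools API, is shown below. The pairing heuristic (claimed text paired with the review title) is our assumption about how post-claim pairs could be derived:</p>

```python
import urllib.parse

# Sketch: querying the Google Fact Check Tools API (claims:search) for
# Arabic claims. Endpoint and field names follow the public v1alpha1 API;
# the pairing heuristic below is our assumption.

API = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

def build_url(query: str, api_key: str, lang: str = "ar") -> str:
    params = {"query": query, "languageCode": lang, "key": api_key}
    return API + "?" + urllib.parse.urlencode(params)

def extract_pairs(response: dict) -> list[tuple[str, str]]:
    """Turn API results into (claimed text, reviewed claim) training pairs."""
    pairs = []
    for item in response.get("claims", []):
        claim_text = item.get("text", "").strip()
        for review in item.get("claimReview", []):
            title = review.get("title", "").strip()
            if claim_text and title:
                pairs.append((claim_text, title))
    return pairs

sample = {"claims": [{"text": "Photos show Paris streets after New Year",
                      "claimReview": [{"title": "Old photos, not from Paris",
                                       "textualRating": "False"}]}]}
pairs = extract_pairs(sample)
```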
        <p>These augmented samples were used to fine-tune the multilingual umt5, which showed superior
performance and was ultimately selected for final submission. An example of the augmented Arabic
data is shown below:</p>
        <sec id="sec-3-4-1">
          <title>Scraped sample</title>
        </sec>
        <sec id="sec-3-4-2">
          <title>Post:</title>
          <p>Pages and accounts on social media circulated photos and videos claiming to show
the streets of Paris on the morning after the New Year's Eve celebrations, filled with large
amounts of garbage. However, this claim is false: the circulated photos and videos come
from other places and different dates, not from Paris after the New Year's celebrations.
Some of them trace back to Naples, Italy, while others are from previous celebrations in Paris itself.
Claim:
The circulated photos and videos claiming to show the streets of Paris after the New Year's Eve
celebration are old, or come from other places.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets and Metrics</title>
        <p>
          For our experiments, we utilized the dataset provided by the CheckThat! 2025 Task 2: Claim
Normalization [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The task is offered in two distinct setups:
• Monolingual Setup: This setup provides training, development, and test datasets for 13
languages: Arabic, German, English, French, Hindi, Marathi, Indonesian, Punjabi, Portuguese,
Spanish, Tamil, Thai, and Polish.
• Zero-Shot Setup: For the remaining 7 languages: Bengali, Czech, Greek, Korean, Romanian,
Telugu, and Dutch, only the test datasets are provided, without accompanying training or
development data.
        </p>
        <p>Our participation was exclusively in the monolingual setup, focusing on languages with complete
datasets. Table 1 presents the aggregated statistics of the dataset across the training, development, and
test splits for these 13 languages.</p>
        <p>For evaluation, we employed the METEOR score [29], as specified by the task organizers. METEOR
is designed to evaluate the quality of machine-generated text by comparing it to reference texts.
It calculates a harmonic mean of unigram precision and recall, with recall weighted higher than
precision. Additionally, METEOR incorporates features such as stemming, synonymy matching, and a
fragmentation penalty to account for word order, making it more aligned with human judgment.</p>
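<p>The recall-weighted harmonic mean at METEOR's core can be illustrated with a simplified unigram version; this sketch omits the stemming, synonymy matching, and fragmentation penalty of the full metric:</p>

```python
from collections import Counter

# Simplified illustration of METEOR's recall-weighted harmonic mean over
# unigram matches. Real METEOR additionally applies stemming, synonym
# matching, and a fragmentation penalty for word order.

def unigram_fmean(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    # Clipped unigram matches, as in precision/recall for MT metrics.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    # METEOR weights recall 9x more than precision (alpha = 0.9).
    return 10 * precision * recall / (recall + 9 * precision)

score = unigram_fmean("paris streets photos are old",
                      "the paris photos are old")
```

<p>Here 4 of 5 hypothesis tokens match 4 of 5 reference tokens, so precision and recall are both 0.8 and the weighted mean is 0.8.</p>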
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training Setup</title>
<p>We conducted our experiments using a single NVIDIA A6000 GPU. Our training encompassed two
primary approaches: fine-tuning T5-based models and fine-tuning large language models (LLMs) using
Low-Rank Adaptation (LoRA).</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. T5-Based Models</title>
          <p>For fine-tuning T5-based models, we employed the following hyperparameters:
• Number of epochs: 20
• Learning rate: 5e-4
• Learning rate scheduler: Cosine with 90 warmup steps
• Optimizer: AdamW
• Weight decay: 0.01
• Effective batch size: 32</p>
          <p>During evaluation, we selected the best checkpoint and then utilized the following decoding
parameters:
• Maximum sequence length: 512
• Number of beams: 5
• Top-p: 0.85
• Top-k: 40</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Large Language Models with LoRA</title>
          <p>For fine-tuning large language models using LoRA, we adopted a parameter-efficient approach, adjusting
only a subset of the model’s parameters. The hyperparameters for this setup were:
• Number of epochs: 5
• Learning rate: 1e-5
• Effective batch size: 8
• LoRA rank: 8</p>
        </sec>
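<p>The parameter efficiency of rank-8 LoRA can be seen with a back-of-the-envelope calculation: a frozen d x k weight matrix gains only a pair of trainable low-rank factors (d x r and r x k). The matrix dimensions below are illustrative, not Qwen2.5-7B's exact shapes:</p>

```python
# Back-of-the-envelope sketch of why rank-8 LoRA is parameter-efficient.
# The matrix dimensions are illustrative assumptions, not the exact
# shapes of Qwen2.5-7B.

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter of rank r."""
    return d * r + r * k

d = k = 4096          # one square attention projection (assumed size)
full = d * k          # parameters updated by full fine-tuning
lora = lora_params(d, k, r=8)
fraction = lora / full   # ~0.4% of the full matrix
```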
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results and Analysis</title>
        <p>We conducted a comprehensive evaluation across multiple training paradigms to assess the impact of
multilingual modeling, zero-shot generalization, and parameter-efficient fine-tuning. Our experiments
use the METEOR score as the evaluation metric across 13 languages in the monolingual setting.
Multilingual vs. Language-Specific Models. We first trained a single umt5 model jointly on
all 13 languages. While this setup benefits from shared multilingual patterns, it showed limitations
in capturing language-specific nuances. To address this, we trained a separate umt5 model for each
language. As shown in Table 2, language-specific models outperformed the multilingual one in several
languages. However, in a few cases like Marathi and Punjabi, the multilingual model demonstrated
better performance, likely due to limited monolingual data for those languages. Since the monolingual
setup outperformed the multilingual one in most cases and is better suited to capturing the linguistic
characteristics of each language, we adopted it for our final submission across all languages—except for
Arabic, where the multilingual umt5 model with data augmentation achieved superior results.
Zero-Shot Inference with LLMs. We evaluated the zero-shot capabilities of two large language
models, Qwen2.5-7B and Qwen2.5-32B. As expected, the 32B variant achieved better performance
across almost all languages, highlighting the positive correlation between model size and generalization
in zero-shot settings. However, both variants lagged behind fine-tuned models, especially in low-resource
languages like Hindi and Punjabi.</p>
        <p>Parameter-Eficient Fine-Tuning. The results were competitive, achieving an average improvement
over the zero-shot 7B baseline. For example, in Tamil, LoRA fine-tuning reached 0.437 compared to
0.156 in zero-shot 7B and 0.295 in zero-shot 32B. This demonstrates that parameter-eficient tuning can
significantly close the gap to full fine-tuning with much lower resource requirements.
Arabic-Specific Modeling. In addition to our multilingual experiments, we performed a focused
evaluation on Arabic using several variants of the AraT5 model to assess their suitability for claim
normalization.</p>
        <p>Their METEOR scores on the development and the test sets are shown in Table 3. Among them,
ara-t5-v2 yielded the best performance. However, despite promising results, these models were
outperformed by the multilingual umt5, especially when combined with data augmentation strategies.</p>
        <p>To further boost performance, we extended the training data by scraping post-claim examples
using the Google Fact Check Tools API. As shown in Table 4, the best result (0.4584 on the test set) was
obtained using umt5 trained with the full augmented dataset, confirming the advantage of cross-lingual
modeling with targeted data enhancement.</p>
        <p>Final Results on the Test Set. Table 4 presents our final submitted results along with the final rank
on the leaderboard. Due to evaluation issues with the Thai language, we submitted predictions for 12
out of the 13 target languages. For each language, the best-performing configuration was selected. The
Arabic model achieved a strong score of 0.4584 using umt5 with targeted data augmentation, while
Spanish attained the highest overall score of 0.5094 with the base umt5 model. On the other hand,
languages such as Polish and German yielded comparatively lower scores, suggesting the need for
additional data augmentation or model adaptation to improve performance.</p>
        <p>Table 2 compares UMT5 (All), UMT5 (Mono), Qwen2.5-7B Zero-shot, Qwen2.5-32B Zero-shot, and Qwen2.5-7B-LoRA.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented our contribution to Task 2 of the CLEF2025 CheckThat! Lab, addressing the
multilingual challenge of claim extraction and normalization from social media content. We developed
transformer-based models, primarily focusing on the monolingual setup across 13 languages. Our
approach combined multilingual T5 variants, zero-shot prompting of large language models, and
parameter-efficient fine-tuning with LoRA, alongside a novel keypoint-guided generation strategy.
Data augmentation, especially via the Google Fact Check Tools API, played a vital role in improving
model performance in low-resource settings, e.g., for Arabic. Our experiments demonstrated
the effectiveness of these methods, with umt5 models and keypoint-assisted strategies yielding strong
results. While our work is limited to the monolingual track, future efforts will target improved keypoint
generation and broader support for zero-shot settings. Our findings highlight the potential of tailored
multilingual systems in enhancing claim verification pipelines and combating misinformation online.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, we used ChatGPT and Grammarly for grammar and spelling
checks, as well as for paraphrasing and rewording. After using these tools, we carefully reviewed and
edited the content as needed and take full responsibility for the final version of this publication.</p>
      <p>
[24] M. Sundriyal, T. Chakraborty, P. Nakov, From chaos to clarity: Claim normalization to empower
fact-checking, 2024. URL: https://arxiv.org/abs/2310.14338. arXiv:2310.14338.
[25] H. W. Chung, X. Garcia, A. Roberts, Y. Tay, O. Firat, S. Narang, N. Constant, Unimax: Fairer
and more effective language sampling for large-scale multilingual pretraining, in: The Eleventh
International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023,
OpenReview.net, 2023. URL: https://openreview.net/forum?id=kXwdL1cWOAi.
[26] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank
adaptation of large language models, in: International Conference on Learning Representations,
2022. URL: https://openreview.net/forum?id=nZeVKeeFYf9.
[27] E. M. B. Nagoudi, A. Elmadany, M. Abdul-Mageed, AraT5: Text-to-text transformers for Arabic
language generation, in: Proceedings of the 60th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin,
Ireland, 2022, pp. 628–647. URL: https://aclanthology.org/2022.acl-long.47.
[28] Google fact check tools, 2024. https://toolbox.google.com/factcheck/explorer.
[29] A. Lavie, A. Agarwal, Meteor: an automatic metric for mt evaluation with high levels of correlation
with human judgments, in: Proceedings of the Second Workshop on Statistical Machine Translation,
StatMT ’07, Association for Computational Linguistics, USA, 2007, p. 228–231.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aral</surname>
          </string-name>
          ,
          <article-title>The spread of true and false news online</article-title>
          ,
          <source>Science</source>
          <volume>359</volume>
          (
          <year>2018</year>
          )
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Fake news detection on social media: A data mining perspective</article-title>
          ,
          <source>ACM SIGKDD explorations newsletter 19</source>
          (
          <year>2017</year>
          )
          <fpage>22</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <article-title>Understanding the promise and limits of automated fact-checking</article-title>
          ,
          <source>Reuters Institute for the Study of Journalism</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          , G. Da San Martino,
          <article-title>Automated fact-checking for assisting human fact-checkers</article-title>
          , in:
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          (Ed.),
          <source>Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21</source>
          ,
          <source>International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4551</fpage>
          -
          <lpage>4558</lpage>
          . URL: https://doi.org/10.24963/ijcai.2021/619. doi:10.24963/ijcai.2021/619. Survey Track.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <article-title>A survey on automated fact-checking</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>178</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>La Gatta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Luceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pierri</surname>
          </string-name>
          , E. Ferrara,
          <article-title>Retrieving false claims on Twitter during the Russia-Ukraine conflict</article-title>
          , in:
          <source>Companion Proceedings of the ACM Web Conference 2023</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1317</fpage>
          -
          <lpage>1323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! lab task 2 on claim normalization</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          , CLEF 2025, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>FEVER: a large-scale dataset for fact extraction and verification</article-title>
          , arXiv preprint arXiv:1803.05355 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Fact checking: Task definition and dataset construction</article-title>
          , in:
          <source>Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino, M. Hasanain,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          , et al.,
          <article-title>Overview of CheckThat! 2020: Automatic identification and verification of claims in social media</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22-25, 2020, Proceedings 11</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McConville</surname>
          </string-name>
          ,
          <article-title>MuMiN: A large-scale multilingual multimodal fact-checked misinformation social network dataset</article-title>
          , in:
          <source>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3141</fpage>
          -
          <lpage>3153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanselowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sorokin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>UKP-Athene: Multi-sentence textual entailment for claim verification</article-title>
          , arXiv preprint arXiv:1809.01479 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          , arXiv preprint arXiv:1910.13461 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          . URL: http://jmlr.org/papers/v21/20-074.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Transformer-Based Language Model Fine-Tuning Methods for COVID-19 Fake News Detection</article-title>
          ,
          <year>2021</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          . doi:10.1007/978-3-030-73696-5_9.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Saini</surname>
          </string-name>
          ,
          <article-title>Enhancing performance of deep learning models with different data augmentation techniques: A survey</article-title>
          , in:
          <source>2020 International Conference on Intelligent Engineering and Management (ICIEM)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>EDA: Easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          , arXiv preprint arXiv:1901.11196 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2303.08774. arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Qwen Team</surname>
          </string-name>
          ,
          <article-title>Qwen2.5: A party of foundation models</article-title>
          ,
          <year>2024</year>
          . URL: https://qwenlm.github.io/blog/qwen2.5/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Qwen Team</surname>
          </string-name>
          ,
          <article-title>Qwen3 technical report</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.09388. arXiv:2505.09388.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot reasoners</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2205.11916. arXiv:2205.11916.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <article-title>Unleashing the power of large language models in zero-shot relation extraction via self-prompting</article-title>
          , in:
          <string-name>
            <given-names>Y.</given-names>
            <surname>Al-Onaizan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-N.</given-names>
            <surname>Chen</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2024</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>13147</fpage>
          -
          <lpage>13161</lpage>
          . URL: https://aclanthology.org/2024.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>