<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Saivineetha at CheckThat! 2025: Exploring Fine-Tuning and Zero-Shot Approaches for Claim Normalization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Baddepudi Venkata Naga Sri Sai Vineetha</string-name>
          <aff>Independent Contributor (TCS Research)</aff>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents our participation in Task 2 of the CLEF 2025 CheckThat! Lab, which focuses on claim extraction and normalization. The task aims to decompose social media posts into simpler, comprehensible forms and spans 20 languages: English, Arabic, Bengali, Czech, German, Greek, French, Hindi, Korean, Marathi, Indonesian, Dutch, Punjabi, Polish, Portuguese, Romanian, Spanish, Tamil, Telugu, and Thai. Our study focuses on two languages, Hindi and Telugu. Our approach involves Parameter-Efficient Fine-Tuning (PEFT) of a multilingual Large Language Model (LLM) for the Hindi dataset and zero-shot inference for the Telugu dataset. Our proposed method ranked third in Hindi with a METEOR score of 0.2996 and fourth in Telugu with a METEOR score of 0.3774 on the organizers' leaderboard.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Model (LLM)</kwd>
        <kwd>Claim Normalization</kwd>
        <kwd>Parameter-Efficient Fine-tuning (PEFT)</kwd>
        <kwd>Zero-shot inference</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Sundriyal et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduced the task of Claim Normalization, which aims to simplify
noisy social media posts into simple, concise claims. The authors proposed CACN, an approach that
leverages chain-of-thought prompting and claim check-worthiness estimation to interpret complex claims,
and also introduced the CLAN dataset of more than 6k social media posts with their
normalized claims. Papageorgiou et al. [8] employed pre-trained LLMs to extract factual sentences from
news text; fact and claim extraction supports fact-checking of news articles. They also proposed an
approach using graph convolutional networks to capture more complex relations in the text. Wang
et al. [9] introduced the task of claim clarification, which involves rewriting the ambiguous
parts of claims, thereby enhancing the content and removing redundant information. They evaluated
the performance of various LLMs on this task and proposed a sliding-window-based
semantic evaluation approach.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>The dataset for the Claim Normalization task comprises noisy, unstructured social media posts and their
normalized claims. The task has two settings: monolingual and zero-shot. In the monolingual setting,
training, development and test datasets are provided for specific languages, which
ensures that the model learns language-specific patterns. The datasets for English, German, French,
Spanish, Portuguese, Hindi, Marathi, Punjabi, Tamil, Arabic, Thai, Indonesian, and Polish follow this
setting. In the zero-shot setting, only test data is provided, with no training or development datasets;
this evaluates how well a model generalizes to unseen languages. It covers
Dutch, Romanian, Bengali, Telugu, Korean, Greek, and Czech.</p>
        <p>For this task, we conducted experiments on two languages, Hindi and Telugu. The Hindi dataset is
monolingual: the training data has 1081 posts with their normalized claims, the development set has 50 posts,
and the test set has about 100 posts. The Telugu dataset is zero-shot and contains 116 posts.</p>
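        <p>For fine-tuning in the monolingual setting, each (post, normalized claim) pair is serialized into a single training string. Below is a minimal sketch with a hypothetical template; the actual instruction wording is the task prompt of Fig. 1 and is not reproduced here.</p>

```python
# Hypothetical template; the actual instruction wording is the prompt in
# Fig. 1 and is not reproduced here.
TEMPLATE = (
    "### Instruction:\n"
    "Normalize the claim made in the following social media post.\n"
    "### Post:\n{post}\n"
    "### Normalized claim:\n{claim}"
)

def format_example(post: str, claim: str) -> str:
    """Serialize one (post, normalized claim) pair for instruction tuning."""
    return TEMPLATE.format(post=post, claim=claim)
```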
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Proposed Approach</title>
        <p>Motivated by the strong performance of multilingual LLMs, we chose the Gemma 2 9B model for
fine-tuning on Hindi and the Gemma 3 12B model for zero-shot prompting on Telugu. Gemma 2 9B was
selected for its broad language coverage and its compatibility with Kaggle’s GPU
infrastructure, which enabled efficient fine-tuning within the available computational constraints. For
the Telugu dataset, we employed Gemma 3 12B in inference-only mode, as the objective was
limited to evaluating zero-shot performance without additional fine-tuning.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Hindi</title>
          <p>The Gemma 2 9B instruct model [6] is a multilingual model by Google trained on diverse
languages. We fine-tuned this instruct model on the Hindi dataset of posts and normalized
claims provided for the task, performing Parameter-Efficient Fine-Tuning (PEFT) with Quantized
Low-Rank Adaptation (QLoRA) using 4-bit quantization. The dataset has two columns, post
and normalized claim, and is formatted with the prompt defined in Fig. 1. Table 1 lists the
hyperparameters used for fine-tuning.</p>
          <p>We performed PEFT fine-tuning on the training dataset and evaluated on the development dataset.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Telugu</title>
          <p>The Gemma 3 12B instruct model [7] is a multilingual LLM that excels at text generation across
many languages. We performed zero-shot prompting on the Telugu dataset of social media
posts; Fig. 2 shows the prompt used for inference.</p>
        </sec>
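        <p>Table 1 lists the actual hyperparameters. To illustrate why QLoRA with 4-bit quantization makes a 9B-parameter model tunable on Kaggle GPUs, here is a back-of-the-envelope count of trainable LoRA parameters; all shapes and the rank below are assumptions, not the configuration used in our runs.</p>

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA replaces the update of a frozen d_in x d_out weight with two
    low-rank factors A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

# Illustrative, assumed shapes (not the model's exact configuration):
hidden = 3584   # assumed hidden size of a Gemma-2-9B-like model
layers = 42     # assumed number of transformer layers
rank = 16       # assumed LoRA rank

# Adapt the q/k/v/o attention projections, treated as square for simplicity.
per_layer = 4 * lora_trainable_params(hidden, hidden, rank)
total = layers * per_layer
print(f"trainable LoRA parameters: {total / 1e6:.1f}M (vs ~9000M for full fine-tuning)")
```

        <p>Only these adapter weights receive gradients; the quantized base model stays frozen, which is what keeps the memory footprint within a single GPU.</p>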
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results and Analysis</title>
      <sec id="sec-4-1">
        <title>4.1. Baseline Model</title>
        <p>We used Google’s UMT5 base as the baseline model, as specified in CheckThat! Lab Task 2. We fine-tuned the
UMT5 model on the Hindi dataset for 5 epochs with a learning rate of 5e-4. For the Telugu dataset, we performed
inference with the UMT5 model using the same prompt as in Fig. 2.</p>
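        <p>A minimal sketch of the baseline configuration, assuming the Hugging Face <italic>google/umt5-base</italic> checkpoint; only the 5 epochs and the 5e-4 learning rate come from the text, and the batch size and output directory are assumptions.</p>

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
)

# Assumed checkpoint name for Google's UMT5 base.
tokenizer = AutoTokenizer.from_pretrained("google/umt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/umt5-base")

# The two hyperparameters stated in the paper: 5 epochs, learning rate 5e-4.
args = Seq2SeqTrainingArguments(
    output_dir="umt5-claim-norm",   # assumed
    num_train_epochs=5,
    learning_rate=5e-4,
    per_device_train_batch_size=8,  # assumed, not stated in the paper
    predict_with_generate=True,
)
```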
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results and Analysis</title>
        <p>The official metric for the Claim Normalization competition in Task 2 of the CheckThat! Lab is the METEOR score.
The METEOR score is calculated between the normalized claims predicted by the model
and the manually curated gold test outputs. It lies between 0 and 1, where
values closer to 1 indicate that the predicted claims more closely match the gold test outputs.</p>
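        <p>To make the metric concrete, the following is a simplified sketch of the METEOR computation: the unigram-F-mean with a fragmentation penalty, restricted to exact token matches (the real metric additionally matches stems and synonyms, which this sketch omits).</p>

```python
def simple_meteor(reference: str, hypothesis: str) -> float:
    """Simplified METEOR: exact unigram matches only (the real metric also
    uses stemming and synonym matching)."""
    ref, hyp = reference.split(), hypothesis.split()
    used = [False] * len(ref)
    align = []  # (hypothesis index, reference index) of matched unigrams
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if not used[j] and tok == rtok:
                used[j] = True
                align.append((i, j))
                break
    m = len(align)
    if m == 0:
        return 0.0
    precision, recall = m / len(hyp), m / len(ref)
    # Recall-weighted harmonic mean, as in the METEOR paper [10].
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # Fragmentation penalty: fewer, longer contiguous matched chunks are better.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(align, align[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)
```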
        <p>We used METEOR [10], ROUGE-2 [11], BLEU [12] and cosine similarity metrics to compare the
predicted claims against the actual test data.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Hindi</title>
          <p>We evaluated the normalized claims produced by the fine-tuned model against the gold test outputs
released for this task, comparing our proposed model with the baseline, Google’s
UMT5 base fine-tuned on the Hindi dataset.</p>
          <p>Table 3 compares the baseline with the proposed model for Hindi. The
METEOR score of the proposed model increased by 9 times, the ROUGE-2 score increased approximately
by 3 times, the BLEU score increased by 10%, and the cosine similarity by 2 times.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Telugu</title>
          <p>We evaluated the normalized claims obtained from zero-shot inference against
the gold test outputs released for this task. As a baseline for Telugu, we performed inference with Google’s
UMT5 base model and compared it against zero-shot
inference with the proposed prompt.</p>
          <p>Table 4 compares the baseline with the zero-shot inference for Telugu.
The METEOR score of the proposed approach increased by approximately 3 times, the ROUGE-2 score
increased by 2 times, the BLEU score increased by 10%, and the cosine similarity by approximately 1.5 times.</p>
        </sec>
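        <p>The text does not specify which sentence representation underlies the cosine-similarity metric; purely as an illustration of the metric itself, here is a bag-of-words stand-in (a sentence-embedding model could replace the count vectors).</p>

```python
from collections import Counter
from math import sqrt

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```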
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study builds a model that normalizes claims from social media posts as part of the
CheckThat! Lab 2025. We explored fine-tuning an LLM on a new Hindi dataset, which highlights
the capabilities of LLMs when trained on a new domain or language such as Hindi in
this task. We also performed zero-shot inference on the Telugu dataset, which shows the capabilities of
zero-shot inference with LLMs given clear and structured prompts. Our submissions attained third place
in Hindi with a METEOR score of 0.2996 and fourth place in Telugu with a METEOR score of 0.3774.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] M. Sundriyal, T. Chakraborty, P. Nakov, Overview of the CLEF-2025 CheckThat! Lab Task 2 on Claim Normalization, 2025.</p>
      <p>[5] D. K. Gajulamandyam, S. Veerla, Y. Emami, K. Lee, Y. Li, J. S. Mamillapalli, S. Shim, Domain Specific Finetuning of LLMs Using PEFT Techniques, in: 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2025, pp. 00484–00490.</p>
      <p>[6] G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ramé, et al., Gemma 2: Improving open language models at a practical size, arXiv preprint arXiv:2408.00118 (2024).</p>
      <p>[7] G. Team, A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, S. Perrin, T. Matejovicova, A. Ramé, M. Rivière, et al., Gemma 3 technical report, arXiv preprint arXiv:2503.19786 (2025).</p>
      <p>[8] E. Papageorgiou, I. Varlamis, C. Chronis, Harnessing Large Language Models and Deep Neural Networks for Fake News Detection, Information 16 (2025) 297.</p>
      <p>[9] Y. Wang, B. He, X. Chen, L. Sun, Can LLMs Clarify? Investigation and Enhancement of Large Language Models on Argument Claim Optimization, in: Proceedings of the 31st International Conference on Computational Linguistics, 2025, pp. 4066–4077.</p>
      <p>[10] S. Banerjee, A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, in: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005, pp. 65–72.</p>
      <p>[11] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization Branches Out, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL: https://www.aclweb.org/anthology/W04-1013.</p>
      <p>[12] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, 2002, pp. 311–318.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>From chaos to clarity: Claim normalization to empower fact-checking</article-title>
          ,
          <source>arXiv preprint arXiv:2310.14338</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          , The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval, in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini, F. Pinelli, F. Silvestri, N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          , Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval, in: J. Carrillo-de Albornoz, J. Gonzalo, L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>