<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Misinformation Detection in Russo-Ukrainian Conflict Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thibault Ehrhart</string-name>
          <email>thibault.ehrhart@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaël Troncy</string-name>
          <email>raphael.troncy@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grégoire Burel</string-name>
          <email>gregoire.burel@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harith Alani</string-name>
          <email>harith.alani@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>450 Route des Chappes, 06410 Biot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Open University</institution>
          ,
          <addr-line>Walton Hall, Milton Keynes MK7 6AA</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Misinformation on social media poses a significant challenge during major geopolitical events, where the rapid dissemination of misleading content can distort public understanding. The PROMID 2025 Subtask 3 focuses on identifying misinformation in tweets related to the 2022 Russo-Ukrainian conflict, a task complicated by extreme class imbalance, multilingual content, and heterogeneous metadata. In the provided dataset, misinformation accounts for only 1.05% of all tweets, making it difficult for transformer-based models to learn generalizable patterns. To address this challenge, we evaluate two approaches that both rely on the RoBERTa-large transformer model: a baseline model trained solely on the original PROMID dataset, and an augmented model that incorporates an additional 5,022 Ukraine-related misinformation tweets from the Fact-checking Observatory (FCO). Our results show that while the baseline model achieves high precision, it performs poorly in recall due to overfitting on the limited misinformation examples. In contrast, the augmented model substantially improves misinformation detection, increasing the misinformation F1-score from 0.4682 to 0.6967 and the weighted F1-score from 0.8516 to 0.9059. Our findings demonstrate that targeted data augmentation is an effective strategy for mitigating severe class imbalance and enhancing generalization in misinformation detection tasks. Our ClimateSense approach ranked 1st on the final test set of the PROMID 2025 Subtask 3 public leaderboard. Our approach is fully reproducible using the code at https://github.com/climatesense-project/promid2025-task3.</p>
      </abstract>
      <kwd-group>
        <kwd>Misinformation detection</kwd>
        <kwd>Data augmentation</kwd>
        <kwd>RoBERTa</kwd>
        <kwd>Fact checking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The proliferation of misinformation on social media platforms has become a critical challenge,
particularly during major geopolitical events such as the 2022 Russo-Ukrainian conflict. The ability to
automatically detect misinformation is essential for maintaining information integrity and public trust.
This paper presents our approach to PROMID Subtask 3, which focuses on binary classification of
tweets related to the 2022 Russo-Ukrainian conflict as either misinformation or genuine content. The
task is part of the PROMID (Prompt Recovery for MisInformation Detection) shared task, which aims
to explore methods for identifying misinformation in human and LLM-generated texts [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
      </p>
      <p>The task presents several significant challenges that make it particularly difficult for machine learning
approaches. First, the dataset shows a severe class imbalance, with misinformation tweets representing
only 1.05% of the training data (364 misinformation versus 34,174 non-misinformation tweets). Second,
misinformation content appears in multiple languages and varies significantly in linguistic
characteristics. Third, the social media context provided (both textual and metadata features) differs between
misinformation and non-misinformation instances.</p>
      <p>We hypothesize that the extreme class imbalance in the PROMID dataset may lead models to overfit
to the limited misinformation examples, learning superficial patterns rather than generalizable features
of misinformation. To test this hypothesis, we develop and compare two approaches: (1) Baseline
Approach, fine-tuning RoBERTa-large directly on the original PROMID dataset with class weighting
and oversampling to address imbalance, and (2) Augmented Approach, incorporating external
Ukraine-related misinformation data to increase minority class representation before fine-tuning.</p>
      <p>Our comparative analysis reveals that while the baseline model performs well on the majority class,
it struggles to correctly identify misinformation due to overfitting. By incorporating additional
Ukraine-related misinformation data, our augmented approach improves the detection of misinformation and
achieves better overall performance while demonstrating enhanced generalization capabilities. On
the official test set, our system achieved a weighted F1-score of 0.91, ranking 1st out of 11
participating teams. We release the source code of our approach at https://github.com/climatesense-project/
promid2025-task3.</p>
      <p>The remainder of this paper is organized as follows. Section 2 reviews related work on misinformation
detection using transformer-based models. Section 3 describes our methodology, including dataset
analysis, preprocessing, and model architecture. Section 4 presents our experimental results and error
analysis. Finally, Section 5 concludes with a discussion of limitations and future works.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Research on tweet-level misinformation detection has increasingly focused on large pretrained language
models, which consistently outperform traditional machine-learning approaches such as Support Vector
Machines (SVMs) and Long Short-Term Memory networks (LSTMs). Transformer-based architectures,
including BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], DeBERTa [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and XLM-R [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], capture richer contextual and semantic
cues in short social media posts, which makes them particularly effective for this task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Toraman et al. demonstrate that transformer models significantly outperform classical baselines on
MiDe22, a multilingual (English/Turkish) misinformation dataset covering multiple events, including
the 2022 Russo-Ukrainian conflict [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Their results show that DeBERTa achieves the highest
F1-score of 83.95% on English misinformation detection, while XLM-R performs best on Turkish (82.82%
F1). Similarly, Weinzierl and Harabagiu report strong performance on the VaccineLies corpus [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
while Hossain et al. observe lower performance on COVID-19 misinformation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], highlighting that
dataset characteristics such as linguistic heterogeneity, topic complexity, and class distribution strongly
influence model effectiveness. These findings reinforce that misinformation detection performance is
highly dataset-specific and does not transfer reliably across domains.
      </p>
      <p>
        Transformer-based models have also been successful in related tasks such as rumor detection and
propaganda identification. Anggrainingsih et al. demonstrated that BERT-based sentence embeddings
improve accuracy over non-transformer approaches by capturing nuanced linguistic and discourse cues
in short, noisy texts [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In the propaganda detection domain, the SemEval shared tasks on propaganda
technique classification [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] have shown that fine-tuned transformers can identify specific manipulation
techniques in news articles with high accuracy [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Our work builds on these foundations by applying transformer-based classification to
Ukraine-related misinformation while specifically addressing the extreme class imbalance through external data
augmentation [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Task Definition and Dataset Overview</title>
        <p>PROMID Subtask 3 focuses on misinformation detection in social media texts. The objective is to
classify tweets related to the 2022 Russo-Ukrainian conflict as either misinformation (positive class)
or non-misinformation (negative class). The dataset comprises manually annotated tweets collected
using the Twitter API during the first year of the 2022 Russo-Ukrainian conflict. A key challenge of
this subtask is the highly imbalanced class distribution, which tests how well models perform under
these conditions. The dataset contains misinformation tweets in multiple languages, with additional
metadata (e.g. account age, bot account indicators). Performance is evaluated using Precision, Recall,
and weighted-average F1-score.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. PROMID Training Dataset</title>
          <p>
            The PROMID training dataset is derived from the work of Shahi and Mejova [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], who analyzed
moderation of misinformation around the 2022 Russo-Ukrainian conflict. The dataset was collected using the
AMUSED framework [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], a systematic approach for annotating multimodal social media data. It
contains 34,538 tweets collected during the first year of the Russo-Ukrainian conflict. The dataset exhibits
a severe class imbalance, with only 364 misinformation tweets (1.05%) and 34,174 non-misinformation
tweets (98.95%). Each tweet in the dataset includes 32 features capturing tweet content and author
metadata, including tweet engagement metrics (favorite count, retweet count), user profile information
(followers count, friends count, account creation date, verification status), bot detection scores
(Botometer score with calculation date, manual bot check labels), temporal features (account age relative to
conflict start), and content metadata (language, hashtags, geolocation).
          </p>
          <p>Notably, misinformation tweets have a substantially richer metadata profile, with an average feature
fill rate of 80.8% compared to 52.8% for non-misinformation tweets.</p>
          <p>To further characterize the dataset, we compute additional descriptive statistics for the textual content,
metadata coverage, and language distribution of tweets. Table 1 summarizes these statistics.</p>
          <p>Several patterns emerge from Table 1. Misinformation tweets tend to be longer, averaging 211.1
characters (30.9 tokens) compared to 180.5 characters (22.9 tokens) for non-misinformation content.
Interestingly, genuine tweets contain significantly more hashtags (4.2 vs. 1.2) and mentions (0.7 vs. 0.2).
The proportion of English content in misinformation tweets is also much higher (70.3% vs. 49.1%).</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. FCO Ukraine Dataset</title>
          <p>
            To address the extreme class imbalance, we incorporate additional misinformation about the 2022
Russo-Ukrainian conflict from the Fact-checking Observatory (FCO) [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. The unfiltered
FCO data consists of more than 6 million Ukraine-related misinformation tweets collected between
November 2021 and June 2023.
          </p>
          <p>
            The FCO website (https://fcobservatory.org/) was initially created in 2020 as an effort to track misinformation and the impact of
fact-checking during the COVID-19 pandemic by automatically generating human-readable weekly
reports about misinformation and fact-checks spread on X. The website was extended in late 2021 to
include a section dedicated to the Russian invasion of Ukraine. To date, the website has generated 156
weekly COVID-19 misinformation reports and 83 Russo-Ukrainian war reports. The analysis of these
social media posts has provided insights into how fact-checks spread on social media and their
impact on misinformation [
            <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
            ].
          </p>
          <p>The FCO reports are generated through an automated pipeline that collects relevant
URLs from organizations vetted by the Poynter Institute’s International Fact-Checking
Network (IFCN, https://www.poynter.org/ifcn) and then tracks their mentions on X before generating visual reports every week using
predefined templates. During the data collection process, each misinforming URL is given a normalised
score between +1 (completely true claim) and −1 (completely false claim) based on the fact-checking
organisation’s rating. As a result, the collected data consists of posts that mention misinformation
URLs or fact-checking URLs. For our approach to PROMID Subtask 3, we used this data to
enrich the provided PROMID training dataset and alleviate its extreme class imbalance.</p>
          <p>The unfiltered dataset consists of more than 6 million X posts obtained from more than 6k fact-checks.
We only select posts that have a rating of −1 and contain misinformation (i.e., we remove posts that
share fact-checking URLs). This results in 30,447 tweets from 23,366 distinct users. We then filter
out non-English content and remove duplicates. The final set contains 5,022 unique tweets. This
augmentation increases the number of misinformation samples from 364 to 5,386, which represents a
14.75× increase.</p>
          <p>The combined dataset contains 39,560 tweets, comprising 5,386 misinformation (13.6%) and 34,174
non-misinformation (86.4%) samples. We split the data into training (85%, 33,626 tweets) and validation
(15%, 5,934 tweets) sets using stratified sampling to preserve the class distribution across splits.</p>
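The filtering and splitting procedure described above can be sketched as follows. This is a simplified illustration: field names such as `rating`, `shares_factcheck_url`, `lang`, and `text` are assumptions for exposition, not the actual FCO schema.

```python
import random
from collections import defaultdict

def filter_fco(posts):
    """Keep only English posts that share a claim rated -1 (completely false),
    dropping posts that merely share fact-checking URLs, then de-duplicate."""
    seen, kept = set(), []
    for p in posts:
        if p["rating"] != -1 or p["shares_factcheck_url"]:
            continue  # not a misinformation-sharing post
        if p["lang"] != "en":
            continue  # non-English content is filtered out
        if p["text"] in seen:
            continue  # duplicate tweet text
        seen.add(p["text"])
        kept.append(p)
    return kept

def stratified_split(examples, label_key="label", val_frac=0.15, seed=42):
    """Split per class so both splits preserve the class distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex[label_key]].append(ex)
    train, val = [], []
    for cls, items in by_class.items():
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Because the split is performed independently per class, a 13.6% misinformation rate in the combined data is preserved in both the training and validation sets.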
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Text Preprocessing</title>
        <p>Our preprocessing pipeline follows a minimal approach. We begin by converting all text entries to
string format and removing any empty tweets that may have resulted from data collection errors. We
deliberately retain URLs, hashtags, and mentions, as these may carry semantic information relevant to
misinformation detection.</p>
        <p>The preprocessed text is then tokenized using the RoBERTa tokenizer with a maximum sequence
length of 256 tokens. We apply padding to shorter sequences and truncation to longer ones to standardize
the length across the dataset.</p>
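The tokenizer handles this standardization via its padding and truncation options; the effect can be sketched in plain Python as follows (a simplified illustration of the behaviour, assuming RoBERTa's pad token id of 1, not the actual tokenizer implementation):

```python
def pad_or_truncate(token_ids, max_len=256, pad_id=1):
    """Standardize sequence length: truncate long sequences and right-pad
    short ones, returning ids plus an attention mask that is 1 on real
    tokens and 0 on padding. RoBERTa's pad token id is 1."""
    ids = token_ids[:max_len]                                  # truncation
    attention_mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))                # padding
    return ids, attention_mask
```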
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Architecture</title>
        <p>
          We use RoBERTa-large [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] as our base model, which contains 355 million parameters. We add a
classification head consisting of a dropout layer (p=0.1) followed by a linear layer that maps the
1024-dimensional [CLS] token representation to 2 output classes. The model is fine-tuned end-to-end using
the following hyperparameters:
• Batch size: 16
• Learning rate: 2e-5 with linear decay
• Optimizer: AdamW with weight decay of 0.01
• Warmup: 10% of total training steps
• Gradient clipping: max norm of 1.0
• Training epochs: 4
        </p>
        <p>These hyperparameters were selected based on preliminary experiments on a held-out portion of the
training data.</p>
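The warmup and decay behaviour above can be sketched as a step-to-learning-rate mapping. This is a simplified stand-in for what the Transformers library's `get_linear_schedule_with_warmup` computes; the function name and structure here are illustrative.

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_frac=0.10):
    """Learning rate with 10% linear warmup followed by linear decay to 0,
    matching the schedule described in the hyperparameter list."""
    warmup_steps = int(total_steps * warmup_frac)
    if step >= warmup_steps:
        # linear decay from base_lr at the end of warmup down to 0
        remaining = max(0.0, total_steps - step)
        return base_lr * remaining / max(1, total_steps - warmup_steps)
    # linear warmup from 0 up to base_lr
    return base_lr * step / max(1, warmup_steps)
```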
        <sec id="sec-3-3-1">
          <title>3.3.1. Handling Class Imbalance</title>
          <p>Given the severe class imbalance, we implement two complementary strategies to prevent the model
from defaulting to majority class prediction:</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Weighted Cross-Entropy Loss.</title>
          <p>using the formula:</p>
          <p>We compute class weights inversely proportional to class frequencies
 =</p>
          <p>·  
where  is the total number of samples,  is the number of classes, and  is the number of samples
in class . For the augmented dataset, this yields weights of 0.5788 for non-misinformation and 3.6726
for misinformation.
(1)
Weighted Random Sampling. During training, we apply weighted random sampling to construct
each batch, with sampling probabilities inversely proportional to class frequencies.</p>
        </sec>
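As an illustration, the class weights of Equation (1) can be reproduced from the approximate training-split class counts implied by the stratified 85% split, roughly 29,048 genuine and 4,578 misinformation tweets. These counts are our inference from the reported split sizes, not figures stated in the paper.

```python
from collections import Counter

def class_weights(labels):
    """Weights inversely proportional to class frequency: w_c = N / (C * N_c),
    as in Equation (1)."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * n_c) for cls, n_c in counts.items()}

# Approximate training-split counts (assumed, see lead-in): 0 = genuine, 1 = misinformation.
labels = [0] * 29048 + [1] * 4578
w = class_weights(labels)
```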
        <sec id="sec-3-3-3">
          <title>3.3.2. Training Details</title>
          <p>
            All experiments are conducted on a single NVIDIA L40S GPU with 48GB memory. Training the
augmented model for 4 epochs takes approximately 10 minutes. We use the Hugging Face Transformers
library [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] version 4.56.1 with PyTorch 2.8.0+cu126.
          </p>
          <p>Table 2 shows the training progression for the augmented model across 4 epochs.</p>
          <p>The model achieves rapid convergence, with training accuracy reaching 93.01% after the first epoch
and 99.26% by the fourth epoch. The slight dip in validation F1-score at epoch 2 suggests some initial
instability, but performance recovers and peaks at 0.9352 in epoch 4. We select the epoch 4 checkpoint
for final evaluation based on the validation performance.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Main Results</title>
        <p>Table 3 presents a comparison between the baseline trained on the original PROMID data, and the
augmented model trained with external misinformation data.</p>
        <p>The results reveal a clear precision-recall trade-off between the two approaches. The baseline model
achieves exceptional precision (97.64%) but critically poor recall (30.77%), detecting only approximately
one-third of actual misinformation instances. This pattern strongly suggests that the model overfits
to the 364 training examples.</p>
        <p>The augmented model demonstrates substantially improved generalization, achieving 90.44%
precision and 56.33% recall. While precision decreases by 7.2 percentage points, this is outweighed by
the 83.1% relative improvement in recall. The misinformation F1-score increases from 0.4682 to 0.6967
(48.8% improvement), and the weighted F1-score improves from 0.8516 to 0.9059 (6.4% gain), indicating
better overall classification performance.</p>
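The relative improvements quoted above follow directly from the reported scores:

```python
def rel_gain(new, old):
    """Relative improvement in percent."""
    return 100 * (new - old) / old

recall_gain = rel_gain(0.5633, 0.3077)  # recall: baseline -> augmented, ~83.1%
f1_gain = rel_gain(0.6967, 0.4682)      # misinformation F1, ~48.8%
wf1_gain = rel_gain(0.9059, 0.8516)     # weighted F1, ~6.4%
```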
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Error Analysis</title>
        <p>To better understand the precision-recall trade-off, we conduct a detailed error analysis by examining
specific examples where the baseline and augmented models diverge in their predictions.
Precision Loss Examples. The augmented model shows increased false positive rates on several
categories of genuine content:
1. Political commentary and opinion: Tweets expressing strong political opinions about Ukraine
policy are frequently misclassified. For example, statements criticizing government aid decisions
or energy policy are flagged as misinformation, suggesting that the model conflates partisan rhetoric
with false content.
3. Sarcasm and rhetorical questions: Sarcastic tweets such as “It never ends. Why? Because
Ukraine has already won” are incorrectly flagged.
3. Meta-commentary: Tweets discussing social media content itself (e.g. “Do you have a screenshot?
The tweet has been deleted” ) trigger false positives, suggesting the model associates discussion of
content manipulation with misinformation ecosystems.
4. Non-English content: Tweets in German, Dutch, Spanish, and other non-English languages
show elevated false positive rates, which may result from the English-language filtering applied
to the external augmentation data.</p>
        <p>Recall Gain Examples. The augmented model successfully identifies diverse misinformation patterns
that the baseline model misses:
1. Conspiracy theories: Claims about secret bioweapon laboratories, fabricated relationships
between public figures, and distorted casualty figures are correctly detected.
2. Propaganda techniques: Tweets that selectively frame events to promote specific narratives
or those containing verifiable false claims about military operations are identified, including
fabricated claims about NATO aircraft being shot down, manufactured quotes from officials, and
manipulated humanitarian contexts.
3. Coordinated narratives: The model detects recurring false narratives that appear across multiple
tweets with slight variations.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Works</title>
      <p>We presented a comparative study of misinformation detection approaches for tweets about the
Russo-Ukrainian conflict, developed for PROMID Subtask 3. Our baseline model, trained solely on the original
PROMID dataset with 364 misinformation examples, achieved high precision (0.9764) but poor recall
(0.3077), consistent with our hypothesis that extreme class imbalance leads to overfitting.</p>
      <p>By augmenting the training data with 5,022 external Ukraine-related misinformation examples, we
achieved substantially improved generalization with 0.9044 precision, 0.5633 recall, and 0.9059 weighted
F1-score. This represents a 48.8% improvement in misinformation F1-score over the baseline approach.</p>
      <p>Our results demonstrate that data augmentation is a viable and effective strategy for addressing
severe class imbalance in misinformation detection. The high precision maintained by our augmented
model makes it suitable for deployment in scenarios where false positives carry significant costs.</p>
      <p>Several directions could extend this work. First, incorporating user metadata (follower counts, account
age, verification status) alongside textual features could complement text-only classification. Second, multilingual modeling using transformers
such as XLM-R could better handle the diverse languages present in the dataset. Third, combining the
high-precision baseline with the high-recall augmented model could achieve better precision-recall
balance. Finally, using methods such as attention visualization and feature attribution could help identify
which linguistic and contextual features most strongly indicate misinformation, which could be used
for improving the model.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the European CHIST-ERA program within the ClimateSense project (Grant
ID ANR-24-CHR4-0002, EPSRC EP/Z003504/1). We thank the PROMID shared task organizers for
providing the dataset and evaluation framework.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mejova</surname>
          </string-name>
          , Too Little, Too Late:
          <article-title>Moderation of Misinformation around the Russo-Ukrainian Conflict</article-title>
          ,
          <source>in: 17th ACM Web Science Conference (WebSci)</source>
          ,
          <year>2025</year>
          . doi:10.1145/3717867.3717876.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shasirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          , G. Pasi, T. Mandl,
          <article-title>Overview of the First Shared Task on Prompt Recovery for Misinformation Detection (PROMID), in:</article-title>
          K. Ghosh,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          (Eds.), Working Notes of FIRE:
          <article-title>Forum for Information Retrieval Evaluation, CEUR-WS</article-title>
          .org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shasirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          , G. Pasi, T. Mandl,
          <article-title>Prompt Recovery for Misinformation Detection at FIRE 2025, in: 17th Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE)</article-title>
          ,
          <source>Association for Computing Machinery</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTa: Decoding-enhanced BERT with disentangled attention</article-title>
          ,
          <year>2021</year>
          . arXiv:2006.03654.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <year>2020</year>
          . arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peskine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Alfarano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Harrando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>Detecting COVID-19-related conspiracy theories in tweets</article-title>
          , in: CEUR (Ed.),
          <source>MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Toraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozcelik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Şahinuç</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Can</surname>
          </string-name>
          ,
          <article-title>MiDe22: An annotated multi-event tweet dataset for misinformation detection</article-title>
          ,
          <year>2024</year>
          . arXiv:2210.05401.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weinzierl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Harabagiu</surname>
          </string-name>
          ,
          <article-title>VaccineLies: A natural language resource for learning to recognize misinformation about the COVID-19 and HPV vaccines</article-title>
          ,
          <year>2022</year>
          . arXiv:2202.09449.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ugarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsubara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>COVIDLies: Detecting COVID-19 misinformation on social media</article-title>
          ,
          <source>in: 1st Workshop on NLP for COVID-19</source>
          , Association for Computational Linguistics,
          <year>2020</year>
          . doi:10.18653/v1/2020.nlpcovid19-2.11.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Anggrainingsih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <article-title>BERT based classification system for detecting rumours on Twitter</article-title>
          ,
          <year>2021</year>
          . arXiv:2109.02975.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>SemEval-2020 task 11: Detection of propaganda techniques in news articles</article-title>
          ,
          <source>in: 14th Workshop on Semantic Evaluation</source>
          , International Committee for Computational Linguistics, Barcelona,
          <year>2020</year>
          , pp.
          <fpage>1377</fpage>
          -
          <lpage>1414</lpage>
          . doi:10.18653/v1/2020.semeval-1.186.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peskine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>EURECOM at SemEval-2024 Task 4: Hierarchical Loss and Model Ensembling in Detecting Persuasion Techniques</article-title>
          ,
          <source>in: 18th International Workshop on Semantic Evaluation (SemEval)</source>
          , Association for Computational Linguistics,
          <year>2024</year>
          . doi:10.18653/v1/2024.semeval-1.172.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peskine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korencic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Grubisic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Definitions Matter: Guiding GPT for Multi-label Classification</article-title>
          ,
          <source>in: International Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>4054</fpage>
          -
          <lpage>4063</lpage>
          . doi:10.18653/v1/2023.findings-emnlp.267.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mensio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peskine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <article-title>CimpleKG: A continuously updated knowledge graph on misinformation, factors and fact-checks</article-title>
          ,
          <source>in: 23rd International Semantic Web Conference (ISWC)</source>
          , Baltimore, USA,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Majchrzak</surname>
          </string-name>
          ,
          <article-title>AMUSED: An annotation framework of multimodal social media data</article-title>
          ,
          <source>in: International Conference on Intelligent Technologies and Applications</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <article-title>The fact-checking observatory: Reporting the co-spread of misinformation and fact-checks on social media</article-title>
          ,
          <source>in: 34th ACM Conference on Hypertext and Social Media (HT)</source>
          ,
          <source>Association for Computing Machinery</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mensio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Khare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <article-title>Co-spread of misinformation and fact-checking content during the COVID-19 pandemic</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braghieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dignum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Grisolia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          (Eds.),
          <source>Social Informatics</source>
          , Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <article-title>Demographics and topics impact on the co-spread of COVID-19 misinformation and fact-checks on Twitter</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Le Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          ,
          <source>in: International Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>