<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF-2025 CheckThat! Lab Task 4 on Scientific Web Discourse</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Salim Hafid</string-name>
          <email>salim.hafid@sciencespo.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yavuz Selim Kartal</string-name>
          <email>yavuzselim.kartal@gesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Schellhammer</string-name>
          <email>sebastian.schellhammer@gesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katarina Boland</string-name>
          <email>katarina.boland@hhu.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitar Dimitrov</string-name>
          <email>dimitar.dimitrov@gesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandra Bringay</string-name>
          <email>bringay@lirmm.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Todorov</string-name>
          <email>todorov@lirmm.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Dietze</string-name>
          <email>stefan.dietze@hhu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heinrich-Heine-University</institution>
          ,
          <addr-line>Düsseldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LIRMM, CNRS, University of Montpellier</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>médialab Sciences Po</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We present an overview of Task 4 of the CheckThat! lab at the 2025 edition of the Conference and Labs of the Evaluation Forum (CLEF). Task 4 focuses on scientific web discourse and consists of two subtasks: detecting and differentiating between different forms of scientific web discourse (task 4a), and retrieving the scientific publication given a social media post with an implicit reference (task 4b). Within the context of automated fact-checking, these tasks contribute to the detection of scientific claims as well as the retrieval of scientific evidence for their verification. In total, 10 teams participated in task 4a and 30 in 4b, with 6 and 7 teams, respectively, submitting system description papers. The participants in task 4a primarily used transformer-based approaches, with some teams also experimenting with LLMs for data augmentation or classification. The best-performing team achieved a macro-average F1-score close to 0.8. In task 4b, most teams employed two-stage retrieval and re-ranking pipelines, including the use of various LLMs, with the best team reaching an MRR@5 score of 0.68. This paper presents a detailed overview of the two tasks, including the datasets and evaluation settings, along with a description of the participants' approaches.</p>
      </abstract>
      <kwd-group>
<kwd>scientific web discourse</kwd>
        <kwd>scientific claims</kwd>
        <kwd>evidence retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Scientific web discourse, i.e., discourse on the social web about scientific knowledge, resources, or other
research-related information, has increased substantially in recent years [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Understanding
the topics, claims, and studies that are being discussed, the citation habits of users, and the evolution
of these habits and the discourse more generally is critical for various tasks and disciplines. For
instance, identifying text that conveys scientific knowledge is essential for claim detection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
claim verification [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] in this domain. Given that phenomena such as fake news propagation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and bias reinforcement [
        <xref ref-type="bibr" rid="ref7">7</xref>
] may have harmful effects for society [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], especially when coupled with
potentially sensitive and controversial topics such as COVID-19 or climate change, tackling such
fact-checking-related tasks for scientific web discourse is crucial. However, current state-of-the-art
language models have been shown to perform worse for downstream tasks involving scientific claims
compared to other domains [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Detecting references to scientific studies on social media is crucial for
computing altmetrics and assessing the credibility of information. Furthermore, it facilitates research
into the evolution of scientific discourse in online environments as studied by various disciplines such
as communication studies [
        <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
        ], social sciences, and social psychology [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], as well as computational
linguistics [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
      <p>
        Scientific web discourse is often informal, e.g., "covid vaccines just don’t work on children", and displays
fuzzy/incomplete citation habits, such as "Stanford study shows that vaccines don’t work", where the
actual study is never cited through explicit references. These characteristics pose challenges both
from a societal perspective, leading to poorly informed online debates [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and from a computational
perspective, requiring robust datasets and methods to detect and analyse such discourse.
      </p>
      <p>
        Therefore, the CLEF-2025 CheckThat! lab contains two tasks that are relevant to the CheckThat!
verification pipeline [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], tailored specifically to the analysis of scientific web discourse:
• Task 4a - Scientific Web Discourse Detection: Given a social media post (tweet), detect if it
contains (1) a scientific claim, (2) a reference to a scientific study/publication, or (3) mentions of
scientific entities, e.g., a university or scientist.
• Task 4b - Claim-source Retrieval: Given an implicit reference to a scientific paper, i.e., a social
media post (tweet) that mentions a research publication without a URL (e.g., "Stanford study
shows that vaccines don’t work"), retrieve the mentioned paper from a pool of candidate papers.
      </p>
      <p>In the remainder of the paper, we introduce the datasets used in our two tasks (Section 2), describe
system submissions and results (Section 3), present related work (Section 4), and conclude with final
remarks (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>
        2.1. Task 4a: Scientific Web Discourse Detection
The dataset for task 4a is an extension of the SciTweets corpus [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and consists of 1,606 posts from X
(formerly Twitter) annotated with the different forms of science-related online discourse, which are
scientific claims (Category 1), scientific references (Category 2), and references to science contexts or
entities (Category 3), following the definitions by Hafid et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]:
Science-related: Texts that fall under at least one of the following categories:
      </p>
      <p>Category 1 - Scientific knowledge (scientifically verifiable claims): Does the text include a
claim or a question that could be scientifically verified, i.e., claims that can only be verified with
the help of scientific publications or research data used in the scientific process (see posts 2 and 5
in Table 1).</p>
      <p>Category 2 - Reference to scientific knowledge: Does the text include at least one reference
to scientific knowledge? References can either be direct, e.g., DOI, title of a paper, or indirect, e.g.,
a link to an article that includes a direct reference (see posts 3 and 5 in Table 1).</p>
      <p>Category 3 - Related to scientific research in general: Does the text mention a scientific
research context (e.g., mention of a scientist, scientific research efforts, research findings)? (see
posts 3, 4, and 5 in Table 1).</p>
<p>Not science-related: Texts that do not fall under any of the three previous categories (see post 1 in
Table 1)</p>
      <p>Table 2 shows the number of posts per category and per split.</p>
      <p>Example posts from Table 1 include: "65% of cats born with blue eyes are deaf.", "@user Please read this research analysis https://www.apa.org/pubs/journals/releases/psp-pspp0000147.pdf.", "How is University of Chicago shaping the future of science? Find out on April 6", and "A fifth of US high school students use tobacco, finds survey http://www.bmj.com/content/349/bmj.g6885".</p>
      <sec id="sec-2-2">
        <title>2.2. Task 4b: Scientific Claim Source Retrieval</title>
        <p>
The dataset for task 4b consists of two sets, a query set and a collection set. The query set contains
15,699 X (Twitter) posts with implicit references to scientific papers from the CORD-19 corpus [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The
collection set contains metadata, such as title, abstract, and affiliations of the 7,718 CORD-19 scientific
papers, which the posts from the query set implicitly refer to. Table 3 shows the number of posts (query
set) and number of publications (collection set) per split. Note that the collection set is identical for all
three splits. Tables 4 and 5 show two exemplary posts with implicit references and the corresponding
CORD-19 publications. For instance, given the first post in Table 4, the participants are asked to retrieve
the correct publication, which is the first publication in Table 5. See [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for more details on how the
data was sampled and annotated.
3. Results and Overview of the Systems
3.1. Task 4a: Scientific Web Discourse Detection
Task 4a is a multilabel classification task and was evaluated through the macro-averaged F1-score. The
baseline is a DeBERTaV3-base model trained on the train set for 10 epochs with a learning rate of 2e-5
and a batch size of 16. For the final test set predictions, we used the checkpoint with the best dev set
performance, resulting in a test set macro F1-score of 0.7668 (rank 7).
        </p>
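<p>The macro-averaged F1 metric used for this multilabel setup computes one binary F1 per category and takes their unweighted mean. A minimal sketch in pure Python, with toy labels rather than the actual lab data:</p>

```python
# Macro-averaged F1 for three binary labels per post (task 4a style:
# Cat1 claim, Cat2 reference, Cat3 science context). Toy data only.

def f1(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold_rows, pred_rows):
    n_cats = len(gold_rows[0])
    per_cat = [f1([row[c] for row in gold_rows],
                  [row[c] for row in pred_rows]) for c in range(n_cats)]
    return sum(per_cat) / n_cats  # unweighted mean over categories

gold = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
pred = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
print(round(macro_f1(gold, pred), 4))  # → 0.8222
```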
<p>In total, ten teams participated in task 4a. Table 6 provides an overview of the different approaches
and their performances for those teams that submitted a paper description of their work. The F1-score
and rank indicate the performance and position on the final test set leaderboard. Most teams relied on
transformer-based models such as DeBERTa-v3, SciBERT, and Twitter-Roberta, while some also used
LLMs. In addition, different techniques such as LLM-based data augmentation, ensemble methods, and
other optimizations were employed.</p>
        <p>
          Team ClimateSense [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] fine-tuned a twitter-roberta-base-2022-154m model and identified the
best-performing checkpoint for each category based on the dev set performance. Using the embeddings of
these checkpoints, traditional classifiers were trained for each category (Nearest centroid classifiers for
categories 1 and 3, and a Naive Bayes classifier for category 2), ranking first on the overall leaderboard
with an F1-score of 0.7998. They also experimented with SetFit [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], and training classifiers on top of
existing sentence encoders, but found the approaches to have lower performance compared to their
final system.
        </p>
        <p>
          Team VerbaNexAI [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] fine-tuned a DeBERTa-v3-base model using hyperparameters, i.e., learning
rate and number of epochs, found with 5-fold cross-validation. To improve performance, they adjusted
the binary cross-entropy loss with class weights and employed threshold-tuning. For their final
submission, ranking second, they used a soft-voting ensemble of their two strongest models (one with
class weights and one without).
        </p>
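<p>Per-label threshold tuning of the kind described above can be sketched in a few lines: instead of cutting sigmoid outputs at the default 0.5, a grid search on dev data picks the cut-off that maximises F1 for each label. The scores below are toy values, not the lab data:</p>

```python
# Tune a per-label decision threshold on dev-set sigmoid scores.

def f1_at(scores, labels, t):
    pred = [s >= t for s in scores]
    tp = sum(1 for p, y in zip(pred, labels) if p and y)
    fp = sum(1 for p, y in zip(pred, labels) if p and not y)
    fn = sum(1 for p, y in zip(pred, labels) if not p and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, labels):
    grid = [i / 100 for i in range(5, 96)]       # 0.05 ... 0.95
    return max(grid, key=lambda t: f1_at(scores, labels, t))

dev_scores = [0.10, 0.35, 0.40, 0.62, 0.80, 0.91]
dev_labels = [0, 0, 1, 1, 1, 1]
t = best_threshold(dev_scores, dev_labels)
print(t)  # a cut-off between 0.35 and 0.40 separates these toy labels
```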
        <p>
          Team SBU-SCIRE [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] augmented the training data to 2,369 samples with paraphrases using
DeepSeek-R1. They trained five DeBERTa-v3-large models on the augmented dataset using 5-fold
cross-validation with a focal loss [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] to address class imbalance and focus on hard examples. For
inference on the test set, they averaged the logits of the five models before applying optimized class-specific
thresholds to the sigmoid probabilities.
        </p>
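<p>Focal loss scales the cross-entropy of each example by how badly it is predicted, so easy examples contribute little to training. A per-example binary sketch; the gamma and alpha values are the commonly used defaults, not necessarily the team's settings:</p>

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example: p is the predicted
    probability of the positive class, y is the 0/1 gold label."""
    pt = p if y == 1 else 1.0 - p      # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(pt)

# An easy, confident prediction is down-weighted far more than a hard one:
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
print(easy < hard)  # → True
```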
        <p>
          Team DS@GT [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] trained different transformer-based models, such as DeBERTa-v3-base and -large,
and used zero-shot and few-shot classification with GPT-4o and GPT-4o mini. While the
transformer-based models performed better for categories one and three, the LLM-based approaches outperformed
them on category two. Thus, for their final submission, they combined DeBERTa-v3-base (categories
one and three) with GPT-4o mini using few-shot with five examples based on semantic similarity
(category two).
        </p>
        <p>
          Team TurQUaz [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] employed various LLMs, such as Gemma3 (12B), Qwen3 (8B), DeepSeek-R1
(8B), in different collaborative settings. Specifically, they investigated three settings: (1) a single debate,
where two LLMs argue in favor of or against a specific classification (e.g., whether a post contains a
scientific claim) and a third model acts as a judge, (2) a team debate, following the same approach as
the single debate but with multiple LLMs collaborating on each side, and (3) a council debate, where
multiple LLMs argue together to reach a consensus. For their final submission, they chose the council
debate, outperforming the two other settings. While the approach, overall, did not improve upon the
baseline, it ranked first in identifying scientific references (category two).
        </p>
        <p>
          Team JU_NLP [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] generated embeddings using SciBERT and Twitter-RoBERTa models to capture
both scientific and social media discourse characteristics of posts. The embeddings were concatenated
and used to train a two-layer classification head.
        </p>
        <p>Overall, fine-tuning existing PLMs such as DeBERTa-v3 or twitter-roberta-base-2022-154m performs best in terms of macro-average F1-score, with some LLM approaches outperforming them in the identification of scientific references (category two).</p>
      </sec>
      <sec id="sec-2-3">
        <p>Tables 4 and 5 (examples): the post "Peer-reviewed in the New England Journal of Medicine regarding Delta (B.1.617.2): Pfizer is 90% effective, AstraZeneca is 70% effective. This falls in line with vaccine efficacy of other variants. Yes, the vaccines ARE indeed effective against Delta." refers to the publication "Effectiveness of Covid-19 Vaccines against the B.1.617.2 (Delta) Variant" (CORD id 5g02ykhi). The post "Published in the journal Antiviral Research, the study from Monash University showed that a single dose of Ivermectin could stop the coronavirus growing in cell culture – effectively eradicating all genetic material of the virus within two days." refers to "The FDA-approved drug ivermectin inhibits the replication of SARS-CoV-2 in vitro" (CORD id ivy95jpw), whose abstract reports that a single addition of Ivermectin to Vero-hSLAM cells 2 h post infection with SARS-CoV-2 effected a 5000-fold reduction in viral RNA at 48 h.</p>
      </sec>
      <sec id="sec-2-11">
        <p>
3.2. Task 4b: Scientific Claim Source Retrieval
Task 4b is a retrieval task and was evaluated by the MRR@5 (Mean Reciprocal Rank) score. BM25
ranking using the title and abstract of the papers and the text of the X posts serves as the baseline
with an MRR@5 of 0.43. The best-performing team reached an MRR@5 of 0.68.</p>
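<p>MRR@5 averages, over all query posts, the reciprocal rank of the gold publication within the top five returned candidates (zero if it is absent). A minimal sketch with made-up document ids:</p>

```python
# MRR@5 over a set of queries; ids below are illustrative only.

def mrr_at_5(ranked_lists, gold_ids):
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_ids):
        for rank, doc_id in enumerate(ranked[:5], start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break                      # only the first hit counts
    return total / len(gold_ids)

rankings = [["p3", "p1", "p7"],            # gold at rank 2 -> 1/2
            ["p2", "p9", "p4"],            # gold at rank 1 -> 1
            ["p6", "p8", "p5"]]            # gold missing   -> 0
print(mrr_at_5(rankings, ["p1", "p2", "p0"]))  # → 0.5
```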
<p>In total, 30 teams participated in task 4b. Table 7 provides an overview of the different approaches and
their performance for teams that submitted a paper description of their work. Most teams relied on a
combination of retrieval methods (dense, sparse, or both) and re-ranking models. Retrieval methods
included both lexical and semantic methods. LLMs such as ChatGPT, LLaMa, and Gemma were mainly
used as re-rankers, but did not always outperform fine-tuned transformer-based models. Additionally,
some teams experimented with data augmentation and style transfer techniques.</p>
        <p>
          Team AIRwaves [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] employed a two-stage pipeline using neural representation learning for candidate generation with a fine-tuned E5-large model, followed by neural re-ranking with a SciBERT cross-encoder to re-order the top predictions. Incorporating one additional BM25-mined hard negative example per query improved the performance significantly.
        </p>
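<p>The two-stage retrieve-then-rerank pattern used by most teams can be sketched as follows; cheap word-overlap scorers stand in for the bi-encoder and cross-encoder, so the shape of the pipeline, not the models, is the point:</p>

```python
# Two-stage retrieval: a cheap first stage scores all papers and keeps
# the top-k, a more expensive second stage re-orders those candidates.
# Toy scorers stand in for a bi-encoder and a cross-encoder.

def first_stage_score(query, doc):
    # Stand-in for a bi-encoder similarity: raw word overlap.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def rerank_score(query, doc):
    # Stand-in for a cross-encoder reading query and doc jointly:
    # overlap normalised by document length.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(d), 1)

def retrieve_then_rerank(query, corpus, k=3, top_n=2):
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:k]
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:top_n]

corpus = ["vaccine effectiveness against the delta variant",
          "ivermectin inhibits sars-cov-2 replication in vitro",
          "masks and ventilation in schools"]
print(retrieve_then_rerank("are vaccines effective against delta",
                           corpus, k=2, top_n=1))
```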
        <p>
          Team Deep Retrieval [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] combined lexical BM25-based retrieval with a semantic search-based
approach using an INF-Retriever-v1 retrieval model to generate candidates, which were then re-ranked
with a BAAI/bge-reranker-v2-gemma cross-encoder. While the semantic retrieval outperformed the
lexical BM25 approach, combining and re-ranking the generated candidates from both approaches
yielded the best performance.
        </p>
        <p>
          Team ATOM [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] explored different retriever and re-ranking models. For their final submission,
they used a GTR-T5-Large model to retrieve candidates, followed by the MXBAI-base-v2 re-ranker.
The team further experimented with enriching the collection set with full-texts, but did not find it to
improve performance compared to using the provided abstracts.
        </p>
        <p>
          Team SBU-SCIRE [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] used a Snowflake/snowflake-arctic-embed-l-v2.0 model for dense retrieval,
followed by an ms-marco-MiniLM-L4-v2 cross-encoder for re-ranking. To improve the training
effectiveness of the dense retriever, they used a strategically sampled set of nine hard negative examples per
query.
        </p>
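<p>Hard-negative mining of the kind used by several teams can be sketched as follows; a word-overlap scorer stands in for BM25 or a dense retriever, and all document ids are illustrative:</p>

```python
# For each query, keep the top-scoring non-gold documents as "hard"
# negatives for training a dense retriever.

def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def mine_hard_negatives(query, gold_id, corpus, n=2):
    ranked = sorted(corpus, key=lambda item: score(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked if doc_id != gold_id][:n]

corpus = [
    ("d1", "vaccines are effective against the delta variant"),
    ("d2", "vaccine hesitancy on social media"),
    ("d3", "weather patterns in northern europe"),
]
negs = mine_hard_negatives("are vaccines effective", "d1", corpus, n=1)
print(negs)  # the most confusable non-gold document
```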
        <p>
          Team SeRRa [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] used a multi-step pipeline including dense retrieval for candidate generation with
a Sentence-BERT model, re-ranking using a SciBERT-based binary classification model, and a final
ranking through pairwise comparisons of the top 10 re-ranked documents with the input claim using
a ModernBERT model. In addition, they evaluated the effect of hard negative sampling for the final
re-ranking step and found a significant improvement in performance.
        </p>
        <p>
          Team Claim2Source [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] systematically evaluated the impact of seven different style transfer
techniques applied to both claims and source documents using a LLaMa 3.3-70B-Instruct model. The
style transfer techniques involved three strategies for rewriting claims, such as converting informal
posts into more formal language, and four strategies for transforming publications, e.g., writing a
concise social media post based on a scientific abstract. They observed that making claims more formal
tended to help retrieval, but rewriting the titles and abstracts of the publications usually degraded the
performance. Still, for their final approach, they relied on dense retrieval with a GritLM-7B model
without applying any style transfer.
        </p>
        <p>
          Team DS@GT [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] explored eight different two-stage retrieval and re-ranking pipelines and investigated the effect of rewriting posts in formal language using ChatGPT. They found that appending a formal paraphrase to the original post slightly improved the re-ranking performance. Their final approach combined BM25 retrieval with a T5 model for re-ranking.
        </p>
        <p>Overall, most teams followed a two-stage approach, combining a dense retrieval step with neural
re-ranking. Strategically sampling hard negatives led to performance gains, while results from
applying style transfer techniques were mixed.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Related Work</title>
      <p>
        4.1. Task 4a: Scientific Web Discourse Detection
Scientific web discourse has been studied by various disciplines, including social sciences, measuring the
engagement with scientific publications on social media [
        <xref ref-type="bibr" rid="ref32 ref33 ref34">32, 33, 34</xref>
        ], as well as NLP research, detecting
and verifying scientific claims [
        <xref ref-type="bibr" rid="ref35 ref36 ref37 ref38 ref5">35, 36, 37, 38, 5</xref>
        ]. To facilitate such research, high-quality datasets with
robust definitions are crucial. While existing resources often target specific domains [
        <xref ref-type="bibr" rid="ref35 ref37">37, 35</xref>
        ], or generate
synthetic claims [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], we follow the domain-agnostic definitions of [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and extend their SciTweets
corpus to provide 1,606 X posts in total.
      </p>
      <p>
        From a computational perspective, task 4a includes the detection of claims, which has been studied
in prior work [
        <xref ref-type="bibr" rid="ref35 ref37 ref40">35, 40, 41, 37</xref>
        ]. For example, in a previous iteration of the CheckThat! lab, the goal was
to identify relevant claims in tweets [41]. More closely related to scientific claims, Wührl and Klinger
[
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] detect claims in tweets from the biomedical domain. In contrast, our task involves the detection of
scientific claims independent of the scientific field. Furthermore, to the best of our knowledge, the two
other subtasks of task 4a, the detection of scientific references and scientific contexts, have not been
addressed in prior work.
4.2. Task 4b: Scientific Claim Source Retrieval
The task of scientific claim source retrieval is closely related to the problem of evidence retrieval in
automated fact-checking [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as explored by previous research [
        <xref ref-type="bibr" rid="ref39">42, 39, 43, 44</xref>
        ]. For example, the FEVER
shared task asked participants to retrieve evidence from Wikipedia for human-authored claims [42].
In the previous edition of the CheckThat! lab 2024, a similar task was introduced: given a rumour,
evidence tweets from authority Twitter accounts should be retrieved [43]. However, existing works are
different because they use synthetic claims [42], claims that originate from different sources, such as
scientific publications [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], or evidence from sources other than scientific publications. Furthermore,
previous evidence retrieval tasks focus on retrieving any relevant evidence useful to fact-check a claim,
whereas we are interested in finding the one piece of evidence that is most likely the basis for the stated
claim. The original source that a claim is based on has been shown to be one of the primary pieces
of evidence used by fact-checkers [45]. Another related line of work is the retrieval of publications
referred to by news articles [46, 47, 48]. While similar to the implicit references in our dataset, we
assume the references in news articles are more formal and have a larger context.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion and Future Work</title>
      <p>We presented an overview of Task 4 of the CheckThat! lab at CLEF 2025, which comprised two
subtasks: identifying and distinguishing between different forms of scientific web discourse (task 4a),
and retrieving the scientific publication given a social media post with an implicit reference (task 4b).</p>
      <p>
        For task 4a, most teams relied on fine-tuning transformer-based models, with some exploring LLMs
for data augmentation as well as for zero- and few-shot classification. In task 4b, two-stage retrieval
pipelines were used by many teams, including the use of various LLMs for candidate generation
and reranking. The highest-ranked team for task 4a, team ClimateSense [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] fine-tuned a
twitter-roberta-base-2022-154m model and, for each category, used the best-performing checkpoint to extract
embeddings for training a traditional classifier using a weighted loss function. Team AIRwaves [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ],
the top-ranked team with a system description paper and second overall in task 4b, implemented a
two-stage pipeline including candidate generation with a fine-tuned E5-large model and re-ranking the
top predictions with a SciBERT-based cross-encoder. While these teams achieved an F1-score of 0.80
and an MRR@5 score of 0.67, there is still room for improvement across both tasks.
      </p>
      <p>In total, 40 teams submitted their predictions, and 13 submitted system description papers, reflecting
considerable interest. In future iterations of these tasks, we plan to expand the languages covered
(e.g., including French and German), include additional online discourse platforms such as Telegram or
Bluesky, and incorporate more realistic scenarios (e.g., moving beyond the COVID-19 domain in task
4b).</p>
    </sec>
    <sec id="sec-5">
      <title>6. Acknowledgments</title>
      <p>This work has been funded by the AI4Sci grant (co-funded by MESRI (France, grant UM-211745), BMBF
(Germany, grant 01IS21086), and the French National Research Agency (ANR)), as well as the Leibniz
Association as part of the Leibniz Collaborative Excellence funding programme (grant no K490/2022).</p>
    </sec>
    <sec id="sec-6">
      <title>7. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly to perform the following tasks: "Grammar and spelling check", "Paraphrase and reword". After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>15th Annual Meeting of the Forum for Information Retrieval Evaluation (2023). URL: https://api.semanticscholar.org/CorpusID:264820358.</p>
      <p>[41] P. Nakov, A. Barrón-Cedeño, G. D. S. Martino, F. Alam, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, H. Mubarak, A. Nikolov, Y. S. Kartal, Overview of the CLEF-2022 CheckThat! lab task 1 on identifying relevant claims in tweets, in: Conference and Labs of the Evaluation Forum, 2022. URL: https://api.semanticscholar.org/CorpusID:251472020.</p>
      <p>[42] J. Thorne, A. Vlachos, O. Cocarascu, C. Christodoulopoulos, A. Mittal, The fact extraction and verification (FEVER) shared task, ArXiv abs/1811.10971 (2018). URL: https://api.semanticscholar.org/CorpusID:53645946.</p>
      <p>[43] F. Haouari, T. Elsayed, R. Suwaileh, Overview of the CLEF-2024 CheckThat! lab task 5 on rumor verification using evidence from authorities, in: Conference and Labs of the Evaluation Forum, 2024. URL: https://api.semanticscholar.org/CorpusID:271771424.</p>
      <p>[44] J. Chen, G. Kim, A. Sriram, G. Durrett, E. Choi, Complex claim verification with evidence retrieved in the wild, ArXiv abs/2305.11859 (2023). URL: https://api.semanticscholar.org/CorpusID:258822852.</p>
      <p>[45] M. Glockner, Y. Hou, I. Gurevych, Missing counter-evidence renders NLP fact-checking unrealistic for misinformation, ArXiv abs/2210.13865 (2022). URL: https://api.semanticscholar.org/CorpusID:253107194.</p>
      <p>[46] J. Wang, B. Yu, News2PubMed: A browser extension for linking health news to medical literature, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 2605–2609.</p>
      <p>[47] K. Kousha, M. Thelwall, An automatic method to identify citations to journals in news stories: A case study of UK newspapers citing Web of Science journals, Journal of Data and Information Science 4 (2019) 73–95.</p>
      <p>[48] J. Ravenscroft, A. Clare, M. Liakata, HarriGT: Linking news articles to scientific literature, in: Proceedings of ACL, 2018, p. 19.</p>
      <p>[49] G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CLEF 2025, Madrid, Spain, 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dunwoody</surname>
          </string-name>
          ,
          <article-title>Science journalism: Prospects in the digital age</article-title>
          , in:
          <source>Routledge handbook of public communication of science and technology</source>
          , Routledge,
          <year>2021</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brüggemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lörcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walter</surname>
          </string-name>
          ,
          <article-title>Post-normal science communication: exploring the blurring boundaries of science and journalism</article-title>
          ,
          <source>Journal of Science Communication</source>
          <volume>19</volume>
          (
          <year>2020</year>
          )
          A02
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          , et al.,
          <article-title>The clef-2024 checkthat! lab: Check-worthiness, subjectivity, persuasion, roles, authorities, and adversarial robustness</article-title>
          ,
          in:
          <source>European Conference on Information Retrieval</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>449</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Harwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chillrud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ananthram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mckeown</surname>
          </string-name>
          ,
          <article-title>Check-covid: Fact-checking covid-19 news claims with scientific evidence</article-title>
          , in:
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>14114</fpage>
          -
          <lpage>14127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          , G. Da San Martino, et al.,
          <article-title>Automated fact-checking for assisting human fact-checkers</article-title>
          , in: IJCAI,
          <source>International Joint Conferences on Artificial Intelligence</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4551</fpage>
          -
          <lpage>4558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aral</surname>
          </string-name>
          ,
          <article-title>The spread of true and false news online</article-title>
          ,
          <source>Science</source>
          <volume>359</volume>
          (
          <year>2018</year>
          )
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Garimella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. F.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <article-title>Quantifying controversy on social media</article-title>
          ,
          <source>ACM Transactions on Social Computing</source>
          <volume>1</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y. M.</given-names>
            <surname>Rocha</surname>
          </string-name>
          , G. A. de Moura,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Desidério</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Lourenço</surname>
          </string-name>
          , L. D. de Figueiredo Nicolete,
          <article-title>The impact of fake news on social media and its influence on health during the covid-19 pandemic: A systematic review</article-title>
          ,
          <source>Journal of Public Health</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Papastergiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <article-title>An in-depth analysis of the linguistic characteristics of science claims on the web and their impact on factchecking</article-title>
          ,
          <source>ACM Trans. Web</source>
          (
          <year>2025</year>
          ). URL: https://doi.org/10.1145/3746170. doi:10.1145/3746170, Just Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Kelkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Logan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Majhail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pemmaraju</surname>
          </string-name>
          ,
          <article-title>The democratization of scientific conferences: Twitter in the era of covid-19 and beyond</article-title>
          ,
          <source>Current Hematologic Malignancy Reports</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          <fpage>132</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kreps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kriner</surname>
          </string-name>
          ,
          <article-title>Model uncertainty, political contestation, and public trust in science: evidence from the covid-19 pandemic</article-title>
          ,
          <source>Science Advances</source>
          <volume>6</volume>
          (
          <year>2020</year>
          )
          eabd4563
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>August</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Reinecke</surname>
          </string-name>
          ,
          <article-title>Explain like i am a scientist: The linguistic barriers of entry to r/science</article-title>
          , in:
          <source>Proceedings of the 2020 CHI conference on human factors in computing systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Chandrasekharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samory</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jhaver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Charvat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bruckman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lampe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>The internet's hidden rules: An empirical study of reddit norm violations at micro, meso, and macro scales</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>2</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <article-title>SciTweets - a dataset and annotation framework for detecting scientific online discourse</article-title>
          , in:
          <source>Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3988</fpage>
          -
          <lpage>3992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jacot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <article-title>Disambiguation of implicit scientific references on X</article-title>
          ,
          <source>Proceedings of the 36th ACM Conference on Hypertext and Social Media</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mooney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Murdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. X. R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Raymond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <article-title>CORD-19: The COVID-19 open research dataset</article-title>
          , in:
          <source>Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          . URL: https://www.aclweb.org/anthology/2020.nlpcovid19-acl.1.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          , ClimateSense at CheckThat! 2025:
          <article-title>Combining Fine-tuned Large Language Models and Conventional Machine Learning Models for Subjectivity and Scientific Web Discourse Analysis</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tunstall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. E. S.</given-names>
            <surname>Jo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wasserblat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pereg</surname>
          </string-name>
          ,
          <article-title>Efficient few-shot learning without prompts</article-title>
          ,
          <source>ArXiv abs/2209.11055</source>
          (
          <year>2022</year>
          ). URL: https://api.semanticscholar.org/CorpusID:252439001.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Martinez Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Puertas</surname>
          </string-name>
          , VerbaNexAI Lab at CheckThat! 2025:
          <article-title>Fine-Tuning DeBERTa for Multi-Label Scientific Discourse Detection in Tweets</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Thapliyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samridh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , SBU-SCIRE at CheckThat! 2025:
          <article-title>Bridging Social Media, Scientific Discourse, and Scientific Literature</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Focal loss for dense object detection</article-title>
          ,
          <source>2017 IEEE International Conference on Computer Vision</source>
          (ICCV) (
          <year>2017</year>
          )
          <fpage>2999</fpage>
          -
          <lpage>3007</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:47252984.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Truong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schofield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heil</surname>
          </string-name>
          , DS@GT at CheckThat! 2025:
          <article-title>Ensemble Methods for Detection of Scientific Discourse on Social Media</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Saraç</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mergen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          , TurQUaz at CheckThat! 2025:
          <article-title>Debating Large Language Models for Scientific Web Discourse Detection</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pal</surname>
          </string-name>
          , JU_NLP at CheckThat! 2025:
          <article-title>Leveraging Hybrid Embeddings for Multi-Label Classification in Scientific Social Media Discourse</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ashbaugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Baumgärtner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Greß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Werner</surname>
          </string-name>
          , AIRwaves at CheckThat! 2025:
          <article-title>Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Sager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamaraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Grewe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Stadelmann</surname>
          </string-name>
          , Deep Retrieval at CheckThat! 2025:
          <article-title>Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Staudinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Ebshihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kusa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , ATOM at CheckThat! 2025:
          <article-title>Retrieve the Implicit - Scientific Evidence Retrieval</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          , Team SeRRa at CheckThat! 2025:
          <article-title>Sequential Re-Ranking in a Scientific Claim Source Retrieval Pipeline</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schreieder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          , Claim2Source at CheckThat! 2025:
          <article-title>Zero-Shot Style Transfer for Scientific Claim-Source Retrieval</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schofield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Truong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heil</surname>
          </string-name>
          , DS@GT at CheckThat! 2025:
          <article-title>Exploring Retrieval and Reranking Pipelines for Scientific Claim Source Retrieval on Social Media Discourse</article-title>
          , in: [49],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <article-title>Quantifying and contextualizing the impact of biorxiv preprints through automated social media audience segmentation</article-title>
          ,
          <source>PLoS Biology</source>
          <volume>18</volume>
          (
          <year>2020</year>
          )
          <elocation-id>e3000860</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>R.</given-names>
            <surname>Haunschild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bornmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Potnis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tahamtan</surname>
          </string-name>
          ,
          <article-title>Investigating dissemination of scientific information on twitter: A study of topic networks in opioid publications</article-title>
          ,
          <source>Quantitative Science Studies</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Díaz-Faes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Costas</surname>
          </string-name>
          ,
          <article-title>Towards a second generation of 'social media metrics': Characterizing twitter communities of attention around science</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>14</volume>
          (
          <year>2019</year>
          )
          <elocation-id>e0216408</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wührl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <article-title>Claim detection in biomedical twitter posts</article-title>
          , in:
          <source>Workshop on Biomedical Natural Language Processing</source>
          ,
          <year>2021</year>
          . URL: https://api.semanticscholar.org/CorpusID:233387769.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>I.</given-names>
            <surname>Srba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pecher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomlein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Moro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stefancova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Simko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bielikova</surname>
          </string-name>
          ,
          <article-title>Monant medical misinformation dataset: Mapping articles to fact-checked claims</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2949</fpage>
          -
          <lpage>2959</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>P.</given-names>
            <surname>Smeros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          ,
          <article-title>SciClops: Detecting and Contextualizing Scientific Claims for Assisting Manual Fact-Checking</article-title>
          ,
          <source>Proceedings of the 30th ACM International Conference on Information &amp; Knowledge Management</source>
          (
          <year>2021</year>
          )
          <fpage>1692</fpage>
          -
          <lpage>1702</lpage>
          . URL: http://arxiv.org/abs/2110.13090. doi:10.1145/3459637.3482475. arXiv:2110.13090.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caraballo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gawsane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sable</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          ,
          <article-title>ClaimBuster: the first-ever end-to-end fact-checking system</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>10</volume>
          (
          <year>2017</year>
          )
          <fpage>1945</fpage>
          -
          <lpage>1948</lpage>
          . URL: https://dl.acm.org/doi/10.14778/3137765.3137815. doi:10.14778/3137765.3137815.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wadden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Zuylen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Fact or fiction: Verifying scientific claims</article-title>
          , in:
          <source>Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2020</year>
          . URL: https://api.semanticscholar.org/CorpusID:216867133.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>Overview of the claimscan-2023: Uncovering truth in social media through claim detection and identification of claim spans</article-title>
          ,
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>