<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SCIRE at CheckThat! 2025: Bridging Social Media, Scientific Discourse, and Scientific Literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Parth Manish Thapliyal</string-name>
          <email>pthapliyal@cs.stonybrook.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ritesh Sunil Chavan</string-name>
          <email>riteshsunil.chavan@stonybrook.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samridh Samridh</string-name>
          <email>samridh.samridh@stonybrook.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chaoyuan Zuo</string-name>
          <email>zuocy@nankai.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ritwik Banerjee</string-name>
          <email>rbanerjee@cs.stonybrook.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science, Stony Brook University</institution>
          ,
          <addr-line>100 Nicolls Rd, Stony Brook, New York</addr-line>
          ,
          <country country="US">United States of America</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Journalism and Communication, Nankai University</institution>
          ,
          <addr-line>38 Tongyan Road, Jinnan District, Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>The increasing prominence of scientific discourse on social media platforms presents both unprecedented opportunities for public engagement and significant risks of misinformation. While scientific claims, references to publications, and mentions of research entities proliferate rapidly, current platforms lack robust mechanisms to validate their veracity or trace implicit sources. Manual identification and sourcing of such content is impractical at scale, and although computational methods exist for generic fact-checking or citation retrieval, they often fail to address the unique challenges of noisy, abbreviated social media language, particularly the detection of nuanced scientific discourse and the retrieval of publications from implicit, non-URL references. In this paper, we propose a unified framework tackling two critical tasks: (1) detection of scientific web discourse, where we identify tweets containing scientific claims, references, or research entities, using a combination of natural language augmentation and supervised learning; and (2) source retrieval for scientific claims, employing a two-stage dense retrieval and re-ranking pipeline to link implicit mentions of sources to their actual publications from candidate pools. Our multi-stage architecture first filters and classifies scientific content, then prioritizes and resolves latent citations. Evaluations on a curated dataset provided by the CLEF-2025 CheckThat! Lab demonstrate the effectiveness of our approach, achieving significant improvements across both tasks. This work provides essential tools for automating scientific credibility assessment and aiding the verification of scientific information in online ecosystems.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Augmentation</kwd>
        <kwd>Dense Retrieval</kwd>
        <kwd>Re-ranking</kwd>
        <kwd>Cross-encoder</kwd>
        <kwd>Bi-encoder</kwd>
        <kwd>Large language model</kwd>
        <kwd>Transformer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        It is increasingly evident that scientific discourse now permeates the fabric of social media [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ],
offering transformative potential for public engagement while simultaneously amplifying the risks of
misinformation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] – including (but not limited to) acts of selective omission [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], fear-mongering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
and use of misleading contexts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For the average user, social media platforms like Twitter/X have
become conduits for encountering scientific claims, references to publications, and mentions of research
entities. Discerning the veracity and validity of such content remains daunting, however. Users
naturally gravitate toward information confirming their existing beliefs [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and this confirmation
bias, compounded by the breakneck speed of scientific information diffusion, creates fertile ground for
unverified claims to proliferate as accepted facts [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. While correcting cognitive biases is a profound
societal challenge, gleaning the scientific validity of claims being shared offers a more pragmatic path
forward, where progress can be objectively measured.
      </p>
      <p>
        Comprehensive manual validation of scientific discourse is infeasible at social media scale.
Computational approaches have emerged, yet existing solutions remain inadequate. Prior work in generic
fact-checking often overlooks scientific nuance, while citation retrieval systems typically rely on explicit
URLs or DOIs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] – failing to address the prevalence of implicit, unstructured references (e.g., “a recent
Nature study shows . . . ”). Early studies – e.g., Levy et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Cazalens et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] – focused on
coarse-grained claim detection but lacked mechanisms to prioritize content or resolve latent sources.
Subsequent efforts made significant progress by ranking the “check-worthiness” of claims [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ],
and employed increasingly sophisticated models for citation matching [16, 17]. Yet, they often struggled
with the shifting landscape of social media lexicon and the unique brevity seen on such platforms [18].
Critically, the synergistic tasks of detecting scientific discourse and retrieving its implicit sources remain
underexplored, with no unified framework addressing both. The CLEF CheckThat! Lab’s earlier datasets
were pioneers in identifying significant gaps in handling scientific specificity and source ambiguity [19].
      </p>
      <p>In this work, we bridge this divide by proposing an integrated pipeline for scientific information
curation. Unlike prior methods constrained by narrow scope of domain or dataset, our approach
leverages natural language augmentation and dense retrieval to confront the dual challenges of noisy
context and implicit references. We focus on two pillars: (1) detecting scientific web discourse –
identifying tweets containing claims, publication references, or research entities; and (2) retrieving
sources for scientific claims – linking implicit mentions to actual publications via candidate pools.
Our architecture deliberately prioritizes scalability and robustness, filtering scientific content before
resolving its provenance. We use the task formulation, data, and evaluation framework provided by the
fourth task of CLEF-2025 CheckThat! Lab, viz., Scientific Web Discourse [20].</p>
    </sec>
    <sec id="sec-2">
      <title>2. A Quick Overview of Tasks and Dataset</title>
      <p>Task 1 : Scientific Web Discourse Detection: This task focuses on identifying scientific content
within social media posts by detecting three key elements: (1) explicit scientific claims, (2) references to
research publications or studies, and (3) mentions of scientific entities like researchers or institutions.
It addresses the critical challenge of distinguishing authentic scientific discourse from general online
conversations, particularly in high-impact domains like COVID-19 and climate change where
misinformation risks are elevated. Current research is hindered by inconsistent definitions of science-related
content and a scarcity of annotated data to train detection models.</p>
      <p>Task 2 : Scientific Claim Source Retrieval: This complementary task tackles the problem of linking
implicit scientific references in social media to their source publications. Given tweets mentioning
research without direct URLs or identifiers, the goal is to accurately retrieve the referenced papers
from candidate pools. This addresses a key verification bottleneck in online scientific discourse, where
informal mentions (e.g., “a recent Harvard study shows . . . ”) lack traceable citations yet require validation
against original research to combat misinformation. The absence of standardized datasets for implicit
citation resolution remains a significant research gap.</p>
      <p>Datasets: The dataset construction reflects the complexity of real-world scientific social media
discourse, with tweet texts deliberately paraphrased for compliance while preserving linguistic authenticity.
For the first task, the corpus comprises 1,229 training samples, 137 development samples, and 240 test
instances, with each tweet annotated using binary labels across the three target categories. The
evaluation employs macro-averaged F1 scores to account for class imbalance inherent in scientific discourse
detection. The second task leverages the CORD-19 publication database as its candidate pool, containing
comprehensive metadata including study titles, abstracts, venues, and author information. Training
instances consist of tweet-publication pairs where implicit references (e.g., “published in Nature” or
“recent Stanford research”) must be resolved to specific papers identified by CORD IDs. Performance
assessment utilizes Mean Reciprocal Rank at 5 (MRR@5), emphasizing the practical importance of
surfacing correct sources within top-ranked results for human verification workflows.</p>
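<p>The ranking metric for the second task can be made concrete with a short sketch; the helper below is illustrative and not part of the official scorer. Each query contributes the reciprocal rank of its gold paper if that paper appears among the top five results, and zero otherwise.</p>

```python
def mrr_at_5(ranked_lists, gold_ids):
    """Mean Reciprocal Rank truncated at rank 5.

    ranked_lists: one ranked list of candidate paper IDs per query.
    gold_ids: the correct paper ID (e.g. a CORD ID) for each query.
    """
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_ids):
        for rank, paper_id in enumerate(ranked[:5], start=1):
            if paper_id == gold:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(gold_ids)
```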
    </sec>
    <sec id="sec-3">
      <title>3. Methodology: Scientific Web Discourse Detection</title>
      <sec id="sec-3-1">
        <title>3.1. Data Preparation and Augmentation</title>
        <p>To address data scarcity and enhance linguistic diversity, we employed DeepSeek-R1 [21] for
paraphrasebased augmentation, efectively doubling the original training set to 2,369 samples. To mitigate class
imbalance, we implemented two complementary strategies: (i) preservation of all positive samples,
containing at least one scientific indicator, and (ii) random undersampling of 50% of negative samples
(no scientific indicators). This yielded a balanced subset of 1,663 samples for initial experiments. The
full augmented corpus was reserved for cross-validation studies.</p>
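<p>The rebalancing strategy above can be sketched as follows; the (text, labels) sample representation and the fixed seed are illustrative assumptions rather than the exact implementation.</p>

```python
import random

def rebalance(samples, seed=13):
    """Keep every positive sample (at least one scientific indicator set)
    and randomly drop half of the negatives, per strategies (i)-(ii).
    Each sample is assumed to be a (text, labels) pair, where labels are
    the three binary indicators."""
    positives = [s for s in samples if any(s[1])]
    negatives = [s for s in samples if not any(s[1])]
    rng = random.Random(seed)  # fixed seed for reproducibility
    kept_negatives = rng.sample(negatives, len(negatives) // 2)
    return positives + kept_negatives
```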
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture and Training Framework</title>
        <p>We adopted DeBERTa-v3-large [22, 23] as our base architecture for its superior contextual representation
capabilities, appending a multi-label classification head with three sigmoid-activated outputs
corresponding to our target classes: (i) scientific claims, (ii) study/publication references, and (iii) scientific
entity mentions. All layers remained unfrozen during fine-tuning to enable full domain adaptation.</p>
        <sec id="sec-3-2-1">
          <title>Loss function formulation:</title>
          <p>To tackle the remaining class imbalance and the multi-label facet of this
task, as well as to address learning from hard examples, we employed focal loss [24] in two sets of
experiments: (1) with fixed α parameters, set to 0.7 for each of the three classes; and (2) with
dynamic weights, where the α parameters are computed from the class distribution of the
fully augmented training data.</p>
          <p>In both sets of experiments, we use a fixed γ = 2. The class-adaptive α weights,</p>
          <p>α_c = total samples / (num classes × freq(c)),</p>
          <p>inversely weight each class by its occurrence frequency, thereby increasing the penalty for
minority classes. For the second strand of experiments, these values are recomputed per epoch based
on current batch statistics.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Training protocols:</title>
          <p>Three training paradigms were progressively explored:
1. single-model training on under-sampled data (1,663 samples) with fixed α = 0.7 for each class;
2. training on the fully augmented corpus (2,369 samples) with dynamic α recomputation; and
3. stratified 5-fold cross-validation with fold-specific dynamic computation of α weights.</p>
          <p>All runs used the Adam optimizer with linear warm-up and cosine decay. The key hyperparameters
included a batch size of 16 (undersampled) or 8 (fully augmented data); a learning rate η = 2 × 10⁻⁵
for fixed α and η = 1 × 10⁻⁵ for dynamic α; and an early-stopping criterion whereby the best checkpoint
was selected by the highest macro-F1 score on validation data.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Threshold optimization strategy:</title>
          <p>Given the multi-label nature of the task, we replaced the default
0.5 decision threshold with a systematic per-class optimization:
(1) we generated precision-recall curves for each class, and
(2) identified probability thresholds maximizing F1 scores, as follows:
τ_c* = argmax_τ ( 2 · P_c(τ) · R_c(τ) / ( P_c(τ) + R_c(τ) ) ),
where P_c and R_c denote class-c precision and recall at threshold τ.</p>
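<p>The per-class threshold search amounts to a one-dimensional sweep; a minimal sketch, with the 0.01-step grid as an illustrative choice:</p>

```python
def best_threshold(probs, labels):
    """Sweep candidate thresholds for one class and return the one
    maximizing F1, mirroring the per-class argmax over the threshold."""
    def f1_at(t):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0
    grid = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    return max(grid, key=f1_at)
```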
          <p>Cross-validation ensemble: For our final architecture, we implemented a 5-model ensemble by:
1. training five independent DeBERTa-v3-large instances on mutually exclusive folds;
2. performing inference via logit averaging:
p̂(x) = σ( (1/5) Σ_{k=1}^{5} logits_k(x) ); and
3. applying class-specific threshold optimization to ensemble probabilities.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation framework</title>
        <p>Model performance was assessed using macro-averaged F1 across all classes, with complementary
analysis of per-class precision/recall. All thresholds were optimized exclusively on the development set
to prevent data leakage. The final benchmark was the test set available from the CLEF-2025 CheckThat!
Lab’s scientific web discourse task [20].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology: Scientific Claim Source Retrieval</title>
      <p>In this task, given a tweet containing an implicit reference to a scientific publication (without URL
identifiers), we frame source retrieval as a ranking problem over candidate papers. Our solution employs
a two-stage pipeline:
1. a dense retrieval step for efficient candidate screening from a large pool of publications; and
2. a neural re-ranking step for precision-guided refinement of top candidates.</p>
      <sec id="sec-4-1">
        <title>4.1. Dense Retrieval</title>
        <p>The embedding model: We used the Snowflake/snowflake-arctic-embed-l-v2.0 [25] dense retriever
to encode both tweets and research papers. Instead of the entire article, only the concatenation of title
and abstract was used to represent each paper. Tweets, on the other hand, were represented by their
raw text, with casing and punctuation preserved. We used cosine similarity between L2-normalized
embeddings as the ranking metric.</p>
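<p>A minimal sketch of this scoring scheme, assuming precomputed embedding vectors:</p>

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def paper_text(title, abstract):
    # papers are represented by title + abstract only, not the full article
    return title + " " + abstract

def similarity(tweet_emb, paper_emb):
    # after L2 normalization, cosine similarity is a plain dot product
    a, b = l2_normalize(tweet_emb), l2_normalize(paper_emb)
    return sum(x * y for x, y in zip(a, b))
```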
        <p>Training optimization: We split the data into 85% for training and 15% for evaluation, with fixed
random seed for reproducibility. This deviation from the original training-evaluation split in the dataset
was done to allow for an expansion of the training set and better focus on improving training accuracy
while minimizing overfitting. It is worth noting that the final evaluation of the system was conducted
on a completely separate test set.</p>
        <p>Our negative sampling strategy constructs informative triplets by combining one positive paper with
nine strategically selected negatives per query: retrieval hard negatives (top-ranked incorrect papers
from initial retrieval), semantic hard negatives (high-similarity non-relevant papers identified through
paper-paper similarity queries), and soft negatives (contextually relevant papers ranked below position
5 in initial results). We employ a fixed 3:3:3 ratio for these three types of negatives, ensuring the
model learns to distinguish subtle differences between correct matches and challenging distractors at
multiple relevance tiers.</p>
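<p>A sketch of the triplet construction under these assumptions; the candidate rankings passed in are placeholders for the actual retrieval and paper-paper similarity output.</p>

```python
def make_training_examples(tweet, gold_id, retrieved_ids, similar_ids):
    """Pair one positive with nine negatives in the fixed 3:3:3 ratio:
    retrieval hard negatives (top-ranked wrong papers), semantic hard
    negatives (from paper-paper similarity queries), and soft negatives
    (papers ranked below position 5 in the initial results)."""
    wrong = [pid for pid in retrieved_ids if pid != gold_id]
    retrieval_hard = wrong[:3]
    semantic_hard = [pid for pid in similar_ids
                     if pid != gold_id and pid not in retrieval_hard][:3]
    soft = [pid for pid in wrong[5:]
            if pid not in retrieval_hard and pid not in semantic_hard][:3]
    negatives = retrieval_hard + semantic_hard + soft
    return [(tweet, gold_id, neg) for neg in negatives]
```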
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Neural Re-ranking</title>
        <p>The re-ranking phase employs a cross-encoder architecture based on ms-marco-MiniLM-L4-v2
(https://huggingface.co/cross-encoder/ms-marco-MiniLM-L4-v2), which processes tweet-paper pairs
through concatenated input sequences formatted as [CLS] tweet [SEP] paper_text [SEP].
We preserve raw text casing and punctuation while applying strict 512-token truncation, deliberately
avoiding lemmatization or normalization to maintain linguistic authenticity. This architecture outputs
a continuous relevance score in [0, 1] for each candidate pair. To balance recall and MRR objectives,
we re-rank only the top-20 candidates from the dense retriever, a cutoff empirically determined to
optimize positional sensitivity while avoiding diminishing returns from deeper list exploration.</p>
        <p>Baseline implementation and experimental setup: For comparative evaluation, we implemented
sparse retrieval baselines (BM25/BM25+) with full preprocessing pipelines comprising the removal
of function words, stemming, and punctuation stripping, and then indexing complete paper texts (title +
abstract). We further tested hybrid pipelines combining BM25 with TinyBERT re-rankers
(https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L2) and sentence-transformers with
cross-encoders. All experiments were conducted on NVIDIA A100 GPUs using
consistent evaluation protocols. The dense retriever (Snowflake-arctic) generated 1024-dimensional
embeddings via SentencePiece tokenization, while the cross-encoder used WordPiece tokenization
and was fine-tuned with AdamW optimization featuring a 500-step linear warm-up, ensuring stable
convergence during relevance refinement.</p>
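<p>A minimal sketch of the sparse-retrieval preprocessing; the stop list and suffix rules below are illustrative stand-ins for the actual resources (e.g. a full stop-word list and a Porter-style stemmer):</p>

```python
import string

# illustrative subset of function words; the actual stop list is an assumption
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is", "are"}

def crude_stem(token):
    # a stand-in for a real stemmer; only strips a few common suffixes
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Sparse-retrieval pipeline: punctuation stripping, function-word
    removal, then stemming, applied to title + abstract before indexing."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(t) for t in text.split() if t not in FUNCTION_WORDS]
```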
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <sec id="sec-5-1">
        <title>5.1. Scientific Web Discourse</title>
        <p>The empirical performance details for scientific web discourse detection are shown in Table 1. Our
systematic experiments reveal critical insights into optimizing scientific discourse detection. The
initial approach — combining DeepSeek-R1 paraphrase augmentation with strategic under-sampling
of negative samples (reducing the training set to 1,663 samples), DeBERTa-v3-large fine-tuning, Focal
Loss (fixed α at 0.7 and γ at 2), and per-class threshold optimization — achieved a strong macro-F1 of
0.8849. Our result with adaptive decision boundaries is a significant 5.51% improvement over using the
fixed default threshold (the latter yields a macro-F1 score of 0.8298).</p>
        <p>However, expanding to the full augmented dataset (2,369 samples) with dynamically calculated
α-weights (0.2832, 0.4132, and 0.3036) yielded a slight reduction in performance (0.8615 macro-F1),
suggesting that the benefits of additional data volume in this configuration are outweighed by the
curated class balance achieved with undersampling.</p>
        <p>The most substantial gains emerged from our ensemble strategy: 5-fold cross-validation on the full
augmented dataset with fold-specific dynamic α-weighting, followed by logit averaging and threshold
optimization. This approach achieved a state-of-the-art macro-F1 of 0.9208, a 3.6% absolute
improvement over the best single-model result. The optimal thresholds (0.74, 0.21, 0.41) revealed class-dependent
sensitivity patterns, with scientific claims requiring higher confidence thresholds than publication
references. Once again, default thresholding degraded ensemble performance — to 0.8595, underscoring
that threshold optimization remains indispensable even for sophisticated architectures.</p>
        <p>These findings collectively confirm that model diversity through cross-validation and adaptive
decision boundaries effectively mitigates variability in linguistic expression and class imbalance inherent
to scientific social media discourse.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Scientific Claim Source Retrieval</title>
        <p>Table 2 reports the results of initial ranking with sparse as well as dense encoders. After exploring
various permutations of function word removal, punctuation removal, lemmatization, and stemming, we
found that the best preprocessing steps consisted of function word removal, stemming, and punctuation
removal. Despite earlier studies showing that the removal of function words does not significantly affect
retrieval with sparse encoders [26], our results indicate that it affects downstream performance. In the
two sparse encoders used in our experiments, BM25+ showed only marginal improvements over BM25,
and hyperparameter tuning did not significantly change retrieval scores.</p>
        <p>Among the sentence transformer bi-encoder models evaluated, multi-qa-mpnet-base-cos-v1 —
a dense representation designed for semantic search — emerged as the top performer, though notably
still underperforming compared to traditional sparse retrieval methods. Other bi-encoders, including
all-MiniLM-L12-v2, multi-qa-distilbert-cos-v1, and all-distilroberta-v1, achieved
marginally lower but comparable results, confirming a consistent performance gap relative to non-neural
baselines. SciBERT [27] proved particularly unsuitable for this task, likely due to its domain-specific
pretraining prioritizing classification objectives over semantic alignment capabilities essential for
retrieval. This performance pattern highlights a critical architectural limitation: standard bi-encoders
struggle to capture the nuanced query-document relationships required for implicit citation resolution.</p>
        <p>In contrast, Snowflake/snowflake-arctic-embed-l-v2.0, which employs separate specialized
encoders for queries and documents, demonstrated superior effectiveness. This suggests that asymmetric
embedding spaces better accommodate the structural disparity between conversational tweets and
formal scientific text.</p>
        <p>Results of the second stage of the retrieval pipeline — neural re-ranking — are shown in Table 3. The
neural re-ranking results demonstrate a clear performance advantage for the dense retrieval pipeline.
When paired with the ms-marco-MiniLM-L4-v2 cross-encoder, the dense retriever achieved
substantially higher MRR scores (Evaluation: 0.77, Test: 0.65) compared to the sparse retrieval approach
with ms-marco-TinyBERT-L2-v2 cross-encoder (Evaluation: 0.68, Test: 0.55). This 12-13% relative
improvement across both evaluation phases highlights the effectiveness of dense embeddings in
capturing semantic relationships between implicit tweet references and candidate papers. The
performance gap between evaluation and test sets (approximately 0.12 MRR points for both configurations)
suggests consistent generalization behavior, though further analysis would be needed to characterize
the distribution shift. These results validate our architectural choice of dense retrieval coupled with
moderate-sized cross-encoders for optimal precision-recall balance in scientific source resolution.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Comparative Evaluation</title>
        <p>Compared to other systems participating in the CheckThat! Lab at CLEF 2025, our scientific discourse
detection method using ensemble DeBERTa models with adaptive thresholding secured the fourth
position, with 79.17% macro-F1 score. This is the aggregate score across three categories of discourse:
scientific claims, reference to scientific knowledge, and mention of scientific research context. Our
approach had a relatively poor performance in the first category, with 76.42% macro-F1, but a competitive
macro-F1 score of 77.31% in the second category. Our approach excelled in detecting mentions of
scientific research context, topping the leaderboard with a macro-F1 of 83.77%.</p>
        <p>In the second task of source retrieval for implicit scientific claims, our approach based on
Snowflake-Arctic dense retrieval and MiniLM cross-encoder re-ranking achieved an MRR@5 score of 0.65, securing
the 5th position (out of 30 participants) on the leaderboard.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>Our work establishes an effective framework for combating scientific misinformation on social media
through two synergistic capabilities: (1) precise detection of scientific discourse in tweets via
ensemble DeBERTa models with adaptive thresholding; and (2) robust source retrieval for implicit claims
using Snowflake-Arctic dense retrieval and MiniLM cross-encoder re-ranking. The integration of
strategic negative sampling, dynamic loss weighting, and logit ensemble methods proved critical in
overcoming linguistic noise and class imbalance. This pipeline provides essential infrastructure for
downstream credibility assessment, demonstrating that nuanced scientific communication patterns can
be computationally modeled with high precision.</p>
      <p>We see three promising avenues for future developments. First, extending detection to multi-modal
content (images/videos containing scientific claims) would address growing visual misinformation
vectors. Second, developing domain-adaptive retrieval that dynamically adjusts to emerging scientific
fields could prevent vocabulary obsolescence. Third, creating real-time distillation techniques to
compress our ensemble architecture would enable deployment at scale. In conjunction with these avenues,
ethical frameworks for automated credibility scoring require careful development to prevent algorithmic
bias while maintaining scientific rigor. These advancements would transform our pipeline from a
research tool into a practical safeguard for digital scientific discourse.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported in part by a seed award from the AI Innovation Institute (AI3) at Stony Brook
University (State University of New York at Stony Brook).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for spelling and grammar suggestions
in some sections of this manuscript. After using this tool, the authors reviewed and edited the content
as needed and take full responsibility for the publication’s content.
</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>2020, Association for Computational Linguistics, Online, 2020, pp. 476–488. doi:10.18653/v1/2020.findings-emnlp.43.</p>
      <p>[16] M. Färber, A. Jatowt, Citation recommendation: approaches and datasets, Int J Digit Libr 21 (2020) 371–405. doi:10.1007/s00799-020-00288-2.</p>
      <p>[17] V. Viswanathan, G. Neubig, P. Liu, CitationIE: Leveraging the citation graph for scientific information extraction, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, 2021, pp. 719–731. doi:10.18653/v1/2021.acl-long.59.</p>
      <p>[18] D. Rousidis, E. Garoufallou, P. Balatsoukas, K. Paraskeuopoulos, S. Asderi, D. Koutsomiha, Metadata requirements for repositories in health informatics research: evidence from the analysis of social media citations, in: Metadata and Semantics Research: 7th Research Conference, MTSR 2013, Thessaloniki, Greece, November 19-22, 2013. Proceedings 7, Springer, 2013, pp. 246–257.</p>
      <p>[19] A. Barrón-Cedeño, F. Alam, T. Chakraborty, T. Elsayed, P. Nakov, P. Przybyła, J. M. Struß, F. Haouari, M. Hasanain, F. Ruggeri, et al., The CLEF-2024 CheckThat! lab: Check-worthiness, subjectivity, persuasion, roles, authorities, and adversarial robustness, in: European Conference on Information Retrieval, Springer, 2024, pp. 449–458.</p>
      <p>[20] S. Hafid, Y. S. Kartal, S. Schellhammer, K. Boland, D. Dimitrov, S. Bringay, K. Todorov, S. Dietze, Overview of the CLEF-2025 CheckThat! Lab Task 4 on Scientific Web Discourse, 2025.</p>
      <p>[21] D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi, et al., DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv preprint arXiv:2501.12948 (2025).</p>
      <p>[22] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with Disentangled Attention, in: International Conference on Learning Representations, 2021.</p>
      <p>[23] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing, 2021. arXiv:2111.09543.</p>
      <p>[24] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal Loss for Dense Object Detection, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.</p>
      <p>[25] P. Yu, L. Merrick, G. Nuti, D. Campos, Arctic-Embed 2.0: Multilingual Retrieval Without Compromise, Technical Report, Snowflake Inc., 2024.</p>
      <p>[26] A. Trotman, A. Puurula, B. Burgess, Improvements to BM25 and language models examined, in: Proceedings of the 19th Australasian Document Computing Symposium, 2014, pp. 58–65.</p>
      <p>[27] I. Beltagy, K. Lo, A. Cohan, SciBERT: A pretrained language model for scientific text, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3615–3620. doi:10.18653/v1/D19-1371.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hargittai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Füchslin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <source>How Do Young Adults Engage With Science and Research on Social Media? Some Preliminary Findings and an Agenda for Future Research, Social Media + Society</source>
          <volume>4</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1177/2056305118797720.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Höttecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allchin</surname>
          </string-name>
          ,
          <article-title>Reconceptualizing nature-of-science education in the age of social media</article-title>
          ,
          <source>Science Education</source>
          <volume>104</volume>
          (
          <year>2020</year>
          )
          <fpage>641</fpage>
          -
          <lpage>666</lpage>
          . doi:10.1002/sce.21575.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Iyengar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Massey</surname>
          </string-name>
          ,
          <article-title>Scientific communication in a post-truth society</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>116</volume>
          (
          <year>2019</year>
          )
          <fpage>7656</fpage>
          -
          <lpage>7661</lpage>
          . doi:10.1073/pnas.1805868115.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>McKenzie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kirkham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Forbes</surname>
          </string-name>
          ,
          <article-title>Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions</article-title>
          ,
          <source>Cochrane Database of Systematic Reviews</source>
          (
          <year>2014</year>
          ). doi:10.1002/14651858.MR000035.pub2.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wolinsky</surname>
          </string-name>
          ,
          <article-title>Disease mongering and drug marketing: Does the pharmaceutical industry manufacture diseases as well as drugs?</article-title>
          ,
          <source>EMBO Reports</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <fpage>612</fpage>
          -
          <lpage>614</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <article-title>An Empirical Assessment of the Qualitative Aspects of Misinformation in Health News</article-title>
          , in: A. Feldman, G. Da San Martino, C. Leberknight, P. Nakov (Eds.),
          <source>Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>81</lpage>
          . URL: https://aclanthology.org/2021.nlp4if-1.11/. doi:10.18653/v1/2021.nlp4if-1.11.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mothes</surname>
          </string-name>
          ,
          <article-title>Confirmation Bias</article-title>
          ,
          <source>The SAGE Encyclopedia of Political Behavior</source>
          <volume>2</volume>
          (
          <year>2017</year>
          )
          <fpage>125</fpage>
          . doi:10.4135/9781483391144.n61.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Galdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gawronski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Arcuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Friese</surname>
          </string-name>
          ,
          <article-title>Selective exposure in decided and undecided individuals: Differential relations to automatic associations and conscious beliefs</article-title>
          ,
          <source>Personality and Social Psychology Bulletin</source>
          <volume>38</volume>
          (
          <year>2012</year>
          )
          <fpage>559</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Moravec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Minas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Dennis</surname>
          </string-name>
          ,
          <article-title>Fake News on Social Media: People Believe What They Want to Believe When it Makes No Sense at All</article-title>
          ,
          <source>MIS Quarterly</source>
          <volume>43</volume>
          (
          <year>2019</year>
          )
          <fpage>1343</fpage>
          -
          <lpage>1360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Acharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <article-title>Querying Across Genres for Medical Claims in News</article-title>
          , in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>1783</fpage>
          -
          <lpage>1789</lpage>
          . doi:10.18653/v1/2020.emnlp-main.139.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sznajder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Slonim</surname>
          </string-name>
          ,
          <article-title>Unsupervised corpus-wide claim detection</article-title>
          , in: I. Habernal, I. Gurevych, K. Ashley, C. Cardie, N. Green, D. Litman, G. Petasis, C. Reed, N. Slonim, V. Walker (Eds.),
          <source>Proceedings of the 4th Workshop on Argument Mining</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>84</lpage>
          . doi:10.18653/v1/W17-5110.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cazalens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamarre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leblay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Manolescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tannier</surname>
          </string-name>
          ,
          <article-title>A content management perspective on fact-checking</article-title>
          , in:
          <source>Companion Proceedings of The Web Conference 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>574</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          ,
          <article-title>Detecting check-worthy factual claims in presidential debates</article-title>
          , in:
          <source>Proceedings of the 24th ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1835</fpage>
          -
          <lpage>1838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <article-title>A hybrid recognition system for check-worthy claims using heuristics and supervised learning</article-title>
          ,
          in:
          <source>CEUR Workshop Proceedings</source>
          , volume
          <volume>2125</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <article-title>Claim check-worthiness detection as positive unlabelled learning</article-title>
          , in: T. Cohn, Y. He, Y. Liu (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>