<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pascal J. Sager</string-name>
          <email>pascaljosef.sager@uzh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ashwini Kamaraj</string-name>
          <email>ashwini.kamaraj@uzh.ch</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin F. Grewe</string-name>
          <email>bgrewe@ethz.ch</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thilo Stadelmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Artificial Intelligence, Zurich University of Applied Sciences</institution>
          ,
          <addr-line>Technikumstrasse 71, 8401 Winterthur</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>European Centre for Living Technology</institution>
          ,
          <addr-line>Dorsoduro 3246, 30123 Venice</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Neuroinformatics, ETH Zurich and University of Zurich</institution>
          ,
          <addr-line>Winterthurerstrasse 190, 8057 Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Zurich</institution>
          ,
          <addr-line>Rämistrasse 71, 8006 Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition, which focuses on retrieving relevant scientific literature for given social media posts. To address this task, we propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking, enabling robust retrieval that bridges the informal-to-formal language gap. Specifically, we combine BM25-based keyword matching with a FAISS vector store using a fine-tuned INF-Retriever-v1 model for dense semantic retrieval. BM25 returns the top 30 candidates, and semantic search yields 100 candidates, which are then merged and re-ranked via a large language model (LLM)-based cross-encoder.</p>
      </abstract>
      <kwd-group>
        <kwd>Information retrieval</kwd>
        <kwd>Scientific document search</kwd>
        <kwd>Social media fact verification</kwd>
        <kwd>Fact-checking</kwd>
        <kwd>CLEF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the age of online misinformation, tracing social media claims back to their original scientific sources
is crucial for automated fact-checking and evidence-based verification [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. However, this task is
inherently challenging due to the linguistic and structural gap between informal, user-generated content
and formal scientific literature. Social media posts often paraphrase, summarize, or loosely reference
scientific findings, rarely using standardized terminology or explicit citations. These ambiguities make
it difficult to reliably identify the corresponding scientific publications.
      </p>
      <p>
        Bridging this gap requires retrieval systems that can handle domain-specific vocabulary, implicit
references, and abstract semantics [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Subtask 4b of the CLEF CheckThat! 2025 competition [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]
exemplifies this challenge, focusing on retrieving scientific sources for social media claims. Figure
1
illustrates the task and our proposed solution: A hybrid retrieval pipeline designed specifically for
cross-domain scientific source retrieval. Our method integrates:
1. Lexical retrieval with BM25 [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] to capture explicit term overlap (e.g., named entities,
keywords);
2. Semantic retrieval using a FAISS-based [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] vector store to compare dense embeddings obtained
with a fine-tuned INF-Retriever-v1 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] model, enabling the detection of semantic overlaps;
3. Re-ranking with a large language model (LLM)-based cross-encoder [11, 12], which jointly
encodes and scores pairs of social media posts and documents to refine relevance using deep
contextual understanding.
      </p>
      <p>This architecture is designed to harness the complementary strengths of these different retrieval
methods.</p>
      <p>We evaluate our pipeline on the CheckThat! 2025 Subtask 4b dataset, achieving a Mean Reciprocal
Rank at 5 (MRR@5) of 76.46% on the development set (ranked 1st on the leaderboard) and 66.43% on
the test set (ranked 3rd out of 31 teams on the leaderboard), only 2 percentage points below
the top-performing team. Importantly, we achieved this score without using
any external training data, metadata, external knowledge sources, or closed-source models,
making our approach broadly applicable and easily transferable to other domains and tasks. Overall,
our main contributions are:
1. A robust hybrid information retrieval (IR) architecture tailored for scientific source retrieval from
informal social media content;
2. Empirical evidence demonstrating the effectiveness of embedding fine-tuning and LLM-based
re-ranking in bridging informal-to-formal domain gaps;
3. A comprehensive experimental analysis, including ablations and a comparison to a commercial
baseline.</p>
      <p>By publishing this well-engineered pipeline, we aim to support efforts to counter misinformation and
offer a practical, open-source blueprint for cross-domain document retrieval.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>Fact-Checking and Scientific Source Retrieval</title>
        <p>Automated fact-checking critically depends on
robust document retrieval methods to identify evidence that supports or refutes a given claim [13]. The
evolution of this field has progressed from early strategies utilizing structured knowledge bases and
curated news sources [14] to approaches that exploit unstructured, domain-specific corpora [15]. A
particularly challenging scenario involves retrieving scientific literature to verify claims originating
from social media, due to the frequent lexical and conceptual mismatch between informal language and
the academic writing style [16, 17, 18].</p>
        <p>
          Sparse vs. Dense Retrieval. Retrieval methods are commonly grouped into sparse and dense
approaches. Sparse approaches like BM25 [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ] rely on term overlap and excel with strong lexical
alignment, using probabilistic relevance frameworks with saturation parameters and document length
normalization for robust ranking. Conversely, dense retrieval uses neural networks to encode text into
vector representations, enabling semantic similarity matching through metrics such as cosine similarity
[19, 20]. Dense models are particularly advantageous in scenarios where claims are paraphrased or
loosely aligned with scientific language, as is often the case in user-generated content. Although
dense retrieval has historically required domain-specific fine-tuning [21, 22], recent foundation models
pre-trained on diverse corpora exhibit strong generalization [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], increasingly blurring the distinction
between general-purpose and domain-adapted retrieval.
        </p>
        <p>Hybrid Retrieval and LLM Re-Ranking. Hybrid retrieval frameworks can combine sparse and
dense retrieval by adding a subsequent re-ranking stage to merge their results and improve retrieval
quality. Neural re-rankers [23] have demonstrated substantial improvements in ranking accuracy across
multiple domains. Recently, large language models (LLMs) have been employed as cross-encoders,
jointly encoding claim–document pairs to capture nuanced semantic relationships [11, 12]. In this work,
we adopt such a hybrid retrieval architecture by combining sparse retrieval via BM25, dense neural
retrieval, and LLM-based re-ranking to leverage the different strengths of these retrieval methods.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Lexical Retrieval</title>
        <p>
          We use BM25 [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ] for sparse retrieval, and rank documents based on n-gram overlap and frequency
statistics. Lexical methods are particularly effective for matching query terms to titles and commonly
used scientific expressions.
        </p>
        <p>
          Pre-Processing. In contrast to the baseline BM25 provided by the challenge organizers [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we apply
additional normalization steps to improve match quality. Our pipeline includes lowercasing, punctuation
removal, and subword tokenization using byte pair encoding (BPE) [24]. We chose subword tokenization
over lemmatization to maximize n-gram overlaps between informal query terms and formal document
vocabulary. Hashtags are removed, while symbols such as percentages (%) are preserved to maintain
scientific meaning. This design choice specifically addresses the domain gap between informal social
media queries and formal scientific language by increasing the chance of partial term matches. We
detail additional pre-processing experiments, which were excluded from the pipeline, in Appendix A.
        </p>
        <p>Retrieval. At inference time, the BM25 retriever returns the top-30 documents ranked by relevance.
This candidate set provides strong lexical matches for downstream re-ranking and complements the
semantic retriever.
        </p>
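        <p>The normalization and ranking steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: whitespace tokens stand in for the BPE subwords, only the '#' marker of a hashtag is dropped (one possible reading of "hashtags are removed"), and the class name BM25 with its parameters k1 and b is introduced here with commonly used default values.</p>
        <preformat>
```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation except '%', and split on whitespace.

    Assumption: whitespace tokens stand in for the BPE subwords used in the
    paper, and only the '#' marker of a hashtag is dropped (the word is kept).
    """
    text = text.lower()
    text = re.sub(r"[^\w\s%]", " ", text)  # keeps letters, digits, '_' and '%'
    return text.split()


class BM25:
    """Plain Okapi BM25; k1=1.5 and b=0.75 are illustrative defaults."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tokens = [normalize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avg_len = sum(len(t) for t in self.tokens) / len(self.tokens)
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        # Document length normalization relative to the average length.
        length_norm = 1 - self.b + self.b * len(self.tokens[i]) / self.avg_len
        total = 0.0
        for term in query_tokens:
            f = self.tf[i].get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def top_k(self, query, k=30):
        q = normalize(query)
        ranked = sorted(range(len(self.tokens)), key=lambda i: self.score(q, i), reverse=True)
        return ranked[:k]


docs = [
    "A new drug improves memory in a mouse model of Alzheimers disease by 45%",
    "Dense passage retrieval for open-domain question answering",
    "Sea level rise projections under different emission scenarios",
]
bm25_index = BM25(docs)
best = bm25_index.top_k("mice w/ #Alzheimers showed 45% improvement in memory!!", k=2)
```
        </preformat>
        <p>In this toy corpus the first document ranks highest because the normalized query shares the tokens "alzheimers", "45%", "in", and "memory" with it; keeping the percent sign is what lets "45%" match as a single term.</p>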
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Semantic Retrieval</title>
        <p>
          To overcome vocabulary mismatches and paraphrasing issues, we implement dense retrieval based on
transformer-derived embeddings [25], capturing semantic similarity between queries and documents.
        </p>
        <p>
          Embedding Model and Fine-tuning. We initialize our dense retriever with the INF-Retriever-v1 model [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], a fine-tuned variant of gte-Qwen2-7B-instruct [26], chosen for its strong long-text
retrieval performance and open-source availability. We fine-tune it on the CLEF CheckThat! training
set using the multiple negatives ranking (MNR) loss [27], training it to assign higher similarity scores
to (social media post, document) pairs with known associations than to random negatives.
        </p>
        <p>Fine-tuning uses a maximum input length of 8,192 tokens to avoid truncation. Queries and documents
are tokenized independently, embeddings use last-token pooling, and LoRA adapters [28] are applied to
the final eight transformer layers to reduce memory and training time [29]. We optimize with AdamW
[30] using cosine learning rate decay and gradient accumulation. Full details of the fine-tuning setup
are provided in Appendix B.</p>
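        <p>To make the training objective concrete, the following sketch computes the multiple negatives ranking loss for one batch in plain Python. It is an illustration under stated assumptions: the function names and the temperature value scale=20.0 are introduced here, and the paper's actual training setup (Appendix B) may differ.</p>
        <preformat>
```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(query_vecs, doc_vecs, scale=20.0):
    """Multiple negatives ranking loss over one batch.

    query_vecs[i] and doc_vecs[i] form a known (post, paper) pair; every other
    document in the batch serves as a negative for query i.
    `scale` is an illustrative temperature, not a value taken from the paper.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, d) for d in doc_vecs]
        peak = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_norm - logits[i])  # cross-entropy with target index i
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each paper is closest to its own post
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each paper is closest to the other post
loss_aligned = mnr_loss(queries, aligned)
loss_swapped = mnr_loss(queries, swapped)
```
        </preformat>
        <p>The loss is close to zero when each post is most similar to its own paper and grows when an in-batch negative scores higher, which is the behavior the fine-tuning step relies on.</p>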
        <p>
          Vector Store. We pre-compute document embeddings and store them in a FAISS index [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. The
embeddings are normalized using the L2 norm, allowing cosine similarity to be computed efficiently
via dot products. At inference time, the social media post is encoded into a dense vector using the same
model, and the top 100 most similar documents are retrieved. We avoid chunking abstracts, as empirical
results have shown that full-document retrieval performs better.
        </p>
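        <p>The equivalence between cosine similarity and the dot product of L2-normalized vectors can be illustrated with a brute-force search. This sketch deliberately replaces the FAISS index with a plain Python list so that it runs without dependencies; the function names are hypothetical.</p>
        <preformat>
```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_index(doc_vecs):
    """Pre-compute unit-length document vectors (done once, offline)."""
    return [l2_normalize(v) for v in doc_vecs]


def search(index, query_vec, k=100):
    """Return (doc_id, cosine) pairs for the k most similar documents."""
    q = l2_normalize(query_vec)
    # For unit vectors, the dot product equals the cosine similarity.
    scores = [(i, sum(a * b for a, b in zip(q, d))) for i, d in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


vector_index = build_index([[3.0, 4.0], [1.0, 0.0], [-1.0, 0.0]])
hits = search(vector_index, [6.0, 8.0], k=2)
```
        </preformat>
        <p>The query points in the same direction as the first document, so that document is returned first with a cosine of approximately 1, regardless of the difference in vector length.</p>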
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Re-Ranking</title>
        <p>While dense and sparse retrievers are computationally cheap, the subsequent re-ranking process is
computationally intensive. Unlike embedding models, which independently embed each document
and query into vectors and compute similarity using a distance metric, the re-ranker processes each
query-document pair jointly to directly output a similarity score. The computational cost of these
pairwise comparisons limits re-ranking to small candidate subsets, making the initial retrieval stage
essential for filtering documents.</p>
        <p>Ranking. We evaluated various re-ranking models (see Appendix C) and selected
BAAI/bge-reranker-v2-gemma [11, 12], an LLM-based cross-encoder built on Gemma [31], as it performed
best. To balance cost and performance, we re-rank the top 100 dense and top 30 BM25 candidates,
favoring the former due to stronger individual performance (see Section 4). Empirically, increasing
the number of BM25 candidates beyond 30 did not improve re-ranking performance but substantially
increased computational cost, whereas increasing the dense candidates to 100 led to substantial gains.
After removing duplicates within the 130 candidates for each query, the re-ranker scores all remaining
candidates from scratch, and the top five results are returned as output.</p>
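        <p>The merge, de-duplication, and re-scoring steps can be sketched as below. The callable score_pair is a hypothetical stand-in for the cross-encoder, and the toy scorer only counts shared words; a real system would call the re-ranking model at that point.</p>
        <preformat>
```python
def merge_candidates(dense_ids, bm25_ids):
    """Union of both candidate lists with duplicates removed, first occurrence kept."""
    seen = set()
    merged = []
    for doc_id in list(dense_ids) + list(bm25_ids):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def rerank(query, candidates, score_pair, top_n=5):
    """Score every (query, document) pair jointly and keep the best top_n.

    `score_pair` is a hypothetical callable standing in for the cross-encoder;
    the first-stage retrieval scores are deliberately ignored.
    """
    scored = [(doc_id, score_pair(query, doc_id)) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_n]]


corpus = {
    "d1": "memory drug mice",
    "d2": "sea level rise",
    "d3": "drug trial memory loss",
    "d4": "question answering",
}


def toy_scorer(query, doc_id):
    # Placeholder relevance: number of words shared by query and document.
    return len(set(query.split()).intersection(corpus[doc_id].split()))


pool = merge_candidates(["d3", "d1", "d2"], ["d1", "d4"])
top = rerank("new drug improves memory in mice", pool, toy_scorer, top_n=2)
```
        </preformat>
        <p>Because the re-ranker scores all candidates from scratch, the order in which the two retrievers contributed a document has no influence on the final ranking; "d1" overtakes "d3" here even though the dense list placed it second.</p>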
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The CLEF CheckThat! 2025 Subtask 4b evaluates systems using mean reciprocal rank at 5 (MRR@5),
which reflects how highly the correct source is ranked among the top five retrieved documents. Since
MRR@5 is sensitive to ranking order, we prioritize optimizing the lexical and semantic retrievers for
precision. Unlike MRR, Precision@k measures the proportion of relevant documents in the top-k results
regardless of their order, ensuring that each retrieval stage yields high-quality candidate sets suitable
for downstream re-ranking.</p>
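      <p>As a worked example of the main metric, the following sketch computes MRR@5 for a set of queries that each have one gold document. The function and variable names are illustrative and not taken from the official scorer.</p>
      <preformat>
```python
def reciprocal_rank_at_k(ranked_ids, gold_id, k=5):
    """1/rank of the gold document if it appears in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == gold_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs, gold, k=5):
    """Average reciprocal rank; `runs` maps each query id to its ranked list."""
    return sum(reciprocal_rank_at_k(runs[q], gold[q], k) for q in gold) / len(gold)


runs = {
    "q1": ["a", "b", "c"],                 # gold at rank 1, contributes 1
    "q2": ["x", "y", "z"],                 # gold at rank 2, contributes 1/2
    "q3": ["m", "n", "o", "p", "q", "r"],  # gold at rank 6, outside the top 5
}
gold = {"q1": "a", "q2": "y", "q3": "r"}
score = mrr_at_k(runs, gold, k=5)  # (1 + 0.5 + 0) / 3 = 0.5
```
      </preformat>
      <p>The third query shows why ranking order matters: a correct document at rank 6 contributes nothing to MRR@5, even though it was retrieved.</p>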
      <p>
        All experiments were conducted on the official datasets provided by the task organizers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The
corpus includes 7,718 documents. The development set comprises 1,400 queries, and the test set
contains 1,446 queries. Our complete system achieves an MRR@5 score of 76.46% on the development
set and 66.43% on the test set. Table 1 summarizes development and test set results across individual
and combined retrieval stages. We evaluate MRR@1 and MRR@5, along with Precision@30 and
Precision@100<sup>1</sup>. Although the absolute performance on the development set is generally higher than
on the test set by approximately 10 percentage points, the relative gains achieved through our methods,
such as pre-processing and fine-tuning, are consistent across both sets.
      </p>
      <p>Lexical Retrieval. Our BM25 retriever with additional normalization and subword tokenization
yields an 8.4-point gain in MRR@5 and a 10.3-point gain in Precision@30 over the official baseline on
the test set, similar to the improvements observed on the development set (7.0 points in MRR@5, 7.4
points in Precision@30). Our preprocessing reduces noise and increases n-gram overlap, leading to
better alignment between informal social media posts and formal scientific documents.</p>
      <p>Semantic Retrieval. On the development set, employing INF-Retriever-v1 yields an absolute
improvement of 10.03 percentage points in MRR@5 over the BM25 baseline. Fine-tuning the retriever
further increases MRR@5 by 1.98 points, reaching a final score of 67.19%. In terms of Precision@100,
the base model achieves a 13.7-point gain compared to the BM25 baseline, with fine-tuning contributing
an additional 1.1-point improvement. These gains are similar on the test set: INF-Retriever-v1
improves MRR@5 by 11.4 points over the BM25 baseline, and fine-tuning adds a further 2.2-point gain,
culminating in an MRR@5 of 56.72%. Precision@100 follows a similar trend, with respective gains of
17.85 and 1.93 percentage points. These consistent improvements across both development and test
splits highlight the effectiveness and robustness of semantic retrieval, particularly when fine-tuning is
applied. We also experimented with data augmentation techniques, including HyDE-generated queries
and alternative document variants. However, these did not yield further gains. A discussion on data
augmentation is provided in Appendix D.</p>
      <p><sup>1</sup> These metrics correspond to the best-performing configuration: BM25 returns the top 30 documents, and the semantic
retriever contributes the top 100.</p>
      <p>Re-Ranking. Our complete pipeline with bge-reranker-v2-gemma as re-ranker achieves an MRR@5
of 76.46% on the development set, providing a +9.3 percentage points gain over our best individual
retrieval method. To isolate the effectiveness of our re-ranking approach, we compare it against a
hybrid baseline using Elasticsearch (see Appendix E for implementation details). Similar to our pipeline,
this baseline uses BM25 for keyword search and the fine-tuned embedding model for semantic search,
followed by reciprocal rank fusion (RRF) for re-ranking [32]. Although RRF provides a small boost over
standalone retrieval (+2.2 percentage points), it underperforms the cross-encoder by 7.1 percentage
points, highlighting the added value of learning-to-rank methods.</p>
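      <p>For reference, reciprocal rank fusion as defined in [32] can be written in a few lines. This is a generic sketch rather than the Elasticsearch configuration used for the baseline (Appendix E); the constant k=60 is the value suggested in the original RRF paper, and whether the baseline used it is not stated here.</p>
      <preformat>
```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document receives sum(1 / (k + rank)) over all lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```
      </preformat>
      <p>RRF only looks at rank positions and never reads the query or document text, which is one plausible reason it trails a cross-encoder that scores each pair jointly.</p>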
      <p>On the test set, our pipeline achieves 66.43% MRR@5, with the re-ranker achieving similar
improvements of +9.7 percentage points over the best individual retrieval method, confirming that these gains
generalize across datasets.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>In this paper, we presented a hybrid retrieval pipeline for attributing scientific sources to social media
claims. Our system combines BM25 retrieval, dense semantic search with a fine-tuned encoder, and
LLM-based cross-encoder re-ranking. Our results on subtask 4b of the CLEF CheckThat! 2025 competition
demonstrate the effectiveness of this architecture: we ranked 1st on the development set and 3rd on
the test set. Key findings include:
1. Hybrid retrieval is essential: Neither lexical nor semantic retrieval alone was sufficient. On the test set, BM25
reached an MRR@5 of 51.47%; the fine-tuned semantic retriever achieved 56.72%. Applying a
cross-encoder to re-rank top candidates increased MRR@5 to 66.43% (a 23.3 percentage point
improvement over the baseline), confirming the benefit of hybrid retrieval followed by learned
ranking.
2. In-domain fine-tuning improves performance: Fine-tuning the dense retriever improved
MRR@5 by approximately 2 percentage points and led to a Precision@100 of 89.21%. While pre-trained
models perform well out of the box, domain adaptation further improves alignment between
informal queries and scientific abstracts.
3. Engineering matters: Achieving these results required substantial engineering and
experimentation efforts. We optimized hyperparameters, evaluated multiple data augmentation strategies
(Appendices A and D), and evaluated alternative re-ranking models (Appendix C).</p>
      <p>Despite the focus on CLEF’s benchmark task, the proposed architecture is designed with broader
applicability in mind. All components are modular and utilize open-source models, eliminating
reliance on commercial APIs, thereby enabling deployment on local infrastructure [29]. This ensures
compatibility with privacy-sensitive or offline environments and facilitates customization.</p>
      <p>Limitations and Future Work. The current pipeline does not incorporate document-level metadata,
such as author names, publication venues, or timestamps, which could improve retrieval precision
and disambiguation. In addition, we do not integrate external resources such as web search engines
or large-scale knowledge bases. Future work could explore metadata-aware retrieval and web-based
search strategies to further enhance retrieval performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank Prof. Dr. Simon Clematide and Andrianos Michail for guiding the research, engineering,
and writing process. We also thank the Department of Computational Linguistics at the University
of Zurich and the Centre for Artificial Intelligence at the Zurich University of Applied Sciences for
providing computational resources.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the creation of this work, the authors used ChatGPT<sup>2</sup> to refine the pre-written text. Further, the
authors used Grammarly<sup>3</sup> for spell checking. After using these tools, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
      <p><sup>2</sup> https://chatgpt.com</p>
      <p><sup>3</sup> https://www.grammarly.com</p>
      <sec id="sec-7-1">
        <title>References</title>
        <p>[11] C. Li, Z. Liu, S. Xiao, Y. Shao, Making Large Language Models A Better Foundation For Dense
Retrieval, 2023. doi:10.48550/ARXIV.2312.15503, version Number: 1.
[12] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, Z. Liu, M3-Embedding: Multi-Linguality,
Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, in:
Findings of the Association for Computational Linguistics ACL 2024, Association for Computational
Linguistics, Bangkok, Thailand and virtual meeting, 2024, pp. 2318–2335. doi:10.18653/v1/2024.
findings-acl.137.
[13] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a Large-scale Dataset for Fact
Extraction and VERification, in: Proceedings of the 2018 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long
Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 809–819.
doi:10.18653/v1/N18-1074.
[14] A. Vlachos, S. Riedel, Fact Checking: Task definition and dataset construction, in: Proceedings of
the ACL 2014 Workshop on Language Technologies and Computational Social Science, Association
for Computational Linguistics, Baltimore, MD, USA, 2014, pp. 18–22. doi:10.3115/v1/W14-2508.
[15] D. Wadden, S. Lin, K. Lo, L. L. Wang, M. Van Zuylen, A. Cohan, et al., Fact or Fiction: Verifying
Scientific Claims, in: Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp.
7534–7550. doi:10.18653/v1/2020.emnlp-main.609.
[16] D. Wadden, K. Lo, B. Kuehl, A. Cohan, I. Beltagy, L. L. Wang, et al., SciFact-Open: Towards
open-domain scientific claim verification, in: Findings of the Association for Computational
Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab
Emirates, 2022, pp. 4719–4734. doi:10.18653/v1/2022.findings-emnlp.347.
[17] A. Barrón-Cedeño, F. Alam, A. Galassi, G. Da San Martino, P. Nakov, T. Elsayed, et al., Overview
of the CLEF–2023 CheckThat! Lab on Checkworthiness, Subjectivity, Political Bias, Factuality,
and Authority of News Articles and Their Source, in: Experimental IR Meets Multilinguality,
Multimodality, and Interaction, volume 14163, Springer Nature Switzerland, Cham, 2023, pp.
251–275. doi:10.1007/978-3-031-42448-9_20, series Title: Lecture Notes in Computer Science.
[18] A. Barrón-Cedeño, F. Alam, J. M. Struß, P. Nakov, T. Chakraborty, T. Elsayed, et al., Overview of
the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities,
and Adversarial Robustness, in: Experimental IR Meets Multilinguality, Multimodality, and
Interaction, volume 14959, Springer Nature Switzerland, Cham, 2024, pp. 28–52. doi:10.1007/
978-3-031-71908-0_2, series Title: Lecture Notes in Computer Science.
[19] V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, et al., Dense Passage Retrieval for
Open-Domain Question Answering, in: Proceedings of the 2020 Conference on Empirical Methods
in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online,
2020, pp. 6769–6781. doi:10.18653/v1/2020.emnlp-main.550.
[20] G. Izacard, E. Grave, Leveraging Passage Retrieval with Generative Models for Open Domain
Question Answering, in: Proceedings of the 16th Conference of the European Chapter of the
Association for Computational Linguistics: Main Volume, Association for Computational Linguistics,
Online, 2021, pp. 874–880. doi:10.18653/v1/2021.eacl-main.74.
[21] J. Lee, M. Sung, J. Kang, D. Chen, Learning Dense Representations of Phrases at Scale, in:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the
11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers),
Association for Computational Linguistics, Online, 2021, pp. 6634–6647. doi:10.18653/v1/2021.
acl-long.518.
[22] J. Maillard, V. Karpukhin, F. Petroni, W.-t. Yih, B. Oguz, V. Stoyanov, et al., Multi-Task Retrieval
for Knowledge-Intensive Tasks, in: Proceedings of the 59th Annual Meeting of the Association
for Computational Linguistics and the 11th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp.
1098–1111. doi:10.18653/v1/2021.acl-long.89.
[23] R. Nogueira, K. Cho, Passage Re-ranking with BERT, 2019. doi:10.48550/ARXIV.1901.04085,
version Number: 5.
[24] R. Sennrich, B. Haddow, A. Birch, Neural Machine Translation of Rare Words with Subword Units,
in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp.
1715–1725. doi:10.18653/v1/P16-1162.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al.,
Attention is All you Need, in: Advances in Neural Information Processing Systems, volume 30,
Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/
3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
[26] A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, et al., Qwen2 Technical Report, 2024.
doi:10.48550/arXiv.2407.10671, arXiv:2407.10671 [cs].
[27] M. Henderson, R. Al-Rfou, B. Strope, Y.-h. Sung, L. Lukacs, R. Guo, et al., Efficient Natural Language
Response Suggestion for Smart Reply, 2017. doi:10.48550/arXiv.1705.00652, arXiv:1705.00652
[cs].
[28] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, et al., LoRA: Low-Rank Adaptation of
Large Language Models, in: International Conference on Learning Representations, 2022. URL:
https://openreview.net/forum?id=nZeVKeeFYf9.
[29] L. Tuggener, P. Sager, Y. Taoudi-Benchekroun, B. F. Grewe, T. Stadelmann, So you want your
private LLM at home? A survey and benchmark of methods for efficient GPTs, in: 2024 11th
IEEE Swiss Conference on Data Science (SDS), IEEE, Zurich, Switzerland, 2024, pp. 205–212.
doi:10.1109/SDS60720.2024.00036.
[30] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: International Conference on
Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.
[31] Gemma Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, et al., Gemma 2: Improving Open
Language Models at a Practical Size, 2024. URL: https://arxiv.org/abs/2408.00118. doi:10.48550/
ARXIV.2408.00118, version Number: 3.
[32] G. V. Cormack, C. L. A. Clarke, S. Buettcher, Reciprocal rank fusion outperforms condorcet and
individual rank learning methods, in: Proceedings of the 32nd international ACM SIGIR conference
on Research and development in information retrieval, ACM, Boston MA USA, 2009, pp. 758–759.
doi:10.1145/1571941.1572114.
[33] Gemma Team, A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, S. Perrin, T. Matejovicova,
A. Ramé, M. Rivière, L. Rouillard, T. Mesnard, G. Cideron, J. bastien Grill, S. Ramos, E. Yvinec,
M. Casbon, E. Pot, I. Penchev, G. Liu, F. Visin, K. Kenealy, L. Beyer, X. Zhai, A. Tsitsulin,
R. Busa-Fekete, A. Feng, N. Sachdeva, B. Coleman, Y. Gao, B. Mustafa, I. Barr, E. Parisotto, D. Tian,
M. Eyal, C. Cherry, J.-T. Peter, D. Sinopalnikov, S. Bhupatiraju, R. Agarwal, M. Kazemi, D. Malkin,
R. Kumar, D. Vilar, I. Brusilovsky, J. Luo, A. Steiner, A. Friesen, A. Sharma, A. Sharma, A. M.
Gilady, A. Goedeckemeyer, A. Saade, A. Feng, A. Kolesnikov, A. Bendebury, A. Abdagic, A. Vadi,
A. György, A. S. Pinto, A. Das, A. Bapna, A. Miech, A. Yang, A. Paterson, A. Shenoy, A. Chakrabarti,
B. Piot, B. Wu, B. Shahriari, B. Petrini, C. Chen, C. L. Lan, C. A. Choquette-Choo, C. Carey,
C. Brick, D. Deutsch, D. Eisenbud, D. Cattle, D. Cheng, D. Paparas, D. S. Sreepathihalli, D. Reid,
D. Tran, D. Zelle, E. Noland, E. Huizenga, E. Kharitonov, F. Liu, G. Amirkhanyan, G. Cameron,
H. Hashemi, H. Klimczak-Plucińska, H. Singh, H. Mehta, H. T. Lehri, H. Hazimeh, I. Ballantyne,
I. Szpektor, I. Nardini, J. Pouget-Abadie, J. Chan, J. Stanton, J. Wieting, J. Lai, J. Orbay, J. Fernandez,
J. Newlan, J. yeong Ji, J. Singh, K. Black, K. Yu, K. Hui, K. Vodrahalli, K. Gref, L. Qiu, M. Valentine,
M. Coelho, M. Ritter, M. Hofman, M. Watson, M. Chaturvedi, M. Moynihan, M. Ma, N. Babar,
N. Noy, N. Byrd, N. Roy, N. Momchev, N. Chauhan, N. Sachdeva, O. Bunyan, P. Botarda, P. Caron,
P. K. Rubenstein, P. Culliton, P. Schmid, P. G. Sessa, P. Xu, P. Stanczyk, P. Tafti, R. Shivanna,
R. Wu, R. Pan, R. Rokni, R. Willoughby, R. Vallu, R. Mullins, S. Jerome, S. Smoot, S. Girgin, S. Iqbal,
S. Reddy, S. Sheth, S. Põder, S. Bhatnagar, S. R. Panyam, S. Eiger, S. Zhang, T. Liu, T. Yacovone,
T. Liechty, U. Kalra, U. Evci, V. Misra, V. Roseberry, V. Feinberg, V. Kolesnikov, W. Han, W. Kwon,
X. Chen, Y. Chow, Y. Zhu, Z. Wei, Z. Egyed, V. Cotruta, M. Giang, P. Kirk, A. Rao, K. Black, N. Babar,
J. Lo, E. Moreira, L. G. Martins, O. Sanseviero, L. Gonzalez, Z. Gleicher, T. Warkentin, V. Mirrokni,</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>A. Additional Experiments on Lexical Retrieval</title>
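      <p>[Table residue: the values 60.50, 70.10, 71.50, and 70.90 appear under a "k=5" header, followed by a "k=10" header without values; the row labels were lost in extraction.]</p>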
      <p>To explore potential improvements in lexical retrieval, we experimented with two query reformulation
strategies using the Gemma3 12B language model [33]. These methods aim to reduce the linguistic
mismatch between informal social media posts and formal scientific abstracts.</p>
      <p>1. Query Rewriting: Reformulating social media posts to correct grammar and match the formal
language style of scientific abstracts while preserving the original query semantics (see Listing 1).</p>
      <p>2. Query Expansion: Augmenting the original social media post with 2-3 contextually relevant
sentences to increase n-gram overlap with scientific abstracts (see Listing 2).</p>
      <p>Among the evaluated methods, query expansion yielded the highest performance, achieving a
Precision@20 of 80.6%, an improvement of 2.5 percentage points over BM25 with preprocessing. Query
rewriting also led to performance gains, with a Precision@20 of 79.6% (a 1.5 percentage point
improvement).</p>
      <p>However, both methods incur significant computational overhead because they rely on a large
language model: inference time increased by approximately a factor of 60. Furthermore,
because our pipeline includes a subsequent re-ranking stage, the marginal precision gains from these
query reformulations diminish in the final results of the entire pipeline. Given this unfavorable cost-benefit
trade-off, we excluded both methods from the final pipeline.</p>
      <sec id="sec-8-1">
        <title>Listing 1: Prompt template to rewrite a social media post.</title>
        <p>Translate informal text into precise academic language, preserving
original meaning.</p>
        <p>Transformation Guidelines:
- Correct the original tweet's spelling and grammar errors while
maintaining its style
- Convert colloquial language to precise academic terminology
- Convert hastags into proper words
- Do not add anything new. Only correct the mistakes in the original
tweet.</p>
        <p>Output format:
Return a single string
Example:
Original Tweet: "Just saw amazin new study - mice w/ #Alzheimers showed
45% improvemnt in memory after new drug treatment!! Game changer for
#neurodegeneration research imo"
Output:
Just saw amazing new study - mice with Alzheimers showed 45% improvement
in memory after new drug treatment!! Game changer for
neurodegeneration research in my opinion
Transform the following tweet:
{tweet}</p>
      </sec>
      <sec id="sec-8-2">
        <title>Listing 2: Prompt template to expand the social media post.</title>
        <p>Translate informal text into precise academic language, according to the
transformation guidelines.</p>
        <p>Transformation Guidelines:
First, correct the original tweet's spelling and grammar errors while
maintaining its style. Then transform the tweet into academic language
using these rules:
-Convert colloquial language to precise academic terminology
-Maintain semantic accuracy of the original message
-Use passive voice and objective scientific tone
-Eliminate informal expressions and subjective qualifiers
-Transform hashtags into their full, proper form (e.g., "#COVID19" -&gt; "COVID-19 pandemic")
-Expand abbreviations and acronyms to their full forms
-Include key research terms that would appear in academic database
searches
-Preserve all factual claims, statistics, and findings mentioned
-Structure as a concise academic abstract (2-3 sentences)
Output format:
Return a single continuous string with both versions separated by " || "
as follows:
[Corrected Tweet] || [Academic Version]
Example:
Original Tweet: "Just saw amazin new study - mice w/ #Alzheimers showed
45% improvemnt in memory after new drug treatment!! Game changer for
#neurodegeneration research imo"
Output:
"Just saw amazing new study - mice with #Alzheimers showed 45%
improvement in memory after new drug treatment!! Game changer for
#neurodegeneration research in my opinion || A recent pharmacological
intervention demonstrated significant efficacy in an Alzheimer's
disease mouse model, with subjects exhibiting a 45% improvement in
memory function following administration of the novel compound.
These findings represent a potentially significant advancement in
neurodegenerative disease research, particularly regarding
therapeutic approaches for memory deficit amelioration in Alzheimer's
pathology."
Transform the following tweet:
{tweet}</p>
      </sec>
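      <p>The query expansion prompt asks the model for a single string of the form "[Corrected Tweet] || [Academic Version]". The following is a minimal sketch of how such an output might be split and turned into an expanded query. The function names, the quote stripping, and the fallback for malformed outputs are illustrative assumptions and are not taken from our implementation.</p>
      <preformat>
```python
def split_expansion_output(raw):
    """Split '[Corrected Tweet] || [Academic Version]' into its two parts."""
    text = raw.strip().strip('"')
    corrected, sep, academic = text.partition(" || ")
    if not sep:
        # Assumed fallback: the model ignored the format, so keep the text as-is.
        return text, ""
    return corrected.strip(), academic.strip()


def build_expanded_query(raw):
    """Join the corrected post and the academic sentences into one query string."""
    corrected, academic = split_expansion_output(raw)
    return " ".join(part for part in (corrected, academic) if part)
```
      </preformat>
      <p>Concatenating both parts keeps the original wording of the post while adding the formal vocabulary that is intended to increase n-gram overlap with abstracts.</p>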
    </sec>
    <sec id="sec-9">
      <title>B. Embedding Model Fine-Tuning Details</title>
      <p>
        To fine-tune the semantic embedding model, we initialize from INF-Retriever-v1 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a transformer
encoder pre-trained for dense retrieval tasks. Fine-tuning is performed by applying low-rank adaptation
(LoRA) [28] to the query and value projection layers of the self-attention modules in the top 8 transformer
layers (layers 20–27), using rank r = 8, scaling factor α = 32, and a dropout of 0.1. The inputs are tokenized
independently for queries (social media posts) and documents (title + abstract). The maximum sequence
length is 8,192 tokens, so social media posts and documents are processed without
truncation.
      </p>
      <p>We use the multiple negatives ranking (MNR) loss [27]. Given a batch of N query–document pairs,
each query is trained to score highest on its corresponding document, while the other N − 1 documents
in the batch act as negatives. We extract embeddings using last-token pooling, which selects the hidden
state of the final token in each sequence. Embeddings are ℓ<sub>2</sub>-normalized, so that cosine similarity can be
computed via a dot product.</p>
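      <p>To make the in-batch objective concrete, the following sketch computes an MNR-style loss in plain Python for a small batch of embeddings. It is an illustration under assumptions, not our training code: the function names and the scale parameter (a multiplier on the similarities) are hypothetical, and the value of the scale used in training is not specified here.</p>
      <preformat>
```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mnr_loss(query_embs, doc_embs, scale=1.0):
    """Mean cross-entropy where document i is the positive for query i."""
    queries = [l2_normalize(q) for q in query_embs]
    docs = [l2_normalize(d) for d in doc_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Dot products of unit vectors are cosine similarities.
        logits = [scale * sum(a * b for a, b in zip(q, d)) for d in docs]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(s - max_logit) for s in logits))
        # Negative log-probability of the matching document among all in-batch documents.
        total += log_sum_exp - logits[i]
    return total / len(queries)
```
      </preformat>
      <p>The loss is small when each query is most similar to its own document and grows when another document in the batch scores higher.</p>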
      <p>We use AdamW [30] as the optimizer with a learning rate of 1 × 10<sup>−5</sup>. The learning rate is warmed up linearly for 20 steps
and then decayed to 0 using a cosine scheduler. We train on two A100 GPUs using distributed data parallel (DDP)
with a per-device batch size of 4 and 16 gradient accumulation steps (resulting in an effective batch size
of 64). We use gradient clipping with norm = 1.0 and FP16 mixed precision. After each epoch, we evaluate retrieval
quality on the development set (i.e., we run the vector store) by measuring Precision@100.
The final model checkpoint is selected based on the best performance, which is obtained after epoch 2.</p>
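      <p>The learning-rate schedule described above can be sketched as a function of the optimizer step. This is a minimal illustration: the function name and the total_steps argument are assumptions (the total number of optimizer steps is not stated here), and warmup conventions differ slightly between libraries.</p>
      <preformat>
```python
import math


def learning_rate(step, total_steps, peak_lr=1e-5, warmup_steps=20):
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps."""
    if warmup_steps > step:
        # Warmup phase: grow linearly from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine phase: progress runs from 0 (peak) to 1 (zero learning rate).
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))
```
      </preformat>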
    </sec>
    <sec id="sec-10">
      <title>C. Comparison of Re-Ranking Models</title>
      <p>[Table 3: rows Semantic Retrieval; Traditional Cross-Encoders: mxbai-rerank-large-v2, bge-reranker-large; LLM-based Cross-Encoders: bge-reranker-v2-gemma, bge-reranker-v2-minicpm [Layer 28], bge-reranker-v2-minicpm [Layer 32]. The score columns were lost in extraction.]</p>
      <p>
        We conducted a comparative evaluation of various re-ranking models on the development set to
identify the most effective approach for our retrieval pipeline. The evaluated re-ranking models
include traditional cross-encoders (mxbai-rerank-large-v2 [
        <xref ref-type="bibr" rid="ref11">34</xref>
        ], bge-reranker-large [
        <xref ref-type="bibr" rid="ref12">35</xref>
        ]) and
LLM-based re-rankers (bge-reranker-v2-gemma [11, 12], bge-reranker-v2-minicpm [11, 12]), which use
pre-trained language models as the base for relevance scoring. The bge-reranker-v2-minicpm model
supports layer-wise inference optimization, allowing computation to terminate at intermediate layers
rather than processing through the full network. We experimented with two different layer
configurations, terminating after layer 28 and layer 32. We selected layer 32 based on a preliminary
experiment with 100 samples across all available layers, in which layer 32 achieved the best
performance. We additionally included layer 28 because it is recommended by the official BGE re-ranker
repository. All re-rankers are evaluated on the development set using semantic retrieval candidates
as input. As shown in Table 3, LLM-based re-rankers outperformed traditional cross-encoders by a
considerable margin. This performance gap likely stems from LLMs’ extensive pre-training on diverse
text corpora, enabling them to comprehend both formal and informal language patterns. Among the
three LLM-based configurations, bge-reranker-v2-gemma achieved the best MRR@5 performance (76.03%
vs. 76.02% and 75.87%). Although the margin is small, we selected BAAI/bge-reranker-v2-gemma as
our final model.
      </p>
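      <p>For reference, MRR@5 can be computed as in the following sketch. It assumes exactly one relevant document per query; the function name and the input layout (one ranked list of document identifiers per query) are illustrative choices, not the official evaluation script.</p>
      <preformat>
```python
def mrr_at_k(ranked_ids_per_query, gold_ids, k=5):
    """Mean reciprocal rank of the gold document within the top k results."""
    total = 0.0
    for ranked_ids, gold_id in zip(ranked_ids_per_query, gold_ids):
        top_k = ranked_ids[:k]
        if gold_id in top_k:
            # Ranks are 1-based; a gold document outside the top k contributes 0.
            total += 1.0 / (top_k.index(gold_id) + 1)
    return total / len(gold_ids)
```
      </preformat>
      <p>Because only the first five positions count, differences of a few hundredths of a percentage point, such as those among the LLM-based re-rankers, correspond to very few queries changing rank.</p>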
    </sec>
    <sec id="sec-11">
      <title>D. Data Augmentation for Semantic Retrieval</title>
      <p>
        To enrich semantic retrieval, we experimented with two text augmentation strategies: hypothetical
document embeddings (HyDE) [
        <xref ref-type="bibr" rid="ref13">36</xref>
        ] and additional documents (AD). Both methods leverage the Llama
3.2 7B model [
        <xref ref-type="bibr" rid="ref14">37</xref>
        ] to generate auxiliary text representations.
      </p>
      <p>For HyDE, we prompted the model to generate a hypothetical scientific article (title and abstract)
based on a given social media post, aiming to bridge the domain gap between informal social media
language and formal scientific discourse (see Listing 3). For AD, we augmented the document corpus
by generating (1) a summary and (2) a synthetic social media post for each document. These variants
were stored alongside the original document in the vector index (Listings 4 and 5).</p>
      <p>You are an expert in scientific research. Based on the following tweet,
generate a hypothetical scientific paper that includes only a title and
an abstract. The abstract should succinctly summarize the research
objective, methodology, key findings, and conclusions.</p>
      <p>Tweet: {tweet}
{format_instructions}</p>
      <p>Listing 3: Prompt template to generate hypothetical document embeddings.</p>
      <sec id="sec-11-1">
        <title>Listing 4: Summary Prompt Template for AD</title>
        <p>Summarize the following document:
Title: {title}
Abstract: {page_content}
Make sure to include keywords that are likely to be found later by a
search.
{format_instructions}</p>
      </sec>
      <sec id="sec-11-2">
        <title>Listing 5: Tweet Prompt template to generate additional documents.</title>
        <p>Generate a hypothetical Twitter tweet about the following document:
Title: {title}
Abstract: {page_content}
{format_instructions}
Make sure it looks like a typical tweet from an average person and is
not too long.</p>
        <p>The results on the development set are displayed in Table 4. As discussed in Section 4, our primary
objective for semantic retrieval is to ensure high precision, providing strong candidates for downstream
re-ranking. We find that augmentation strategies offer modest improvements for off-the-shelf models
but yield limited or no benefit when applied to the fine-tuned retriever. We hypothesize that the
fine-tuned model’s already high
semantic fidelity reduces the marginal gains achievable through additional data augmentation.
Therefore, these methods were excluded from the final pipeline.</p>
        <p>[Table 4: rows INF-Retriever-v1, + HyDE, + AD, + HyDE + AD, INF-Retriever-v1 + Fine-tuning, + HyDE, + AD, + HyDE + AD; the score columns (starting at k=1) were lost in extraction.]</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>E. Hybrid Search using Elasticsearch</title>
      <p>In addition to our main retrieval pipeline, we explored a fully integrated alternative using Elasticsearch<sup>4</sup>.
This system unifies indexing, retrieval, and ranking into a single framework, while still capturing both
lexical and semantic signals.</p>
      <p>We build an Elasticsearch pipeline that closely mirrors our original architecture. It incorporates (1)
a BM25 retriever with fuzzy matching, (2) a k-nearest neighbor (kNN) semantic retriever using our
fine-tuned embedding model (configured with k = 50 and 200 candidates), and (3) a fusion stage based
on reciprocal rank fusion (RRF) to combine results [32]. The RRF configuration uses a window size of
100 and a rank constant of 20, allowing it to integrate signals from both retrieval branches efficiently.
Unlike our main system, which uses a cross-encoder for deep re-ranking, the Elasticsearch pipeline
relies on this lightweight re-scoring mechanism.</p>
      <p>[Table 5: rows ES, ES + HyDE, ES + AD, ES + AD + HyDE; the score columns were lost in extraction.]</p>
      <p>Performance. As in the evaluation of our custom pipeline described in Appendix D, we evaluate
different variants of this Elasticsearch-based method: with raw or extended documents, and
with or without query expansion. Table 5 presents the results obtained on the development set.</p>
      <p>The best Elasticsearch configuration (ES + AD) achieves an MRR@5 of 69.35%, slightly better than
our custom pipeline’s semantic retriever. However, it lags behind the full system with cross-encoder
re-ranking (MRR@5 = 76.46%). This highlights the benefit of contextual re-scoring for fine-grained
relevance. Nonetheless, the Elasticsearch-based approach remains a viable, scalable option for
latency-sensitive applications.</p>
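      <p>The fusion stage used in this appendix can be illustrated with a short sketch of reciprocal rank fusion, in which each document receives the sum of 1 / (rank constant + rank) over the retrievers that return it. The function below is a plain-Python illustration and not Elasticsearch's implementation: the function name and the tie-breaking by document identifier are assumptions, while the defaults mirror the rank constant of 20 and window size of 100 stated above.</p>
      <preformat>
```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=20, window_size=100):
    """Fuse ranked lists of document ids; the best fused score comes first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Only the top window_size results of each retriever contribute.
        for rank, doc_id in enumerate(ranking[:window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Sort by descending score; ties are broken by document id (an assumption).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```
      </preformat>
      <p>Because only ranks enter the score, the BM25 and kNN branches can be combined without calibrating their raw scores against each other.</p>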
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brüggemann</surname>
          </string-name>
          , I. Lörcher,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walter</surname>
          </string-name>
          ,
          <article-title>Post-normal science communication: Exploring the blurring boundaries of science and journalism</article-title>
          ,
          <source>Journal of Science Communication</source>
          <volume>19</volume>
          (
          <year>2020</year>
          )
          A02. doi:10.22323/2.19030202.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dalvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sajjad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          , et al.,
          <article-title>Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society</article-title>
          , in:
          <source>Findings of the Association for Computational Linguistics: EMNLP 2021</source>
          , Association for Computational Linguistics, Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>611</fpage>
          -
          <lpage>649</lpage>
          . doi:10.18653/v1/2021.findings-emnlp.56.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          V. V.,
          <article-title>The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval</article-title>
          , in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini, F. Pinelli, F. Silvestri, N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval</article-title>
          , in: J. Carrillo-de Albornoz, J. Gonzalo, L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Boland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab Task 4 on Scientific Web Discourse</article-title>
          , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          , CLEF 2025, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Relevance weighting of search terms</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          <volume>27</volume>
          (
          <year>1976</year>
          )
          <fpage>129</fpage>
          -
          <lpage>146</lpage>
          . doi:10.1002/asi.4630270302.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The Probabilistic Relevance Framework: BM25 and Beyond</article-title>
          ,
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          . doi:10.1561/1500000019.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , M. Douze,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <article-title>Billion-Scale Similarity Search with GPUs</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <fpage>535</fpage>
          -
          <lpage>547</lpage>
          . doi:10.1109/TBDATA.2019.2921572.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guzhva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Deng</surname>
          </string-name>
          , J. Johnson, G. Szilvasy,
          <string-name>
            <given-names>P.-E.</given-names>
            <surname>Mazaré</surname>
          </string-name>
          , et al.,
          <source>The Faiss library</source>
          ,
          <year>2024</year>
          . doi:10.48550/ARXIV.2401.08281, version number: 3.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          , et al.,
          <article-title>inf-retriever-</article-title>
          <string-name>
            <surname>v1</surname>
          </string-name>
          (
          <year>2025</year>
          ). URL: https://huggingface.co/ infly/inf-retriever-v1.
          <source>doi:10</source>
          .57967/HF/4262. E.
          <string-name>
            <surname>Senter</surname>
            , E. Collins,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Barral</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Matias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Sculley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Fiedel</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Farabet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Buchatskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Alayrac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Anil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lepikhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bachem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andreev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dadashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hussenot</surname>
          </string-name>
          ,
          <source>Gemma 3 technical report</source>
          ,
          <year>2025</year>
          . doi:10.48550/arXiv.2503.19786.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shakir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koenig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Boost Your Search With The Crispy Mixedbread Rerank Models</article-title>
          ,
          <year>2024</year>
          . URL: https://www.mixedbread.ai/blog/mxbai-rerank-v1.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighoff</surname>
          </string-name>
          ,
          <article-title>C-Pack: Packaged resources to advance general Chinese embedding</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.07597.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>Precise Zero-Shot Dense Retrieval without Relevance Labels</article-title>
          , in:
          <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>1762</fpage>
          -
          <lpage>1777</lpage>
          . doi:10.18653/v1/2023.acl-long.99.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dahle</surname>
          </string-name>
          , et al.,
          <source>The Llama 3 Herd of Models</source>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2407.21783, version 3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>