<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>DS@GT at CheckThat! 2025: Exploring Retrieval and Reranking Pipelines for Scientific Claim Source Retrieval on Social Media Discourse</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jeanette Schofield</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuyu Tian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hoang Thanh Thanh Truong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Heil</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>North Ave NW, Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Social media users often make scientific claims without citing where those claims come from, generating a need to verify them. This paper details work done by the DS@GT team for the CLEF 2025 CheckThat! Lab Task 4b, Scientific Claim Source Retrieval, which seeks to find relevant scientific papers based on implicit references in tweets. Our team explored 6 different data augmentation techniques, 7 different retrieval and reranking pipelines, and finetuned a bi-encoder. Achieving an MRR@5 of 0.58, our team ranked 16th out of 30 teams for the CLEF 2025 CheckThat! Lab Task 4b, an improvement of 0.15 over the BM25 baseline of 0.43. Our code is available on GitHub at https://github.com/dsgt-arc/checkthat-2025-swd/tree/main/subtask-4b.</p>
      </abstract>
      <kwd-group>
        <kwd>cite-worthiness</kwd>
        <kwd>science-related discourse</kwd>
        <kwd>social media</kwd>
        <kwd>data augmentation</kwd>
        <kwd>retrieval and reranking</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Before a source can be retrieved to support a claim, one must first determine whether a claim is being
made. Cite-worthiness refers to the problem of determining whether there are missing references to scientific
results in a text. Designed for cite-worthiness detection in scientific text, the CiteWorth dataset [ 8]
contains 1.1M English-language sentences, 375K of which are labeled as cite-worthy. Models
trained using the CiteWorth dataset were found to perform poorly when tasked with detecting scientific citations in
social media discourse [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SCiteTweets, a dataset of 1,261 tweets forming a subset of the SciTweets dataset,
was created for detecting citations in social media discourse [9].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>Task 4b organizers provided three datasets: two sets of query tweets, one for training and one for
development, and the CORD-19 paper collection. The training (train) dataset contains 12,853 tweets,
while the development (dev) set includes 1,400 tweets. Both datasets share the same data structure with
the following three columns:
1. post_id - A unique identifier for the tweet
2. tweet_text - The textual content of the tweet
3. cord_uid - A unique identifier corresponding to a paper in the CORD-19 collection
The CORD-19 paper collection consists of 7,718 academic papers with 17 columns. Some of the key
columns are:
1. cord_uid - A unique identifier for the paper for linking with tweet queries in the train and dev set
2. title - The title of the paper
3. abstract - The abstract of the paper
4. authors - A list of the paper’s authors
5. journal - The journal where the paper was published
6. publish_time - The publication date
Word clouds in Figures 1 and 2 show that words such as "patients", "COVID-19", "infection", and "risk"
frequently appear in the set of query tweets and the CORD-19 paper collection. Words such as "study"
and "vaccine" appear often in the tweets dataset, but are not as prevalent in paper abstracts. Terms like
"disease" and "pandemic" appear more often in the CORD-19 paper abstracts than in the query tweets.</p>
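The tweet–paper linkage described above can be sketched in a few lines. Note the rows below are invented stand-ins for the real data, which the organizers distribute as files keyed on cord_uid:

```python
# Minimal sketch of the Task 4b data layout (rows invented for illustration).
# Each query tweet carries the cord_uid of the paper it implicitly references.
tweets = [
    {"post_id": "t1", "tweet_text": "New study says vaccines cut infection risk.", "cord_uid": "ug7v899j"},
    {"post_id": "t2", "tweet_text": "COVID-19 risk factors in patients, per this paper.", "cord_uid": "02tnwd4m"},
]

# The CORD-19 collection, reduced here to the columns our pipelines used.
papers = {
    "ug7v899j": {"title": "Vaccine efficacy study", "abstract": "We measure infection risk ..."},
    "02tnwd4m": {"title": "Risk factors in COVID-19 patients", "abstract": "A cohort analysis ..."},
}

# The retrieval task: given tweet_text, recover the matching cord_uid.
# The gold label lets us look up the target paper directly:
for tweet in tweets:
    gold = papers[tweet["cord_uid"]]
    print(tweet["post_id"], "->", gold["title"])
```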
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>For this task, we explored multiple retrieval and reranking pipelines, experimenting with BM25 and
bi-encoders for retrieval and exploring 7 different models for reranking. Multiple data augmentation
experiments were performed during the retrieval stage to see whether modifying the tweet data would improve
results.</p>
      <sec id="sec-4-1">
        <title>4.1. Mean Reciprocal Rank (MRR)</title>
        <p>The Mean Reciprocal Rank at k (MRR@k) was the official metric for the CheckThat! 2025 Task 4b [ 3].
For a set of N queries, the MRR is calculated as:</p>
        <p>MRR@k = (1/N) Σ_{i=1}^{N} 1/rank_i</p>
        <p>where rank_i is the position of the first correct and relevant document in the top-k retrieved list for
the i-th query. If the relevant document does not appear within the top-k results, the reciprocal rank
for that query is treated as zero.</p>
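The metric translates directly into code. A sketch of MRR@k (variable names are ours):

```python
def mrr_at_k(ranked_lists, gold_ids, k=5):
    """Mean Reciprocal Rank at k.

    ranked_lists: for each query, the retrieved document ids in rank order.
    gold_ids: the single relevant document id for each query.
    Queries whose gold document is absent from the top-k contribute 0.
    """
    total = 0.0
    for retrieved, gold in zip(ranked_lists, gold_ids):
        top_k = retrieved[:k]
        if gold in top_k:
            total += 1.0 / (top_k.index(gold) + 1)  # ranks are 1-based
    return total / len(gold_ids)

# Gold doc at rank 1, rank 2, and missing -> (1 + 1/2 + 0) / 3 = 0.5
print(mrr_at_k([["a", "b"], ["x", "a"], ["p", "q"]], ["a", "a", "z"], k=5))
```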
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Retrieval and Reranking Pipeline: First Attempt</title>
        <p>Our first attempt at a retrieval and reranking pipeline used BM25 for retrieval, building on the organizers’
baseline. BM25, or Best Match 25, ranks documents based on how many query terms appear in each
document [10]. BM25 is often used to return a subset of documents from a large corpus due to its
speed. After the initial subset of documents is retrieved, more computationally expensive reranking
models are applied to improve upon the initial results. In our first pipeline, a cross-encoder trained
on the MS Marco dataset (cross-encoder/ms-marco-MiniLM-L6-v2) was used for reranking [11, 12].</p>
        <p>In all pipelines created, 100 documents were retrieved. Prior to retrieval, tweets and documents were
converted to lowercase, tokenized by splitting sentences into words (i.e., on spaces),
and stripped of stop words. Data from the tweet_text column was used for the query set. For the
CORD-19 paper collection, we combined the title of each paper with its abstract.</p>
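The preprocessing and scoring steps can be sketched with a small, self-contained Okapi BM25 scorer. The stop-word list and the parameters k1=1.5 and b=0.75 below are illustrative defaults, not the exact configuration of the pipeline, which used an existing BM25 implementation:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "to"}  # illustrative subset

def preprocess(text):
    # Lowercase, split on whitespace, drop stop words.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [preprocess(d) for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    df = Counter()                     # document frequency per term
    for d in tokenized:
        df.update(set(d))
    n = len(tokenized)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in preprocess(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = ["covid vaccine efficacy study", "weather report for atlanta", "vaccine side effects in patients"]
scores = bm25_scores("vaccine efficacy", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the document matching both query terms ranks first
```

In the real pipeline, the top 100 documents by score are kept and handed to the cross-encoder for reranking.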
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Exploring Data Augmentation</title>
        <p>After setting up an initial retrieval and reranking pipeline, data augmentation techniques were explored.
For Task 4b, the goal was to find the journal article that best matched a user tweet from Twitter.
Language used in social media discourse tends to be very different from that used in academic writing:
social media discourse often takes an informal tone, whereas academic writing uses formal language.</p>
        <p>With this in mind, we explored various means of augmenting the tweet_text in the query dataset:
rewriting tweets using formal (i.e. scientific) language, concatenating the original tweet text with the
rewritten tweets, and replacing the original tweet with science-related keywords used in the tweet.</p>
        <p>Two experiments replaced the original tweet text with rewritten text. The first experiment, "Replace
w/ Formal Rewritten", used the prompt "Rewrite the input using formal language" to generate the
rewritten tweets. Upon noticing that some of the tweets in the query set were not written in English, a
second experiment was performed. "Replace w/ English Formal Rewritten" used the prompt "Rewrite
the input using formal English language."</p>
        <p>Three data augmentation experiments involved combining the original tweet text with the rewritten
text. "Concat w/ Formal" combined the original tweet with tweets rewritten using the "formal
language" prompt. "Concat w/ English formal" combined the original tweet with tweets rewritten using
the "formal English language" prompt. "Concat w/ All" combined the original tweet, the tweet rewritten
using "formal language", and the tweet rewritten using "formal English language."</p>
        <p>The experiment "Replace w/ Keywords" used the prompt "Return a list of only science-related
keywords in the tweet." The motivation was to see whether or not our pipeline would improve by using
only science-related keywords to retrieve relevant documents.</p>
        <p>Table 1 provides an example of the augmented data for each experiment.</p>
        <p>In all data augmentation experiments, BM25 was used for retrieval and
cross-encoder/ms-marco-MiniLM-L6-v2 was used for reranking. gpt-4o was used in all experiments to rewrite tweet data.</p>
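Once the rewrites exist, the concatenation strategies reduce to simple string operations. In this sketch the rewrites are hard-coded stand-ins for the gpt-4o outputs, which we cannot reproduce deterministically:

```python
# Stand-ins for gpt-4o rewrites of one tweet (invented for illustration).
original = "masks totally work btw, huge study just dropped"
formal = "A recent large-scale study demonstrates the efficacy of masks."
english_formal = "A recent large-scale study demonstrates that masks are effective."

# One augmented query per experiment, keyed by the experiment names used in the paper.
augmented = {
    "Replace w/ Formal Rewritten": formal,
    "Replace w/ English Formal Rewritten": english_formal,
    "Concat w/ Formal": original + " " + formal,
    "Concat w/ English formal": original + " " + english_formal,
    "Concat w/ All": " ".join([original, formal, english_formal]),
}

for name, query in augmented.items():
    print(f"{name}: {query}")
```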
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Experimenting with Reranking Models</title>
        <p>After our data augmentation experiments, we changed the retrieval stage to use BM25-PyTorch instead
of BM25. This was done to make use of the GPU and run the pipeline faster. With BM25-PyTorch as our
retrieval model, we explored 7 different reranking models to see how they performed.</p>
        <p>Three of these reranking models were trained on the MS Marco dataset:
cross-encoder/ms-marco-MiniLM-L-6-v2, tomaarsen/reranker-msmarco-ModernBERT-base-lambdaloss, and
tomaarsen/reranker-msmarco-MiniLM-L12-H384-uncased-lambdaloss [11, 12, 13, 14]. MS Marco is a widely used question
answering dataset featuring 100,000 Bing questions with human-generated answers.</p>
        <p>In an effort to explore models trained on different datasets, we tested our pipeline on 3
models trained on the GooAQ dataset, which contains 5 million Google queries and 3 million answers:
akr2002/reranker-ModernBERT-base-gooaq-bce, tomaarsen/reranker-ModernBERT-large-gooaq-bce,
and tomaarsen/reranker-NeoBERT-gooaq-bce [15, 16, 17, 18].</p>
        <p>Our last experiment for exploring retrieval and reranking pipelines used Google’s T5 (Text-to-Text
Transfer Transformer) model, which is trained on the C4 dataset, a large corpus Google developed by scraping the
web [19, 20]. We used AnswerDotAI’s rerankers library to implement this model
[21, 22].</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Using Bi-encoders for Retrieval</title>
        <p>In this section, we focus on the bi-encoder and cross-encoder approaches, using
msmarco-distilbert-base-v4 and cross-encoder/ms-marco-MiniLM-L-6-v2, respectively [23, 12]. Both are provided
by the Sentence-Transformers framework [24]. We aim to leverage the speed of bi-encoders to narrow
down a smaller pool of document candidates and then use the precision of cross-encoders to enhance
document retrieval accuracy (see Figure 3).</p>
        <p>In the first stage, the pipeline loads a dataset of query tweets and a corpus of scientific papers
comprising titles and abstracts. These inputs are processed by the bi-encoder model
(msmarco-distilbert-base-v4) to generate dense vector embeddings. Then, we perform a semantic search to retrieve the top
100 most similar documents for each tweet based on cosine similarity.</p>
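At its core, this first-stage semantic search is a cosine-similarity top-k lookup over the embedding matrix. A sketch with toy 3-dimensional vectors standing in for the bi-encoder embeddings:

```python
import numpy as np

def top_k_cosine(query_emb, doc_embs, k=100):
    """Return indices of the k documents most similar to the query by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(-sims)[:k], sims

# Toy 3-d vectors standing in for msmarco-distilbert-base-v4 embeddings.
tweet_emb = np.array([1.0, 0.2, 0.0])
paper_embs = np.array([
    [0.9, 0.1, 0.0],   # points in nearly the same direction -> should rank first
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.5],
])
idx, sims = top_k_cosine(tweet_emb, paper_embs, k=2)
print(idx)  # paper 0 is the top candidate
```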
        <p>Since the cross-encoder is computationally expensive, this initial retrieval step narrows the pool of
candidate documents to save resources. In the second stage, these top 100 documents are
reranked using the cross-encoder (ms-marco-MiniLM-L-6-v2) to improve retrieval accuracy.</p>
        <p>We evaluate both stages using MRR@k to allow a direct comparison between the initial retrieval and
reranking performance.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Bi-encoder Finetuning</title>
        <p>Building on the strong performance of the bi-encoder and cross-encoder pipeline, we finetuned the
bi-encoder to further improve retrieval quality.</p>
        <p>To expand the training data, we sample sentence pairs to generate a silver dataset using contextual
text augmentation, following the strategy proposed in Augmented SBERT [25]. We load each tweet paired
with its corresponding scientific document, which includes the document’s title and abstract.</p>
        <p>Next, we process the input data using a BERT-based augmenter (ContextualWordEmbsAug with
bert-base-uncased) to insert contextually appropriate words. These augmented tweet–document pairs
are saved as silver examples to diversify data and improve the generalization of the model.</p>
        <p>After augmenting the data, we finetune our bi-encoder model (msmarco-distilbert-base-v4) in two
phases. In the first phase, we train the model exclusively on the original training dataset. In the second
phase, we continue training on the silver dataset generated through augmentation.</p>
        <p>Both phases are trained and evaluated using the same procedure. We employ the
SentenceTransformerTrainer interface with a contrastive cosine similarity loss, and evaluate model performance using
Mean Reciprocal Rank at 100 (MRR@100) on the dev set. The model with the highest MRR@100 is
saved at each phase.</p>
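The cosine similarity loss penalizes the squared gap between a pair's cosine similarity and its label (1 for a matching tweet–paper pair, 0 for a non-match). A numpy sketch of the loss for one batch, illustrating the objective rather than the exact trainer internals:

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """Mean squared error between pairwise cosine similarity and the target label."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)       # cosine similarity per pair
    return float(np.mean((cos - labels) ** 2))

# A matching pair (label 1) with identical embeddings incurs zero loss;
# a non-matching pair (label 0) with identical embeddings incurs loss 1.
emb_tweets = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_papers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cosine_similarity_loss(emb_tweets, emb_papers, np.array([1.0, 0.0])))  # (0 + 1) / 2 = 0.5
```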
        <p>Together, this two-phase training strategy enables the model to first learn from gold-labeled data and
then improve generalization through silver training examples generated via contextual augmentation.
This progressive finetuning pipeline is designed to enhance retrieval performance while maintaining
robustness on unseen tweet–document pairs.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>For the final submission for Task 4b, our team used BM25-PyTorch for retrieval and T5 for reranking.
No data augmentation was applied. This pipeline achieved an MRR@5 of 0.58, which improved over the
baseline of 0.43 by 0.15. Our team ranked 16th out of 20 teams. The leaders, SourceSniffers, achieved an
MRR@5 of 0.68.</p>
      <sec id="sec-5-1">
        <title>5.1. Data Augmentation</title>
        <p>As shown in Table 2, data augmentation experiments that replaced the text of the original tweet
performed worse than the baseline on reranking tasks, with the MRR@5 decreasing by 0.06 or more.
Experiments that combined the original tweet with additional data showed a slight improvement over
the baseline, with an increase in MRR@5 of roughly 0.03 after retrieval. After reranking, the experiments
"Concat w/ Formal" and "Concat w/ All" improved over the baseline by 0.01.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Reranking Models</title>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Bi-encoder Finetuning</title>
        <p>Results for bi-encoder finetuning can be found in Table 4. Without finetuning, the MRR@5 using a
bi-encoder for retrieval was 0.428. Unfortunately, all 3 finetuning experiments performed worse than
the bi-encoder without any finetuning. The gold dataset with hard negatives from silver data achieved
an MRR@5 of 0.407, which is 0.021 less than the regular bi-encoder.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Data Augmentation</title>
        <p>For data augmentation, all 3 experiments that replaced the original tweet text with rewritten text caused
the MRR@5 to decrease. This suggests that the original tweets contained context that was lost by
rewriting the tweet. This appears to be especially true for the "Replace w/ Keywords" experiment that
removed all words from the tweet that were not related to science in some way.</p>
        <p>The increase in the MRR@5 after reranking and retrieval for experiments where the original tweet
was combined with the rewritten tweet suggests that multiple representations of the same tweet may
improve performance.</p>
        <p>Finally, it is worth noting that our data augmentation experiments were only applied to the query
dataset. The data for the CORD-19 paper collection was not manipulated in any way. For some experiments,
this probably affected results. For example, "Replace w/ Keywords" likely would have performed
better if the prompt "Return a list of only science-related keywords in the tweet" had been applied to both
the query set and the CORD-19 paper collection.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Reranking Models</title>
        <p>As noted in the results section, the 3 models trained on the MS Marco dataset performed significantly
better than those trained on the GooAQ dataset. The MS Marco dataset was generated from Bing
queries with human-generated answers. The GooAQ dataset was curated using queries from Google
with answers drawn from Google’s responses to questions. Both datasets are generated from search
queries, but the way the answers were generated differs greatly. One possible explanation is that
the human-generated answers in the MS Marco dataset more closely reflect the language used in the
abstracts of the CORD-19 paper collection. Another explanation might stem from differences in how
users ask questions on Bing and Google. Perhaps users on Bing tend to be more verbose or search with
different criteria than Google users.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Bi-encoder Finetuning</title>
        <p>Finetuning the bi-encoder decreased retrieval performance in all experiments. However, the "Gold
dataset + hard negatives from silver data" experiment came very close to the baseline, suggesting that there is
potential for improvement with further exploration.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Future Work</title>
      <p>Task 4b leaders SourceSniffers achieved an MRR@5 of 0.68. The problem of finding relevant scientific
claims for social media discourse is still unsolved. Future work should prioritize improving upon the
approaches CheckThat! Task 4b participants explored this year.</p>
      <sec id="sec-7-1">
        <title>7.1. Data Augmentation</title>
        <p>As noted earlier, data augmentation techniques were only applied to query tweets. To see if data
augmentation yields helpful results, these techniques should be applied to both the query dataset and
the CORD-19 paper collection.</p>
        <p>For the CORD-19 paper collection, it would be interesting to see if summarizing abstracts affects
model performance. The abstracts in the CORD-19 paper collection range anywhere from 2 words to
1,800 words. If social media users are more likely to reference main ideas from the papers in their
tweets, summarizing abstracts might lead to improved retrieval results.</p>
        <p>In our experiments, only the title and abstract from the CORD-19 paper collection were used
when looking for relevant articles. It would be interesting to see if including additional features, such as the
journal an article appeared in or the names of its authors, would improve performance.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Bi-encoder Finetuning</title>
        <p>The bi-encoder performed worse after finetuning, suggesting that the silver dataset may introduce noise
that hinders the model’s ability to retrieve relevant documents. Future work should focus on filtering
the silver data before finetuning. We could explore using a pre-trained cross-encoder to score the augmented
tweet–document pairs and retain only high-confidence examples. These results highlight that more
data does not necessarily lead to better performance, especially when additional examples do not meaningfully
contribute to the learning signal.</p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Other Avenues to Explore</title>
        <p>If retrieval results are poor, reranking can only do so much. Future work should prioritize techniques
that improve retrieval. One potential avenue is to combine multiple approaches: augmenting the
data, finetuning a bi-encoder for retrieval, and finding the best model for reranking.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>This paper demonstrates the work performed by DS@GT for CheckThat! 2025 Task 4b, Scientific
Claim Source Retrieval. We explored data augmentation techniques, retrieval and reranking pipelines,
and finetuning bi-encoders. We found that combining original tweets with rewritten tweets that
reflect the language of the document collection may lead to an improvement in retrieval. Using
BM25-PyTorch for retrieval and T5 for reranking, we achieved an MRR@5 of 0.58 on the evaluation
set, an improvement of 0.15 over the baseline of 0.43. Future work should prioritize improving
the retrieval stage of the retrieval and reranking pipeline. Our code is available on GitHub at
https://github.com/dsgt-arc/checkthat-2025-swd/tree/main/subtask-4b.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>Thank you to everyone on the DS@GT CLEF team for their support. Special thanks to Anthony
Miyaguchi and Murilo Gustenelli for their many hours of work organizing and supporting the DS@GT
CLEF team. This paper would not have happened without you!</p>
      <p>Thank you to the Partnership for an Advanced Computing Environment (PACE) [26] at the Georgia
Institute of Technology, Atlanta, Georgia, USA for allowing us to use their resources to perform this
research.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>Generative AI was used in the preparation of this work, specifically to format bibliography references
and for formatting tables and figures. After using these tools, the authors reviewed and edited the
content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-11">
      <title>References</title>
      <p>[3] S. Hafid, Y. S. Kartal, S. Schellhammer, K. Boland, D. Dimitrov, S. Bringay, K. Todorov, S. Dietze, Overview of the CLEF-2025 CheckThat! lab task 4 on scientific web discourse, in: [4], 2025.</p>
      <p>[4] G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CLEF 2025, Madrid, Spain, 2025.</p>
      <p>[5] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, H. Mubarak, A. Nikolov, Y. S. Kartal, Overview of the CLEF-2022 CheckThat! lab task 1 on identifying relevant claims in tweets, in: [6], 2022.</p>
      <p>[6] G. Faggioli, N. Ferro, A. Hanbury, M. Potthast (Eds.), Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF 2022, Bologna, Italy, 2022.</p>
      <p>[7] G. Faggioli, N. Ferro, P. Galuščáková, A. G. Seco de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CLEF 2024, Grenoble, France, 2024.</p>
      <p>[8] D. Wright, I. Augenstein, CiteWorth: Cite-worthiness detection for improved scientific document understanding, 2021.</p>
      <p>[9] S. Hafid, S. Schellhammer, S. Bringay, K. Todorov, S. Dietze, SciTweets - a dataset and annotation framework for detecting scientific online discourse, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, CIKM ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 3988–3992. URL: https://doi.org/10.1145/3511808.3557693. doi:10.1145/3511808.3557693.</p>
      <p>[10] Wikipedia contributors, Okapi BM25 - Wikipedia, the free encyclopedia, 2025. URL: https://en.wikipedia.org/wiki/Okapi_BM25, accessed: 2025-07-07.</p>
      <p>[11] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, MS MARCO: A human generated machine reading comprehension dataset, CoRR abs/1611.09268 (2016). URL: http://arxiv.org/abs/1611.09268.</p>
      <p>[12] The Hugging Face Community, cross-encoder/ms-marco-MiniLM-L6-v2, Hugging Face model hub, 2025. URL: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2. Model card; accessed: 2025-07-07.</p>
      <p>[13] T. Arsen, the Hugging Face Community, reranker-msmarco-ModernBERT-base-lambdaloss, Hugging Face model hub, 2025. URL: https://huggingface.co/tomaarsen/reranker-msmarco-ModernBERT-base-lambdaloss.</p>
      <p>[14] T. Arsen, the Hugging Face Community, reranker-msmarco-MiniLM-L12-H384-uncased-lambdaloss, Hugging Face model hub, 2025. URL: https://huggingface.co/tomaarsen/reranker-msmarco-MiniLM-L12-H384-uncased-lambdaloss.</p>
      <p>[15] D. Khashabi, A. Ng, T. Khot, A. Sabharwal, H. Hajishirzi, C. Callison-Burch, GooAQ: Open question answering with diverse answer types, CoRR abs/2104.08727 (2021). URL: https://arxiv.org/abs/2104.08727. arXiv:2104.08727.</p>
      <p>[16] akr2002, the Hugging Face Community, reranker-ModernBERT-base-gooaq-bce, Hugging Face model hub, 2025. URL: https://huggingface.co/akr2002/reranker-ModernBERT-base-gooaq-bce.</p>
      <p>[17] T. Arsen, the Hugging Face Community, reranker-ModernBERT-large-gooaq-bce, Hugging Face model hub, 2025. URL: https://huggingface.co/tomaarsen/reranker-ModernBERT-large-gooaq-bce.</p>
      <p>[18] T. Arsen, the Hugging Face Community, reranker-NeoBERT-gooaq-bce, Hugging Face model hub, 2025. URL: https://huggingface.co/tomaarsen/reranker-NeoBERT-gooaq-bce.</p>
      <p>[19] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.</p>
      <p>[20] AIAAIC Repository, C4 (Colossal Clean Crawled Corpus) dataset, AI, Algorithmic &amp; Automation Incidents and Issues (AIAAIC), 2023. URL: https://www.aiaaic.org/aiaaic-repository/ai-algorithmic-and-automation-incidents/c4-dataset, last updated Oct 2024.</p>
      <p>[21] B. Clavié, the AnswerDotAI team, rerankers: A lightweight Python library to unify ranking methods, GitHub repository, 2024. URL: https://github.com/AnswerDotAI/rerankers, version 0.6.0.</p>
      <p>[22] B. Clavié, rerankers: A lightweight Python library to unify ranking methods, arXiv preprint abs/2408.17344 (2024). URL: https://arxiv.org/abs/2408.17344. arXiv:2408.17344.</p>
      <p>[23] Sentence-Transformers Team, msmarco-distilbert-base-v4, Hugging Face model hub, 2025. URL: https://huggingface.co/sentence-transformers/msmarco-distilbert-base-v4.</p>
      <p>[24] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: http://arxiv.org/abs/1908.10084.</p>
      <p>[25] N. Thakur, N. Reimers, J. Daxenberger, I. Gurevych, Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks, arXiv preprint arXiv:2010.08240 (2020). URL: https://arxiv.org/abs/2010.08240.</p>
      <p>[26] PACE, Partnership for an Advanced Computing Environment (PACE), 2017. URL: http://www.pace.gatech.edu.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kbaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McJury</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Kenny</surname>
          </string-name>
          ,
          <article-title>Prevalence of health misinformation on social media-challenges and mitigation before, during, and beyond the covid-19 pandemic: Scoping literature review</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>26</volume>
          (
          <year>2024</year>
          )
          <article-title>e38786</article-title>
          . URL: https://www.jmir.org/
          <year>2024</year>
          /1/e38786. doi:
          <volume>10</volume>
          .2196/38786.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ammar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <article-title>Cite-worthiness detection on social media: A preliminary study</article-title>
          ,
          <source>Natural Scientific Language Processing and Research Knowledge Graphs 19-30</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>