<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ClimateSense at CheckThat! 2025: Combining Fine-tuned Large Language Models and Conventional Machine Learning Models for Subjectivity and Scientific Web Discourse Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Grégoire Burel</string-name>
          <email>gregoire.burel@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lisena</string-name>
          <email>pasquale.lisena@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Daga</string-name>
          <email>enrico.daga@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphael Troncy</string-name>
          <email>raphael.troncy@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harith Alani</string-name>
          <email>harith.alani@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>Large Language Models</kwd>
          <kwd>Scientific Web Discourse</kwd>
          <kwd>Social Media</kwd>
          <kwd>Subjectivity</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>Sophia Antipolis</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledge Media Institute, The Open University</institution>
          ,
          <addr-line>Milton Keynes</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>These working notes present the ClimateSense team's participation in the CheckThat! 2025 Lab Challenge for Tasks 1 and 4a, which investigated: 1) the subjectivity of news article sentences, and 2) the detection of scientific content in social media posts. The ClimateSense team leveraged pre-trained Large Language Models (LLMs), conventional Machine Learning (ML) models, sentence encoders, data augmentation, and filtering techniques to investigate these tasks. In this paper, we detail the approaches for each task, present the methodology, and report on the performance of each submission. Fine-tuning pre-trained models shows particularly strong results for Task 4a, where we achieved first rank on the final evaluation leaderboard. This result shows that LLMs can benefit from lightweight traditional classification models when performing classification tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The first task (Task 1) focuses on classifying news article sentences as subjective or objective in
three evaluation settings:
1. Monolingual: train and test on data in a given language;
2. Multilingual: train and test on data comprising several languages, and;
3. Zero-shot: train on several languages and test on unseen languages.</p>
      <sec id="sec-1-3">
        <p>
          The second task (Task 4a) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] focuses on identifying multiple aspects of social media posts that relate
to scientific discourse. The task consists of creating a multi-label classifier that identifies three scientific
aspects in X posts:
1. Scientific claim detection: identify whether a social media post contains a scientific claim;
2. Scientific reference detection: determine whether a social media post contains references to
scientific studies or publications;
3. Scientific mention detection: determine whether a social media post mentions scientific entities
(e.g., a university, a scientist).
        </p>
        <p>For both tasks, we experiment with fine-tuning Large Language Models (LLMs) and conventional
Machine Learning (ML) classification models (e.g., logistic regression). We achieved good results in
Task 4a, where our approach ranked first on the final evaluation leaderboard. In the next sections, we
discuss our methodology and results in more detail for each task. Our approach can be reproduced
using the code at https://github.com/climatesense-project/climatesense-checkthat2025.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Datasets</title>
      <p>The datasets of Task 1 and Task 4a are significantly different given both the different purpose of each
task and the content sources used for creating each dataset. The Task 1 dataset is collected from news
articles in multiple languages, whereas the Task 4a data is obtained from X posts. Task 1 is also a binary text
classification task, whereas Task 4a is a multi-label classification task.</p>
      <sec id="sec-2-1">
        <title>2.1. Task 1: Sentence Subjectivity</title>
        <p>The dataset used for Task 1 contains sentences extracted from news articles and spans five different
languages: Arabic, Bulgarian, German, English, and Italian. Each language folder contains a train,
development (dev) and test dataset, with an additional unlabelled test set provided for the challenge
submission. The training dataset contains 6,418 sentences, the development dataset has 2,829 sentences,
and the test dataset consists of 2,332 sentences. The language subdivision is detailed in the GitLab
README for Task 1 and is listed in Table 1.</p>
        <p>Each entry in the dataset contains three fields designed to enable the classification of the article
sentences as either subjective (SUBJ) or objective (OBJ): 1) sentence_id provides a unique sentence
identifier in the dataset in the UUID format; 2) sentence contains the sentence to annotate in plain
text, and; 3) label indicates if the sentence is objective (OBJ) or subjective (SUBJ).</p>
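        <p>The three fields above map directly onto a small loader. The following sketch assumes the splits are tab-separated files with a sentence_id, sentence, label header (the actual file layout in the challenge repository may differ):</p>

```python
import csv
import io

def load_task1_split(tsv_text):
    """Parse one Task 1 split into rows with the three fields described
    above: sentence_id (UUID), sentence (plain text), label (OBJ or SUBJ)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    for row in rows:
        if row["label"] not in ("OBJ", "SUBJ"):
            raise ValueError("unexpected label: " + row["label"])
    return rows

sample = (
    "sentence_id\tsentence\tlabel\n"
    "123e4567-e89b-12d3-a456-426614174000\tGold prices rose on Monday.\tOBJ\n"
)
rows = load_task1_split(sample)
print(rows[0]["label"])  # OBJ
```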
        <sec id="sec-2-1-1">
          <title>Task 1: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task1.</title>
          <p>It is important to note that the dataset labels are not balanced and the subjectivity of several
entries is hard to determine, even for a human. Some sentences are very short (see the first two sentences
in Table 2) and their meaning can be ambiguous without access to the sentence context. For
example, in Table 2, the last two sentences have a similar structure and meaning, but while one is
marked as OBJ, the other is annotated as SUBJ.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Task 4a: Scientific Web Discourse Detection</title>
        <p>The dataset used for the scientific web discourse task (Task 4a) consists of posts on social media labelled
in three different categories: 1) the first category identifies whether the text contains a scientific claim
(cat1) or not; 2) the second category denotes if the social media post contains a reference to a scientific
study or publication (cat2), and; 3) the last category identifies the mentions of scientific entities (e.g., a
university, a scientist) (cat3).</p>
        <p>Similarly to Task 1, the Task 4a data is divided into a training and a development dataset. These two
datasets were provided during the first phase of the challenge for the development and evaluation of
the multi-label classifiers, while a third, unlabelled dataset was provided later for the final evaluation
and ranking of the models. The training data contains 1,366 entries, two of which are duplicates. After
removing the duplicates, the training dataset contains 1,364 distinct posts. The development
dataset contains 137 posts and the final unlabelled evaluation dataset contains 240 posts.</p>
        <p>The training and development datasets are both unbalanced. In the training dataset, only 26.2% of
the posts contain scientific claims (cat1), 18.3% of the posts refer to a scientific study or publication
(cat2), and 24.9% of the posts mention scientific entities (cat3). Overall, we also observe that 61.4%
of the data consists of posts that contain no scientific claims, no references to scientific works, and no
scientific entities, while 10.9% of the posts have all three categories. Although this suggests that training
a pre-classifier for such edge cases may be useful, we did not find that it improved the
accuracy of the classification.</p>
        <p>Each entry of the Task 4a datasets contains the following fields: 1) index: the index of the data sample;
2) text: the preprocessed tweet text (user mentions are replaced with @user), and; 3) labels: a
list of three labels (this field is not provided in the final evaluation data), one for each category (e.g.,
[0.0, 0.0, 1.0]). More information about the task is given in the GitLab README for Task 4a.</p>
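        <p>The labels field serialises the three categories as floats, so a small parsing step turns each entry into a binary multi-label target. This is a minimal sketch, assuming the field arrives either as a string or as an already-parsed list:</p>

```python
import ast

def parse_labels(raw):
    """Convert the serialised labels field (e.g. "[0.0, 0.0, 1.0]") into a
    list of 0/1 ints ordered as (cat1, cat2, cat3)."""
    values = ast.literal_eval(raw) if isinstance(raw, str) else raw
    if len(values) != 3:
        raise ValueError("expected exactly three category labels")
    return [int(v) for v in values]

print(parse_labels("[0.0, 0.0, 1.0]"))  # [0, 0, 1]
```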
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Due to the differences in goals and data format between Task 1 and Task 4a, distinct methodologies
are employed for each task. This section details the various approaches used to analyse the data and
create the classification models for each task. The code for both experiments is available on the
ClimateSense code repository.</p>
      <sec id="sec-3-1">
        <title>Task 4a: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task4/subtask_4a. Tasks 1 and 4a code repository: https://github.com/climatesense-project/climatesense-checkthat2025.</title>
        <sec id="sec-3-1-1">
          <title>3.1. Task 1: Sentence Subjectivity</title>
          <p>
            To identify the most promising classification model, we conducted a first round of experiments on the
Italian dataset, training on the Italian development data and testing on the Italian test data. The baseline
proposed by the Task 1 organisers consists of a logistic regression head trained on a Sentence-BERT [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]
multilingual representation of the data. This solution already yielded good results, achieving an accuracy
of 0.69. We then performed iterative experiments, changing:
1. The representation model: we experimented with E5 [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] (chosen for its good performance in
classification tasks), ModernBERT [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] and CovidTwitterBERT [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], as they are trained on news
data;
2. The classifier: we experimented with a Support Vector Classifier (SVC) and a Multi-Layer Perceptron
(MLP).
          </p>
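          <p>The baseline and its variants share the same shape: frozen sentence embeddings plus an interchangeable classification head. The sketch below illustrates that shape with scikit-learn heads; the toy random vectors stand in for the actual Sentence-BERT/E5 embeddings, and the hyperparameters shown are illustrative assumptions, not our tuned settings:</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def train_head(embeddings, labels, head="logreg"):
    """Fit a lightweight classification head on frozen sentence embeddings:
    'logreg' mirrors the organisers' baseline, 'svc' and 'mlp' the variants."""
    heads = {
        "logreg": LogisticRegression(max_iter=1000),
        "svc": SVC(),
        "mlp": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0),
    }
    clf = heads[head]
    clf.fit(embeddings, labels)
    return clf

# Toy vectors standing in for Sentence-BERT / E5 embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 8)), rng.normal(3.0, 1.0, (20, 8))])
y = ["OBJ"] * 20 + ["SUBJ"] * 20
clf = train_head(X, y, head="logreg")
print(clf.score(X, y))
```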
          <p>
            Separately, we performed some experiments with the Large Language Model Zephyr [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. The prompt
consisted of the task instruction (discriminate between subjective and objective sentences), optionally
an example of a SUBJ and an OBJ sentence, and the request to not give any explanation in the output
apart from the label corresponding to the classification (SUBJ or OBJ). Our experiments included both
zero-shot and one-shot prompting, i.e., excluding or including one example in the prompt.
Nevertheless, Zephyr showed lower performance than the baseline (0.54), leading us to set
this approach aside. This may be due to the ambiguity of some dataset entries (see Section 2.1), which can
only be resolved with knowledge acquired during the training phase.
          </p>
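          <p>The zero- and one-shot prompt construction can be sketched as follows. The wording here is an illustrative reconstruction of the instruction described above, not the verbatim prompt used in our runs, and the examples are hypothetical:</p>

```python
def build_prompt(sentence, one_shot=False):
    """Assemble a Zephyr-style classification prompt: task instruction,
    optional SUBJ/OBJ examples (one-shot), and a label-only answer request."""
    lines = [
        "Decide whether the following news sentence is subjective (SUBJ) "
        "or objective (OBJ).",
        "Answer with the label only, and do not give any explanation.",
    ]
    if one_shot:
        lines.append('Example SUBJ: "This so-called reform is a disgrace."')
        lines.append('Example OBJ: "The law was passed in March 2021."')
    lines.append("Sentence: " + sentence)
    return "\n".join(lines)

print("Example" in build_prompt("Gold prices rose.", one_shot=True))  # True
print("Example" in build_prompt("Gold prices rose."))  # False
```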
          <p>A second round of experiments was carried out by training models on the Italian training set (instead
of the development set) and testing on the same Italian test set. The results, reported in Table 3,
demonstrate the superiority of multilingual E5 (24 layers, embedding size = 1024) in combination with
a 100-layer MLP classifier.</p>
          <p>We empirically set the number of training epochs for the classifier to 100. We found that going
beyond this number caused a drop in accuracy, likely due to overfitting. This behaviour was consistent
across the other languages we experimented with, all of which also reached their peak performance
around 100 epochs.</p>
          <p>We submitted models in three configurations:
• Monolingual: each model was trained and tested solely on data from one language;
• Multilingual: the models were trained using data collected from all five languages, and;
• Zero-shot: the same multilingual model is applied to an unknown language with the text translated
into English using Google Translate.
(We used the E5 implementation at https://huggingface.co/intfloat/multilingual-e5-large-instruct.)</p>
          <p>All Task 1 experiments were conducted on a Mac computer equipped with an Apple M1 processor.
The M1 chip integrates an 8-core CPU with 4 performance cores and 4 efficiency cores, along with an
8-core GPU and a 16-core Neural Engine. The system was configured with 16GB of unified memory
and a 512GB SSD for storage.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Task 4a: Scientific Web Discourse Detection</title>
          <p>As mentioned in Section 2.2, the training data contains duplicate entries. Therefore, we removed the
duplicates before building the multi-label classifiers. As the main goal of the task is to obtain the best
macro-average F1 across the labels, we optimised the training based on the provided development
dataset. In particular, the development dataset was used for selecting the best-performing models and
the number of training epochs.</p>
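          <p>Model selection against the development set can be scripted directly with scikit-learn's macro-average F1. A minimal sketch, assuming labels are ordered (cat1, cat2, cat3):</p>

```python
from sklearn.metrics import f1_score

def macro_f1(y_true, y_pred):
    """Macro-average F1 over the three Task 4a categories, i.e. the metric
    used to pick models and epoch counts on the development set."""
    return f1_score(y_true, y_pred, average="macro", zero_division=0)

y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
y_pred = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
# Per-category F1 is 1.0, 0.0, 1.0 here, so the macro average is 2/3.
print(round(macro_f1(y_true, y_pred), 3))  # 0.667
```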
          <p>
            The baseline provided by the challenge organisers was based on a fine-tuned DeBERTaV3 model [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]
and reported a macro-average F1 of 0.8375 (F1 for cat1: 0.8214, cat2: 0.7925 and cat3: 0.8986).
This result suggests that fine-tuned LLMs perform well. However, using LLMs that have been trained
with social media data or scientific content could lead to improved results. We also observe that cat2
appears to be harder to classify accurately. This is confirmed when looking at the final task leaderboard.
Three main approaches were considered when building the multi-label classifiers:
1. LLM fine-tuning: similarly to the baseline, a pre-trained LLM is fine-tuned on the multi-label task;
2. Embedding classifier: we use a conventional ML classifier (e.g., SVM or logistic regression) on top
of a sentence encoder without fine-tuning it;
3. Efficient Few-shot Learning with Sentence Transformers (SetFit): we fine-tune sentence encoders
using the training data before using a conventional classification head.
          </p>
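          <p>The second approach (a conventional multi-label classifier on frozen embeddings) can be sketched with scikit-learn as follows. The random vectors below stand in for all-minilm / bge-m3 embeddings, and the linear SVM is one plausible choice of head, not necessarily the exact configuration we submitted:</p>

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import LinearSVC

def train_embedding_classifier(embeddings, labels):
    """Approach 2 sketch: one conventional classifier per category, trained
    on frozen sentence-encoder embeddings (no encoder fine-tuning)."""
    clf = MultiOutputClassifier(LinearSVC())
    clf.fit(embeddings, labels)
    return clf

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 16))      # stand-ins for sentence-encoder vectors
Y = (X[:, :3] > 0.0).astype(int)   # three synthetic labels for cat1-cat3
clf = train_embedding_classifier(X, Y)
print(clf.predict(X[:2]).shape)  # (2, 3)
```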
          <p>We also investigated whether training separate classifiers yielded better results than using a single
multi-label classifier, and found that creating a single model with diferent classification heads for
each task provided the best results. We also looked at zero-shot classifiers using generative models
(deepseek-r1:7b), but the results were lower than the baseline.</p>
          <p>Since the training data is unbalanced, we considered two main approaches for alleviating the risk of
overfitting on the majority classes. First, we used a weighted loss function to improve the classification.
Second, we created a paraphraser model that generated new minority-class examples by paraphrasing
existing minority examples. This approach only works when creating a separate classifier for each
category.</p>
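          <p>For the first approach, per-category positive-class weights can be derived from the label counts, so rarer labels contribute more to the loss. This is a generic sketch of that weighting scheme (the negative-to-positive ratio), not the exact weights used in our training runs:</p>

```python
def positive_class_weights(labels):
    """Per-category positive-class weights for a weighted (BCE-style) loss:
    the rarer a label, the larger its weight, so minority classes are not
    drowned out by the majority class during training."""
    n = len(labels)
    weights = []
    for k in range(len(labels[0])):
        positives = sum(row[k] for row in labels)
        weights.append((n - positives) / max(positives, 1))
    return weights

# One positive out of four for the last category gives it a 3x weight.
print(positive_class_weights([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]))  # [1.0, 1.0, 3.0]
```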
          <p>The oversampling model used deepseek-r1:7b and Ollama for rewriting the sentences. The
prompt used a set of personas to generate the new examples based on the following template: "Just
answer with the text and nothing else, generate an alternate version of the following tweet as if you were P",
where P is one of the following personas: friendly, formal, humorous, poetic, sarcastic, dramatic, scientific,
mysterious, adventurous, romantic, philosophical, historical, technical, casual, business-like, playful,
empathetic, authoritative, inquisitive, optimistic, pessimistic, cynical, realistic, idealistic, whimsical, nostalgic,
sophisticated, down-to-earth, witty, charming, enigmatic, intellectual and artistic. The oversampling
method was only used with the embedding models, where separate classifiers were trained for each
category.</p>
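          <p>The persona-based oversampling loop can be sketched as below. The prompt wording is taken from the template above; the generate callable is where a deepseek-r1:7b call through Ollama would go, replaced here by an offline stub so the sketch stays self-contained:</p>

```python
PERSONAS = ["friendly", "formal", "humorous", "poetic", "sarcastic"]  # subset of the full list

def paraphrase_prompt(tweet, persona):
    """Build the persona-conditioned rewriting prompt from the template."""
    return ("Just answer with the text and nothing else, generate an alternate "
            "version of the following tweet as if you were " + persona + ": " + tweet)

def oversample(tweet, personas=PERSONAS, generate=None):
    """Generate one paraphrase per persona. In practice `generate` would wrap
    an LLM call (e.g. deepseek-r1:7b via Ollama); the default is an offline
    identity stub used purely for illustration."""
    generate = generate or (lambda prompt: prompt)
    return [generate(paraphrase_prompt(tweet, p)) for p in personas]

variants = oversample("New study links microplastics to inflammation.")
print(len(variants))  # 5
```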
          <p>
            Due to the data and task overlap between Task 4a and the research conducted in SciTweets [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], we
also investigated the integration of the heuristic methods of SciTweets into the classifiers developed as
part of Task 4a. Our implementation builds on the code provided in the SciTweets GitHub repository.
          </p>
          <p>
            All Task 4a experiments were conducted on a GPU server with an Intel Xeon Silver 4114
processor, two NVIDIA Tesla P40 GPUs with 24GB of memory each, and 269GB of RAM. The following sentence
encoders were considered for the embedding and SetFit models: all-minilm and
bge-m3 [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. These models were selected based on the computing resources available for training
the models and on the MTEB leaderboard rankings [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. After some initial experiments, we used the
twitter-roberta-base-2022-154m [14] model since it is trained on data similar to the data found
in the dataset.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Task 4a leaderboard: https://codalab.lisn.upsaclay.fr/competitions/22355. Ollama: https://ollama.com/. SciTweets GitHub heuristics: https://github.com/AI-4-Sci/SciTweets/tree/main/heuristics.</title>
        <p>The main results on the development dataset and the final model results are discussed in Section 4.2.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>In this section, we present the results for Task 1 and Task 4a on the development datasets and the
rankings of the best-performing models on the test datasets.</p>
      <sec id="sec-4-1">
        <title>4.1. Task 1: Sentence Subjectivity</title>
        <p>The results of the challenge indicate varying performance for different configurations on the test data.
The monolingual models show a range of scores, with Mono-German and Mono-English achieving
the highest F1 score of 0.72. The multilingual model achieved a score of 0.65. For the zero-shot
configurations, performance varies significantly, with zero-shot Romanian achieving the highest score
of 0.74, while zero-shot Greek has the lowest at 0.41. We achieved a notable third place for Ukrainian,
with a score of 0.64. These results suggest that the effectiveness of multilingual and zero-shot approaches
is highly dependent on the specific language and context. It is also possible that the performance of
automatic translations influences the results, even though the translations appeared accurate upon
manual review.</p>
        <p>To further investigate the impact of sentence length on model performance, we conducted a dedicated
experiment comparing the classification F1 scores for short sentences (less than 40 characters) and
long sentences (more than 40 characters) across all languages. The results are reported in Table 4. In
most cases, the models performed better on long sentences, suggesting that longer inputs provide richer
linguistic and semantic cues that facilitate the detection of subjectivity. This is likely because subjective
discourse often relies on nuanced expressions, modifiers, or contextual markers that are absent in very
short statements, leading to the observed drop in score.</p>
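        <p>The length-split evaluation can be reproduced with a few lines of scikit-learn: bucket each test sentence at the 40-character cut-off and score the buckets separately. A minimal sketch, with a toy two-sentence example:</p>

```python
from sklearn.metrics import f1_score

def f1_by_length(sentences, y_true, y_pred, threshold=40):
    """Split predictions into short/long buckets at the character cut-off
    and compute a macro F1 for each non-empty bucket."""
    buckets = {"short": ([], []), "long": ([], [])}
    for sent, true, pred in zip(sentences, y_true, y_pred):
        key = "long" if len(sent) >= threshold else "short"
        buckets[key][0].append(true)
        buckets[key][1].append(pred)
    return {key: f1_score(t, p, average="macro", zero_division=0)
            for key, (t, p) in buckets.items() if t}

sents = ["Short one.",
         "A much longer sentence that easily clears the forty-character bar."]
scores = f1_by_length(sents, ["OBJ", "SUBJ"], ["OBJ", "SUBJ"])
print(scores)  # both buckets perfectly predicted here
```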
        <p>This observation holds for all the monolingual configurations. In the multilingual and zero-shot
configurations, this trend is less pronounced and even reversed for Romanian, Greek, and Ukrainian,
where the highest performances are achieved on short sentences. A possible explanation is that the
training process learns to handle longer sentences more effectively, but this ability does not transfer
well in a zero-shot (translated) scenario, where shorter sentences tend to be simpler and, consequently,
easier to classify. (The perfect F1 score observed on short German sentences is not statistically significant
due to the extremely small number of samples, only six, while all other configurations have more than 30.)</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Task 4a: Scientific Web Discourse Detection</title>
        <p>The results for Task 4a on the testing data are presented in Table 5. When running multiple individual
experiments, we focused only on the individual classification results and did not compute the
micro-averaged F1 for these models, as we only used the results to select the best-performing model for the
evaluation on the unlabelled data.</p>
        <p>Although the best model achieved the third position on the development leaderboard (micro-average
F1 = 0.8887), it achieved first position on the final evaluation dataset (micro-average F1 = 0.7998),
suggesting less overfitting compared to the other models. The model may also benefit from using
a weighted loss function.</p>
        <p>While the use of specific classification heads for each category showed only a small improvement over the
regression head used by the SetFit models and the fine-tuned models, the approach may have benefited
the final model rankings given the small difference between the first three results (&lt; 0.01).</p>
        <p>Besides ranking first for the micro-average F1 across the categories, the model showed strong results
for each category, ranking third for the identification of scientific claims (cat1, F1 = 0.7932) and
second for both the identification of scientific references (cat2, F1 = 0.7736) and the identification of
scientific entities (cat3, F1 = 0.8325).</p>
        <p>The embedding models benefited from using the oversampling method described in Section 3.2, but
during tests, the impact on the fine-tuned model was mostly negative. Therefore, it was not integrated
into the fine-tuned models. Similarly, the heuristics helped with the accuracy of specific models (e.g.
all-minilm) but did not provide advantages when used with fine-tuned models.</p>
        <p>SetFit models were slow to train due to the number of pairs generated from the training data. As
a result, the training of these models was limited and the results were not as good as the fine-tuned
models.</p>
        <p>Overall, fine-tuning performed best, but the top results were obtained when specific classification
heads were used for each category. This approach shares some similarities with SetFit, but instead of
fitting a sentence encoder and then training a classifier on the embeddings, a model is fine-tuned, and
then its embeddings are used for training a conventional classifier (or, in our case, multiple classifiers).</p>
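        <p>The shape of that final pipeline can be sketched as follows. Here embed stands for an embedding function over the fine-tuned encoder (e.g. pooled hidden states of twitter-roberta-base-2022-154m); the character-statistics stand-in, the toy data, and the logistic-regression heads are illustrative assumptions, not the exact submitted configuration:</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_category_heads(texts, labels, embed):
    """Fit one conventional classifier per category on embeddings produced by
    an (assumed) fine-tuned encoder exposed through `embed`."""
    X = np.array([embed(t) for t in texts], dtype=float)
    Y = np.array(labels)
    return [LogisticRegression(max_iter=1000).fit(X, Y[:, k])
            for k in range(Y.shape[1])]

def predict(texts, heads, embed):
    """Stack the per-category head predictions into a multi-label output."""
    X = np.array([embed(t) for t in texts], dtype=float)
    return np.column_stack([h.predict(X) for h in heads])

# Toy stand-in embedding (character statistics instead of encoder states).
embed = lambda t: [len(t), t.count(" "), t.count("@")]
heads = train_category_heads(
    ["short post", "a longer scientific claim about climate change", "@user says hi"],
    [[0, 1, 0], [1, 0, 0], [0, 1, 1]],
    embed)
print(predict(["@user says hi"], heads, embed).shape)  # (1, 3)
```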
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we have presented the ClimateSense team's results for the CheckThat! 2025 Lab Challenge
Task 1 and Task 4a. Our results highlight the effectiveness of fine-tuning pre-trained language models,
particularly for Task 4a, where ClimateSense secured first place on the final evaluation leaderboard.
This result suggests that combining fine-tuned LLMs with more traditional classification models can be
beneficial compared to simply relying on fine-tuning LLMs with a classification head.</p>
      <p>
        While our approaches have yielded competitive results, several avenues remain for further exploration
and improvement. For Task 1 (Sentence Subjectivity), the performance limitations observed with Zephyr
deserve further study. Future work should explore the use of larger language models and alternative
prompting strategies, such as chain-of-thought or chain-of-verification techniques, to potentially enhance
zero-shot capabilities. Additionally, in the zero-shot scenario, it may be beneficial to reverse the current
approach by creating language-specific training sets through the translation of English sentences into
target languages such as Ukrainian, Greek, and others, rather than translating those languages into
English. Finally, given that short or isolated sentences tend to be the most misleading and difficult to
classify, future experiments could evaluate the impact of systematically excluding these instances from
the dataset. During the development of the Task 4a models, we identified data from SciTweets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that
could improve the identification of scientific claims (cat1). Future work should investigate its potential
as additional training data for Task 4a. Our experiments with zero-shot models and paraphrasing were
constrained by the available computational resources. This limitation also influenced the types of models
we could fine-tune. As for Task 1, future research should explore more powerful LLMs.
Future experiments could also investigate optimising the fine-tuning process under limited resources (e.g.,
quantisation, PEFT [15]) to improve the performance of the classification models.
      </p>
      <p>Finally, it would be beneficial to evaluate the generalisation of the proposed models across additional
social media platforms and news sources to assess the robustness of the models in different
environments. Investigating explainability methods for both tasks could also enhance the interpretability and
trustworthiness of the proposed approaches, particularly for fact-checking applications.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the European CHIST-ERA program within the ClimateSense project (Grant
ID ANR-24-CHR4-0002, EPSRC EP/Z003504/1).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used MistralAI's Le Chat and Google Gemini for
grammar and spelling checking, paraphrasing, and rewording. After using these tools, the authors reviewed
and edited the content as needed and take full responsibility for the publication's content.</p>
      <p>URL: https://arxiv.org/abs/2502.13595. doi:10.48550/arXiv.2502.13595.
[14] D. Loureiro, K. Rezaee, T. Riahi, F. Barbieri, L. Neves, L. E. Anke, J. Camacho-Collados, Tweet
insights: A visualization platform to extract temporal insights from Twitter, arXiv preprint
arXiv:2308.02142 (2023).
[15] S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, B. Bossan, PEFT: State-of-the-art
parameter-efficient fine-tuning methods, https://github.com/huggingface/peft, 2022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.), Working Notes of CLEF 2025 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2025</year>
          , Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in: J.
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nawrocka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ivasiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Razvan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mihail</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! lab task 1 on subjectivity in news articles</article-title>
          ,
          <source>in: [1]</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Boland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! lab task 4 on scientific web discourse</article-title>
          ,
          <source>in: [1]</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , in:
          <string-name><given-names>K.</given-names> <surname>Inui</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Ng</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wan</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . URL: https://aclanthology.org/D19-1410/. doi:10.18653/v1/D19-1410.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Text embeddings by weakly-supervised contrastive pre-training</article-title>
          , arXiv (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2212.03533. arXiv:2212.03533.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chafin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hallström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taghadouini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aarsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , G. Adams,
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Poli</surname>
          </string-name>
          ,
          <article-title>Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference</article-title>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2412.13663. arXiv:2412.13663.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Salathé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kummervold</surname>
          </string-name>
          ,
          <article-title>COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          <volume>6</volume>
          (
          <year>2023</year>
          ). doi:10.3389/frai.2023.1023281.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tunstall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Beeching</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belkada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>von Werra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fourrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habib</surname>
          </string-name>
          , et al.,
          <article-title>Zephyr: Direct Distillation of LM Alignment</article-title>
          , in:
          <source>First Conference on Language Modeling</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          ,
          <year>2021</year>
          . arXiv:2111.09543.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bringay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <article-title>SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse</article-title>
          , in:
          <source>Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3988</fpage>
          -
          <lpage>3992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.03216.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Enevoldsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kerboua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kardos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Siblini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krzemiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Winata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sturua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Utpala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciancone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schaefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sequeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhakal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rystrøm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Solomatin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ömer</given-names>
            <surname>Çağatan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kundu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sukhlecha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pahwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Poświata</surname>
          </string-name>
          ,
          <string-name><given-names>K. K.</given-names> <surname>GV</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Ashraf</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Auras</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Plüster</surname></string-name>,
          <string-name><given-names>J. P.</given-names> <surname>Harries</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Magne</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Mohr</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hendriksen</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Gisserot-Boukhlef</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Aarsen</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kostkan</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Wojtasik</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Šuppa</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Rocca</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hamdy</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Michail</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Faysse</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Vatolin</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Thakur</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Dey</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Vasani</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Chitale</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Tedeschi</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Tai</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Snegirev</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Günther</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Xia</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Shi</surname></string-name>,
          <string-name><given-names>X. H.</given-names> <surname>Lù</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Clive</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krishnakumar</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Maksimova</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Wehrli</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Tikhonova</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Panchal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Abramov</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Ostendorff</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Clematide</surname></string-name>,
          <string-name><given-names>L. J.</given-names> <surname>Miranda</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Fenogenova</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Song</surname></string-name>,
          <string-name><given-names>R. B.</given-names> <surname>Safi</surname></string-name>,
          <string-name><given-names>W.-D.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Borghini</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Cassano</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Su</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Yen</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Hansen</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hooker</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Xiao</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Adlakha</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Weller</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Reddy</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Muennighoff</surname></string-name>
          ,
          <article-title>MMTEB: Massive multilingual text embedding benchmark</article-title>
          ,
          <source>arXiv preprint arXiv:2502.13595</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>