<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Fasulo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Babboni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Tedeschini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering (DISI) - University of Bologna</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles, classifying sentences as subjective/objective in monolingual, multilingual, and zero-shot settings. Training/development datasets were provided for Arabic, German, English, Italian, and Bulgarian; final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization. Our primary strategy enhanced transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations, aiming to improve upon standard fine-tuning. We explored this sentiment-augmented architecture with mDeBERTaV3-base, ModernBERT-base (English), and Llama3.2-1B. To address class imbalance, prevalent across languages, we employed decision threshold calibration optimized on the development set. Our experiments show sentiment feature integration significantly boosts performance, especially the subjective F1 score. This framework led to high rankings, notably 1st for Greek (Macro F1 = 0.51).</p>
      </abstract>
      <kwd-group>
        <kwd>subjectivity detection</kwd>
        <kwd>transformers</kwd>
        <kwd>multilinguality</kwd>
        <kwd>sentiment-based features</kwd>
        <kwd>threshold calibration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>• The application of decision threshold calibration to mitigate class imbalance inherent in the
provided datasets, further refining performance.</p>
      <p>We evaluate our system across monolingual, multilingual, and zero-shot settings, focusing on improving
the F1 score for the subjective class. Our work aims to provide insights into effective strategies for
multilingual subjectivity detection, highlighting the benefits of integrating sentiment features and careful
handling of imbalanced data within a transformer-based framework.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Subjectivity detection, often used as a preprocessing step to sentiment analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], aims to filter out
objective content and retain subjective sentences, which are then analyzed for polarity. While the two
tasks are closely intertwined and can function complementarily [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], this pipeline-based approach
has been common in early works. Subjectivity detection initially relied on lexical resources (e.g.,
SentiWordNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) and rule-based systems. While interpretable, these methods lacked adaptability
to diverse linguistic expressions and contexts. This limitation was partially addressed by machine
learning techniques leveraging engineered features (e.g., n-grams, POS tags), which, however, still faced
generalization issues.
      </p>
      <p>
        The advent of deep learning, particularly transformer-based models like BERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], has significantly
advanced NLP tasks, including both subjectivity detection and sentiment classification. These models
learn rich contextual representations from large unlabeled corpora, enabling superior performance
when fine-tuned. Our work aligns with this literature by combining both perspectives: since our goal
is to identify subjective sentences, we leverage sentiment analysis signals to reinforce subjectivity
predictions—an approach supported by prior findings that highlight the strong interdependence between
subjectivity and sentiment [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Additionally, previous CLEF CheckThat! Labs have also demonstrated
the effectiveness of transformer architectures for related subtasks, such as identifying subjective claims
in news articles [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Multilingual subjectivity detection introduces further complexities. While models like mBERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or
XLM-R [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] provide strong cross-lingual transfer baselines, their performance varies across language
pairs and task specificities. mDeBERTaV3, with its disentangled attention mechanism [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], has
shown strong performance on NLU benchmarks, making it suitable here. More recent models like
ModernBERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] aim for comparable performance with improved eficiency, often focusing on English.
         ] aim for comparable performance with improved efficiency, often focusing on English.
Augmenting text representations with auxiliary information, like sentiment or emotion, for improved
classification is an active research area. Similar to the use of emotions in sexism detection [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we
hypothesize that explicit sentiment signals can help disambiguate subjective statements. Addressing
class imbalance is another crucial aspect, especially as one class is often more prevalent in real-world
datasets. Techniques range from data-level resampling to algorithmic approaches like cost-sensitive
learning or threshold adjustment [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Our decision threshold calibration aligns with findings that
post-hoc adjustments can effectively improve performance on imbalanced datasets without altering the
training process.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        The data for this task is provided by the CLEF 2025 CheckThat! Lab Task 1 organizers. The dataset
consists of sentences extracted from news articles across five languages: Arabic (AR), Bulgarian (BG),
English (EN), German (DE), and Italian (IT). Each sentence is labeled as either subjective (SUBJ) or
objective (OBJ). The annotation guidelines, as described in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], define subjective sentences as "those
expressing personal opinions, sarcasm, exhortations, discriminatory language, or rhetorical figures
conveying an opinion. Objective sentences include factual statements, reported third-party opinions,
open-ended comments, and factual conclusions". For each language, the data is split into training,
development (dev), and development-test (dev-test) sets. An analysis of the label distribution (Table 1
in Section 5) reveals a notable class imbalance across all languages, with the objective class being more
frequent. Italian and Arabic exhibit the most pronounced imbalance. This characteristic significantly
influences model training and evaluation, necessitating strategies to mitigate its impact.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our methodology fine-tunes pre-trained transformer models for binary subjectivity classification. A core
architectural element is fusing sentiment features with sentence representations before the classification
layer. We explore this sentiment-enhanced fine-tuning with several transformer architectures (detailed
in Section 4.1). To address class imbalance, we implement decision threshold calibration (Section 4.4).
An alternative, Focal Loss, is discussed in Appendix 8. The general pipeline is illustrated in Figure 1. All
fine-tuning used a Kaggle environment with a single NVIDIA Tesla P100 GPU (16GB VRAM).</p>
      <sec id="sec-4-1">
        <title>4.1. Model Architectures</title>
        <p>[Figure 1: System pipeline. An input sentence is encoded either by a bidirectional transformer
(mDeBERTa-v3 / ModernBERT), optionally with sentiment scores from twitter-xlm-roberta concatenated
to its representation, or by an LLM with a classification head. The classifier's softmax output is passed
through a decision threshold, optimized on the dev set and applied on dev-test, to produce the final
predictions.]</p>
        <sec id="sec-4-1-1">
          <title>Model Types</title>
          <p>We experiment with three main types of transformer-based models:</p>
          <p>• mDeBERTaV3-base: A powerful multilingual model chosen for its strong cross-lingual
generalization capabilities, essential for handling the diverse languages in the task.
• ModernBERT-base: A more recent English-centric model designed for efficiency and performance.
We evaluate this primarily for the English monolingual task.
• Llama3.2-1B: A smaller-scale Large Language Model. We adapt this by adding a classification
head and fine-tuning it, primarily for English, to compare its capabilities against BERT-like
architectures on this specific task. Due to resource constraints on the environment, this model
was fine-tuned using 8-bit quantization with LoRA so as to fit on a single P100 GPU.</p>
          <p>For all models, a standard classification head (a simple feed-forward neural network) is added on top of
the [CLS] token representation (or the equivalent final hidden state for Llama).</p>
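          <p>For the Llama3.2-1B setup described above, the quantized LoRA fine-tuning can be sketched roughly as follows with the Hugging Face transformers/peft integrations. The rank, target modules, and dropout shown here are illustrative assumptions, not our exact configuration.</p>

```python
# Illustrative configuration sketch only: ranks, target modules, and dropout
# are assumptions, not the exact settings used for the Llama3.2-1B runs.
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # 8-bit weights to fit a 16GB P100
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B", num_labels=2, quantization_config=bnb_config
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
```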
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Sentiment Augmentation</title>
        <p>To provide the models with explicit signals about the affective content of a sentence, which we
hypothesize correlates with subjectivity, we incorporate sentiment scores as additional features.</p>
        <p>• Sentiment Prediction: For each input sentence, we first predict its sentiment using an external
pre-trained multilingual sentiment analysis model, twitter-xlm-roberta-base-sentiment [18]. This
model outputs a three-dimensional vector representing probabilities for positive, neutral, and
negative sentiment. It was selected primarily for its robust multilingual capabilities and its
widespread adoption in sentiment analysis tasks, despite its training domain (Twitter data) being
different from our context of news articles.
• Feature Concatenation: These three sentiment scores are then concatenated with the [CLS]
token embedding (the output of the base transformer model) before being passed to the final
classification layer. This effectively expands the input dimensionality of the classifier to include
both the learned textual representation and the explicit sentiment signal. This approach was
primarily applied with the mDeBERTaV3-base model.</p>
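        <p>The concatenation step above can be sketched as follows (a minimal PyTorch sketch; the hidden size, tensor shapes, and class name are illustrative assumptions, not our exact implementation):</p>

```python
import torch
import torch.nn as nn

class SentimentAugmentedClassifier(nn.Module):
    """Late-fusion head: concatenates the [CLS] embedding with 3 sentiment scores."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        # Input = transformer [CLS] embedding + (positive, neutral, negative) scores.
        self.classifier = nn.Linear(hidden_size + 3, num_labels)

    def forward(self, cls_embedding: torch.Tensor, sentiment: torch.Tensor) -> torch.Tensor:
        # Fuse along the feature dimension before the final classification layer.
        fused = torch.cat([cls_embedding, sentiment], dim=-1)
        return self.classifier(fused)

# Toy batch of 4 sentences: 768-d [CLS] embeddings plus sentiment triples.
head = SentimentAugmentedClassifier()
cls_emb = torch.randn(4, 768)
sent_scores = torch.tensor([[0.1, 0.0, 0.9]] * 4)  # e.g. strongly negative sentiment
logits = head(cls_emb, sent_scores)
```

In practice the sentiment triple would come from the external twitter-xlm-roberta-base-sentiment model rather than being hard-coded.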
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data Preprocessing and Tokenization</title>
        <p>Sentences are tokenized using the specific tokenizer associated with each pre-trained model (mDeBERTa,
ModernBERT, Llama). We apply padding and truncation to a maximum sequence length of 256 tokens,
which covers the majority of sentence lengths in the datasets (more than 75% of sentences).
Recognizing potential performance disparities across languages when using multilingual models, and
with a view to addressing specific complexities that might arise with languages like Arabic (which, as
we will discuss, presented challenges), we explored an additional strategy for the Arabic experiments.
This involved translating the Arabic data into English using the Helsinki-NLP/opus-mt-ar-en model
[19, 20] prior to fine-tuning. The aim was to assess whether this could mitigate some of the language-specific
difficulties; however, this avenue gave slightly worse results and was not adopted in our final
configuration. We attribute this outcome to several potential factors: (1) inaccuracies and loss of
fidelity introduced by the machine translation process; (2) the inherent difficulty in preserving subtle,
culturally-specific linguistic nuances crucial for subjectivity detection when translating from Arabic
to English; and (3) a resultant mismatch in sentiment representation, as the sentiment features for
this experimental branch would have been derived from the translated English text, potentially not
reflecting the original Arabic sentiment accurately.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Training and Decision Threshold Calibration</title>
        <p>Models are fine-tuned using the AdamW optimizer with a linear learning rate scheduler and warmup,
employing Cross-Entropy Loss with class weights to initially mitigate class imbalance. Batch size was
16, learning rate 1 × 10<sup>−5</sup>, for 6 epochs. The best checkpoint is selected based on development set
performance.</p>
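        <p>The class-weighted cross-entropy can be set up as in the following sketch (the label counts are illustrative, not the actual dataset statistics):</p>

```python
import torch
import torch.nn as nn

# Hypothetical label counts for [OBJ, SUBJ]; the minority class gets a larger weight.
counts = torch.tensor([700.0, 300.0])
weights = counts.sum() / (2.0 * counts)  # inverse-frequency class weights

# Cross-entropy with per-class weights, as used during fine-tuning.
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, -1.0], [0.5, 0.3]])  # toy model outputs
labels = torch.tensor([0, 1])                      # gold: OBJ, SUBJ
loss = loss_fn(logits, labels)
```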
        <p>Addressing the challenge of substantial class imbalance, especially concerning the subjective class,
we employed a post-hoc decision threshold optimization strategy. Initially, the model is trained on
the training set using cross-entropy loss. We then select the best-performing checkpoint based on
development set metrics. For this checkpoint, an optimal decision threshold is determined by conducting
a grid search over values ranging from 0.1 to 0.9 (0.01 increment), aiming to maximize the macro
F1 score on the development set. Finally, this optimized threshold is applied to the model’s softmax
outputs for classification on the test set. This procedure allows for fine-tuning the decision boundary
to the dataset’s class distribution while ensuring proper methodological separation between training,
development, and testing phases, thereby guarding against overfitting to the test set.</p>
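        <p>The grid search described above can be sketched as follows (a minimal sketch with toy data; the function name and example probabilities are illustrative):</p>

```python
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(subj_probs: np.ndarray, labels: np.ndarray) -> float:
    """Grid-search the decision threshold over 0.10-0.90 (step 0.01) on the
    dev-set softmax outputs, maximizing macro F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.arange(0.10, 0.91, 0.01):
        preds = (subj_probs >= t).astype(int)  # 1 = SUBJ, 0 = OBJ
        f1 = f1_score(labels, preds, average="macro")
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Toy dev set: probabilities of the SUBJ class and gold labels.
probs = np.array([0.2, 0.35, 0.6, 0.8, 0.45, 0.1])
gold = np.array([0, 1, 1, 1, 0, 0])
threshold = calibrate_threshold(probs, gold)
```

The selected threshold is then frozen and applied unchanged to the dev-test softmax outputs.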
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>
        We conducted experiments for the monolingual, multilingual, and zero-shot subjectivity detection
subtasks defined by CLEF 2025 CheckThat! Lab Task 1 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Evaluation primarily focuses on
macro-average F1 and SUBJ F1 scores, given the latter’s importance amidst class imbalance. All reported
dev-test results utilize the decision threshold calibration from Section 4.4.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Monolingual Task</title>
        <p>In the monolingual setting, models were trained and evaluated on each language independently (Table
2). mDeBERTaV3-base generally performed well, particularly for German and Italian. Adding sentiment
features (mDeBERTa-V3-sentiment) consistently improved SUBJ F1 scores across most languages,
with notable gains for English (0.4046 to 0.5279) and Italian (0.6291 to 0.6804), suggesting sentiment
information provides valuable cues for subjective content. ModernBERT (English only) was competitive,
slightly outperforming baseline mDeBERTaV3-base on English SUBJ F1. Llama3.2-1B, even with LoRA,
did not match BERT-like architectures for English. Pre-translating Arabic data into English (Section 4.3)
did not improve results and was not pursued for final models.</p>
        <p>Impact of Threshold Calibration. Table 3 demonstrates the impact of the decision threshold
calibration. For languages with significant class imbalance like Arabic and Italian, calibration leads to
substantial improvements in both Macro F1 and SUBJ F1 scores. For more balanced languages (e.g.,
Bulgarian, German), the gains are marginal or, in some cases like English for mDeBERTa-V3 baseline,
standard thresholding performed slightly better by one metric, indicating the complexity of interaction
between model, data distribution, and thresholding. Overall, however, calibration proved beneficial,
especially for the target SUBJ class in imbalanced scenarios.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Multilingual and Zero-Shot Tasks</title>
        <p>For the multilingual task, mDeBERTaV3-base was fine-tuned on a combined dataset of all languages. The
model achieved a Macro F1 of 0.6942 and a SUBJ F1 of 0.6114 (Table 4). When Arabic was excluded from
the training and evaluation (given its consistently challenging nature), performance on the remaining
languages improved to a Macro F1 of 0.7817 and SUBJ F1 of 0.6887. Adding sentiment features in
the multilingual setting (mDeBERTa-V3 + Sentiment) showed mixed results when all languages were
included but provided the best performance when Arabic was excluded (Macro F1 0.7962, SUBJ F1
0.7114).</p>
        <p>In the zero-shot setting, where models were trained on a subset of languages and tested on unseen ones,
performance varied depending on the specific language combinations. Generally, models performed
better when the training set included linguistically diverse languages or those with larger datasets.
The challenges observed with Arabic in monolingual and multilingual settings persisted in zero-shot
scenarios, often leading to lower performance when Arabic was a target unseen language. Detailed
zero-shot results (e.g., Table 5) indicate that achieving robust generalization to entirely unseen languages
remains a significant challenge, though sentiment augmentation sometimes provided benefits.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Analysis of Sentiment Augmentation</title>
        <p>The positive impact of sentiment augmentation, especially for English and Italian SUBJ F1 scores,
warrants further investigation. As detailed in our discussion, we observed that sentences correctly
classified as subjective by the sentiment-enhanced model (but misclassified by the baseline) often
exhibited stronger negative sentiment scores (Table 6 and 7). This suggests the model learns to associate
pronounced sentiment (particularly negative, in the context of news critique or opinion) with subjectivity.
The distribution of sentiment scores across the dataset further indicates a tendency for subjective
sentences to carry more polarized sentiment.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Error Analysis and Language-Specific Challenges</title>
        <p>A consistent challenge across all tasks was the performance on Arabic. Monolingual Arabic models
lagged behind others, and including Arabic in multilingual training often diluted overall performance.
This suggests that either the pre-trained multilingual embeddings for Arabic are less aligned with this
specific task, or that the linguistic expression of subjectivity in the Arabic news sentences provided
differs significantly in ways not easily captured by current models without more targeted data or
architectural adaptations. Figure 2 and Figure 3 (violin plots) illustrate differing sentiment profiles for
subjective sentences in English versus Arabic, potentially explaining why sentiment augmentation was
more beneficial for some languages than others. More illustrations can be found in Section 8. For English,
a high negative sentiment often correlated with subjective labels, a pattern the sentiment-augmented
model could leverage. For Arabic, this pattern was less clear or even inverted in the provided dataset,
potentially confusing the sentiment-augmented model. Examples of sentences where sentiment helped:
• "But then Trump came to power and sidelined the defense hawks, ushering in a dramatic shift in
Republican sentiment toward America’s allies and adversaries." (Sentiment: P:0.109, Ntl:0.035,
Neg:0.856) - Strong negative sentiment aided correct SUBJ classification.
• "Boxing Day ambush &amp; flagship attack Putin has long tried to downplay the true losses his army
has faced in the Black Sea." (Sentiment: P:0.056, Ntl:0.014, Neg:0.930) - Similarly, high negative
sentiment helped.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We presented AI Wizards’ system for subjectivity detection in multilingual news articles for the CLEF
2025 CheckThat! Lab Task 1. Our experiments demonstrate that fine-tuned BERT-like architectures,
particularly mDeBERTaV3-base, offer robust performance. A key finding is the significant improvement
in detecting subjective sentences achieved by augmenting input representations with explicit sentiment
scores, especially for languages like English and Italian. Furthermore, decision threshold calibration
proved effective for addressing class imbalance, substantially boosting F1 scores on the minority
subjective class for languages with skewed distributions. While explored, Llama3.2-1B in our setup was
less competitive than specialized BERT-like models for this task. Performance on Arabic remained a
consistent challenge, indicating a need for further research into language-specific modeling or
crosslingual transfer for this language. Our results highlight the value of combining strong base models with
task-relevant feature engineering (sentiment augmentation) and post-processing (threshold calibration)
for nuanced NLP problems in multilingual contexts. The code for our system is open-sourced, and
a multilingual model incorporating sentiment analysis is available for inference via a Hugging Face
dashboard, allowing interactive testing (see Appendix 8 for links). This work contributed to our team
achieving high rankings, notably 1st place for Greek (Macro F1 = 0.51).</p>
      <sec id="sec-6-1">
        <title>6.1. Challenge results</title>
        <p>In the following table (Table 8), we report our position in all the settings of the challenge that were
ranked on a real test set.
Unfortunately, due to an error on our part during the submission process, our multilingual score is very
low. As the challenge had already ended, we were unable to correct it. Afterwards, we checked the
score we would have achieved, obtaining a Macro F1 score of 0.68, which would have placed us
ninth.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations</title>
      <p>Our study has several limitations. Sentiment features were derived from a general-purpose model,
which may not perfectly capture news-specific subjectivity nuances; its effectiveness also varied by
language. The explored Arabic pre-translation introduced potential noise. Computational constraints
limited our LLM exploration (Llama3.2-1B); larger or differently fine-tuned LLMs might yield different
results. While early fusion of sentiment features during pre-training could offer benefits, our late fusion
approach was adopted due to resource constraints. Finally, findings are based on the provided dataset,
and generalization to other news sources or subjectivity domains may vary.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Perspectives for Future Work</title>
      <p>Building upon the findings of this work, several promising directions for future research emerge. Our
approach highlights the value of sentiment augmentation but also reveals areas for refinement and
deeper exploration.</p>
      <p>• Enhanced Sentiment and Emotion Modeling: The sentiment features used in this study
were derived from a general-purpose, Twitter-trained model. Future work could involve
fine-tuning a sentiment or emotion analysis model specifically on news corpora to capture more
domain-relevant nuances. Exploring more granular emotional features beyond
positive/negative/neutral, such as anger, irony, or surprise, could provide even stronger signals for subjectivity.
A multi-task learning framework, where a model is simultaneously trained to predict both
subjectivity and sentiment/emotion, could also foster a more synergistic learning process.
• Leveraging Larger Language Models: Our exploration with Llama3.2-1B was limited by
computational constraints. Future research should investigate the capabilities of larger LLMs (e.g.,
7B+ parameter models) through more advanced parameter-efficient fine-tuning (PEFT) techniques
or full fine-tuning where feasible.
• Deeper Architectural and Fusion Exploration: While our simple concatenation (late
fusion) of sentiment scores proved effective, more sophisticated fusion mechanisms could yield
better performance. Techniques such as attention-based fusion, which would allow the model
to dynamically weigh the importance of semantic content versus sentiment signals, warrant
investigation. Furthermore, developing interpretability methods to analyze how the model utilizes
the concatenated features would provide valuable insights into the decision-making process and
help diagnose failures.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used OpenAI GPT-4 for grammar and spelling
checking, paraphrasing, and rewording. After using these tools/services, the author(s) reviewed and
edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>[18] F. Barbieri, L. Espinosa Anke, J. Camacho-Collados, XLM-T: Multilingual language models in
Twitter for sentiment analysis and beyond, in: Proceedings of the Thirteenth Language Resources
and Evaluation Conference, European Language Resources Association, Marseille, France, 2022,
pp. 258–266. URL: https://aclanthology.org/2022.lrec-1.27.
[19] J. Tiedemann, M. Aulamo, D. Bakshandaeva, M. Boggia, S.-A. Grönroos, T. Nieminen, A. Raganato,
Y. Scherrer, R. Vazquez, S. Virpioja, Democratizing neural machine translation with OPUS-MT,
Language Resources and Evaluation (2023) 713–755. doi:10.1007/s10579-023-09704-w.
[20] J. Tiedemann, S. Thottingal, OPUS-MT – building open translation services for the world, in:
A. Martins, H. Moniz, S. Fumega, B. Martins, F. Batista, L. Coheur, C. Parra, I. Trancoso, M. Turchi,
A. Bisazza, J. Moorkens, A. Guerberof, M. Nurminen, L. Marg, M. L. Forcada (Eds.), Proceedings
of the 22nd Annual Conference of the European Association for Machine Translation, European
Association for Machine Translation, Lisboa, Portugal, 2020, pp. 479–480.
URL: https://aclanthology.org/2020.eamt-1.61/.</p>
      <sec id="sec-9-1">
        <title>Dealing with Class Imbalance</title>
        <p>We also experimented with using Focal Loss to address class imbalance in the subjectivity detection
task. However, it produced results similar to those obtained using class weights with Cross-Entropy
Loss, combined with the post-hoc decision threshold calibration employed in our final submissions.</p>
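        <p>For reference, a minimal sketch of an (unweighted) Focal Loss of the kind we experimented with, which down-weights well-classified examples so training focuses on the hard, typically minority-class ones:</p>

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, labels: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss sketch: scale the per-example cross-entropy by (1 - p_t)^gamma."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, labels, reduction="none")  # per-example cross-entropy
    pt = torch.exp(-ce)                                   # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

logits = torch.tensor([[3.0, -2.0], [0.2, 0.1]])  # toy model outputs
labels = torch.tensor([0, 1])
loss = focal_loss(logits, labels)
```

With gamma = 0 this reduces to plain cross-entropy; larger gamma values suppress the contribution of easy examples more aggressively.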
      </sec>
      <sec id="sec-9-2">
        <title>Online Resources</title>
        <p>The source code for our system and the trained models are available at:
• GitHub: github.com/MatteoFasulo/clef2025-checkthat
• Hugging Face Dashboard (Model Inference): huggingface.co/spaces/MatteoFasulo/SubjectivityDetection</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nawrocka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ivasiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Razvan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mihail</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! lab task 1 on subjectivity in news article</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.), Working Notes of CLEF 2025 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2025</year>
          , Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>The CLEF-2025 CheckThat! lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Nardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <article-title>Subjectivity classification using machine learning techniques for mining feature-opinion pairs from web opinion sources</article-title>
          ,
          <year>2013</year>
          . URL: https://arxiv.org/abs/1312.6962. arXiv:1312.6962.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , W. Chen,
          <article-title>DeBERTa: Decoding-enhanced BERT with disentangled attention</article-title>
          , in:
          <source>International Conference on Learning Representations</source>
          ,
          <year>2021</year>
          . URL: https://openreview.net/forum?id=XPZIaotutsD.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , W. Chen,
          <article-title>DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          ,
          <year>2021</year>
          . arXiv:2111.09543.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chafin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hallström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taghadouini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aarsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , G. Adams,
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Poli</surname>
          </string-name>
          ,
          <article-title>Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2412.13663. arXiv:2412.13663.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>A. G.</surname>
          </string-name>
          et al.,
          <article-title>The Llama 3 herd of models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          , in:
          <source>Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2005</source>
          , Association for Computational Linguistics, Vancouver, Canada,
          <year>2005</year>
          , pp.
          <fpage>347</fpage>
          -
          <lpage>354</lpage>
          . URL: https://www.cs.cornell.edu/people/pabo/papers/acl04_cutsent.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. u. H.</given-names>
            <surname>Jafry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Subjectivity and polarity detection: A survey and comparative analysis</article-title>
          ,
          <source>Future Internet</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>191</fpage>
          . URL: https://www.mdpi.com/1999-5903/14/7/191. doi:10.3390/fi14070191.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Baccianella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esuli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <article-title>SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosner</surname>
          </string-name>
          , D. Tapias (Eds.),
          <source>Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)</source>
          ,
          European Language Resources Association (ELRA), Valletta, Malta,
          <year>2010</year>
          . URL: https://aclanthology.org/L10-1531/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Leistra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <article-title>Thesis Titan at CheckThat! 2023: Language-specific fine-tuning of mDeBERTaV3 for subjectivity detection</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          , G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          , M. Vlachos (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</source>
          ,
          CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>351</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/1911.02116. arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>M. E. Muti</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Enriching hate-tuned transformer-based embeddings with emotions for the categorization of sexism</article-title>
          , in:
          <source>CEUR Workshop Proceedings</source>
          , volume
          <volume>3497</volume>
          , CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>1012</fpage>
          -
          <lpage>1023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdelhamid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <article-title>Balancing the scales: A comprehensive study on tackling class imbalance in binary classification</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2409.19751. arXiv:2409.19751.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Antici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fedotova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <article-title>A corpus for sentence-level subjectivity detection on English news articles</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
          </string-name>
          , M.-Y. Kan,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>