<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SmolLab_SEU at CheckThat! 2025: How Well Do Multilingual Transformers Transfer Across News Domains for Cross-lingual Subjectivity Detection?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md. Abdur Rahman</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Al Amin</string-name>
          <email>alaminhossine@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Sabbir Dewan</string-name>
          <email>sabbirdewan11@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Jahid Hasan</string-name>
          <email>mdjahid.hasan@eastern.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Ashiqur Rahman</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eastern University</institution>
          ,
          <addr-line>Philadelphia, Pennsylvania</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Murdoch University</institution>
          ,
          <addr-line>Perth</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Southeast University</institution>
          ,
          <country country="BD">Bangladesh</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>St. Francis College</institution>
          ,
          <addr-line>Brooklyn, New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automated detection of subjectivity in news articles is important for combating fake news and promoting journalistic accountability, but it remains a challenging task across linguistic settings. This paper presents our approach to Task 1: Subjectivity of the CLEF 2025 CheckThat! Lab, which requires distinguishing objective information from subjective opinion in news content in several languages. To this end, we experimented with a variety of Transformer models: language-specific architectures (German-ELECTRA-large, MARBERT-v2, RoBERTa-large), multilingual models (XLM-RoBERTa-large, mDeBERTaV3-base, InfoXLM-large), and zero-shot models (mBERT-base). Our approach fine-tuned pre-trained models with tuned hyperparameters and class-weighted loss functions to handle the imbalanced data. Experimental results show that our models perform well: German-ELECTRA-large achieved 0.8520 F1 on German, XLM-RoBERTa-large achieved 0.8356 F1 on Italian and 0.8040 F1 on zero-shot Romanian, RoBERTa-large performed best on English with 0.7948 F1, and InfoXLM-large achieved 0.7114 F1 in the multilingual setting. In the official ranking, our systems obtained 1st place in Monolingual German, 2nd in Zero-shot Romanian, 3rd in Monolingual Italian, 5th in Zero-shot Ukrainian, 6th in Multilingual, 8th in Zero-shot Polish, and 9th in Monolingual Arabic and English. Error analysis shows that monolingual models excelled in their own languages, while multilingual architectures achieved better cross-lingual generalization in the zero-shot settings. This work provides insights into the suitability of Transformers for multilingual subjectivity detection and demonstrates the difficulty of recognizing subtle subjective cues in different linguistic environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Subjectivity Detection</kwd>
        <kwd>Transformer Models</kwd>
        <kwd>Cross-lingual Transfer</kwd>
        <kwd>Zero-shot Learning</kwd>
        <kwd>Multilingual NLP</kwd>
        <kwd>News Articles Analysis</kwd>
        <kwd>Fine-tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid spread of news and opinions in the modern information-saturated environment shapes
individual beliefs and societal debate to an unparalleled extent. The ability to separate objective facts
from subjective claims is more important than ever given the growing frequency of subjective language
in news reporting, particularly in the online media [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and its regular connection with misleading
information and fake news [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As a language tool, subjectivity captures the speaker’s or writer’s
attitude, posture, and feelings toward the topic under discussion, therefore imprinting a personal mark
on the communication [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Natural language processing (NLP) offers a powerful means to automate
this distinction, thus enhancing media literacy and supporting the fight against misinformation.
      </p>
      <p>
        This paper details our participation in Task 1: Subjectivity of the CLEF 2025 CheckThat! Lab [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
The task evaluates systems that determine whether news-article sentences fall into the objective (OBJ) or
subjective (SUBJ) class. Using datasets [7, 8] in Arabic, Bulgarian, English, German, and Italian for
training and development, it spans monolingual, multilingual, and zero-shot evaluation scenarios, with
additional unseen languages (Greek, Polish, Romanian, Ukrainian) introduced in the test phase.
The macro-averaged F1-score is the official evaluation metric. Section 3 gives more details on
the task and the datasets.
      </p>
      <p>Our basic approach involves fine-tuning a wide range of pre-trained Transformer-based language
models [9], specifically chosen and modified for each language or context. We examined strong
multilingual designs as well as language-specific models.</p>
      <p>Our main contributions are the following:
• We evaluated a varied suite of Transformer-based models for subjectivity detection across
monolingual, multilingual, and zero-shot settings.
• We investigated model performance holistically on the subjectivity task, providing insights into
their capabilities, cross-lingual transfer efficacy, and common error patterns across several languages
and settings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>The problem of detecting subjective expressions in text, commonly known as subjectivity detection,
is a well-established research area in Natural Language Processing (NLP) and is closely related to
sentiment analysis. Wiebe et al. [10] were among the early researchers to annotate subjectivity in text
data. They also built OpinionFinder [11], a tool that uses different lexical and
syntactic indicators to identify and analyze opinions. More recently, the task has been addressed successfully
with various deep-learning methods, leading to impressive performances. After preprocessing
and annotating a political and ideological debate dataset using a unique mix of lexicon-based and
syntactic pattern-based approaches, Al Hamoud et al. [12] conducted a sentence subjectivity analysis
on it. GloVe word embeddings enabled them to evaluate six deep learning models (LSTM, BiLSTM, GRU,
BiGRU, LSTM with attention, and BiLSTM with attention). On their two-class subjectivity classification
problem, their best-performing model, LSTM with attention, achieved an accuracy of 97.39%, an F1-score
of 99.20%, and a Kappa coefficient of 94.76%.</p>
      <p>For joint polarity and subjectivity detection, Satapathy et al. [13] presented a
knowledge-sharing-based multitask learning (MTL) paradigm. They linked the two tasks using a neural tensor network
(NTN) and BERT embeddings. Their MTL framework, which combines BERT and NTN
(MTLsharedNTN), achieved an accuracy of 95.1% for subjectivity detection and 94.6% for polarity detection on
movie review datasets.</p>
      <p>ThatiAR, a sizable manually annotated Arabic news sentence dataset for subjectivity detection, was
first presented by Suwaileh et al. [8], along with GPT-4-generated explanations and directions for
LLM fine-tuning. They benchmarked several large language models (LLMs) and pre-trained language
models (PLMs). On their ThatiAR data, GPT-4 with 3-shot in-context learning attained an F1-score of
0.800 for the “Subjective” class.</p>
      <p>For the CLEF 2023 CheckThat! Lab, Pachov et al. [14] developed a method for subjectivity
identification in English news stories. Their system combined three separate techniques: a fine-tuned sentence-embedding
encoder with dimensionality reduction (SBERT, PCA, ElasticNet), a few-shot learning SetFit model,
and a fine-tuned xlm-roberta-base model; their best-performing strategy was a simple majority-voting
ensemble over the three. This final ensemble achieved an overall macro F1-score of 0.77 on the English test set,
scoring 0.77 for the “subjective” class and 0.78 for the “objective” class.</p>
      <p>Using Google’s pre-trained large language model (LLM), Gemini, Gruman and Kosseim [15] detailed
their zero-shot method for subjectivity categorization (Task 2) at CLEF 2024. Their approach relied
on prompt engineering: a random portion of the training data was included in the input prompt for context,
and the test data was augmented by producing two paraphrases; a majority vote over the original and
paraphrased sentences selected the final label. Their F1-score on the English test set for Task 2 (Subjectivity)
was 0.370.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>
        We took part in Task 1: Subjectivity of the CLEF 2025 CheckThat! Lab [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. The main goal of the task
is to classify sentences taken from news articles as either subjective (SUBJ), reflecting the opinion of
the author, or objective (OBJ), presenting factual information. This binary classification problem
is organized into three distinct settings: monolingual (training/testing in one language), multilingual
(training/testing on a combined dataset of languages), and zero-shot (training on some languages
and testing on unseen languages).
      </p>
      <p>The official training dataset [7, 8] is a set of sentences from news articles. Training and development
splits are available for five languages: Arabic, Bulgarian, English, German, and Italian. Table 1
summarizes the number of sentences with SUBJ/OBJ counts per split for the five languages. There are also
test sets in Greek, Polish, Romanian and Ukrainian (used mainly for the zero-shot setting) and one multilingual test
set. For the multilingual and zero-shot settings, data can be pooled across all the language-specific training sets.
The overall system performance is measured by the macro-averaged F1-score over the SUBJ and OBJ
classes.</p>
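      <p>As a concrete illustration (our own sketch, not the organizers’ official scorer), the macro-averaged F1 over the SUBJ and OBJ classes weights both classes equally regardless of their frequency:</p>
      <p>
```python
# Minimal sketch of the evaluation metric: macro-averaged F1 over the
# SUBJ and OBJ classes. Each class's F1 counts equally, so performance
# on the minority class matters as much as on the majority class.

def macro_f1(gold, pred, labels=("OBJ", "SUBJ")):
    """Average the per-class F1 scores, weighting each class equally."""
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```
      </p>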
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section describes the methods used for Task 1: Subjectivity, covering the monolingual,
multilingual, and zero-shot settings. Our main methodology consists of fine-tuning a diverse set of pre-trained
Transformer-based language models, adapted to each language or setting, as illustrated in Figure 1.
The main objective was the efficient classification of sentences as subjective (SUBJ) or objective (OBJ)
while maximizing the macro F1-score.</p>
      <p>[Figure 1: Overview of the experimental pipeline, from the dataset through data preprocessing and BERT-based tokenization (feature extraction) to the setting-specific model suites: monolingual (English, Arabic, German, Italian), multilingual, and zero-shot (Greek, Polish, Ukrainian, Romanian).]</p>
      <sec id="sec-4-4">
        <sec id="sec-4-4-1">
          <title>4.1. Data Preprocessing and Feature Extraction</title>
          <p>The same preprocessing was applied to every
data split (train, development, and test). A standard text cleaning procedure was adopted, which largely
involved the removal of leading and trailing whitespace. We also discarded any rows from the dataset
that were missing either the sentence text or its corresponding label. For feature extraction, we used
pre-trained, model-specific tokenizers via AutoTokenizer from the Hugging Face Transformers
library1 for the range of different pre-trained architectures across our experiments (i.e., RoBERTa,
DeBERTa, XLM-RoBERTa, language-specific BERTs, and mT5). Sentences were tokenized into subword
units, yielding the input_ids and attention_mask tensors, which are the direct input features
for the Transformer models. Shorter sequences were padded and longer sequences
truncated to a maximum length of 128 tokens. Special tokens ([CLS], [SEP]) specific to
each Transformer model were added automatically by the tokenizer. Finally, the labels ’SUBJ’
and ’OBJ’ were mapped to 1 and 0, respectively, for training and evaluation.</p>
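          <p>To make the feature-extraction step concrete without downloading a model checkpoint, the following toy sketch (our own illustration; the real pipeline uses AutoTokenizer.from_pretrained) mimics the tokenizer’s output format: input_ids and attention_mask, padded and truncated to the 128-token maximum, plus the label mapping described above:</p>
          <p>
```python
# Toy sketch of the feature-extraction step. The real pipeline calls
# AutoTokenizer.from_pretrained(...); this word-level stand-in only
# mirrors the output format (input_ids + attention_mask, padded or
# truncated to max_length) so the mechanics are visible offline.

LABEL2ID = {"OBJ": 0, "SUBJ": 1}  # label indexing used in the paper
MAX_LEN = 128                     # maximum sequence length used in the paper

def encode(sentence, vocab, max_length=MAX_LEN, pad_id=0):
    """Map words to ids, truncate, and pad -- mimicking tokenizer output."""
    ids = [vocab.setdefault(w, len(vocab) + 1) for w in sentence.split()]
    ids = ids[:max_length]                     # truncation
    mask = [1] * len(ids)                      # 1 = real token, 0 = padding
    ids += [pad_id] * (max_length - len(ids))  # padding
    mask += [0] * (max_length - len(mask))
    return {"input_ids": ids, "attention_mask": mask}
```
          </p>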
        </sec>
        <sec id="sec-4-4-2">
          <title>4.2. Transformer-Based Models</title>
          <p>The core of our experimental methodology across all settings, monolingual,
multilingual and zero-shot, was the fine-tuning of pre-trained Transformer models [9]. These models,
famous for their self-attention mechanisms, have a great capacity to capture intricate contextual
nuances and long-range dependencies in text, making them extremely powerful for a variety of Natural
Language Processing (NLP) tasks [16, 17]. An overview of the general experimental pipeline,
from raw data ingestion and pre-processing through tokenization and feature extraction to
classification by the Transformer model variants chosen for each task setting, is given
in Fig. 1. For each Transformer model, a classifier head was attached to the pre-trained encoder stack.
This head generally consisted of a dropout layer for regularization followed by a final linear layer, which
projected the context-sensitive representation of the [CLS] token’s final hidden state (or the equivalent
pooled output representation, depending on the model architecture) into a logit space over the two target
classes: subjective (SUBJ) and objective (OBJ).</p>
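          <p>A minimal sketch of such a head (our own illustration with an assumed hidden size; the exact layer names in the Hugging Face classification models differ) looks as follows:</p>
          <p>
```python
# Minimal sketch of the classification head described above: dropout for
# regularization, then a linear layer projecting the [CLS] representation
# to two logits (OBJ vs. SUBJ). hidden_size=1024 matches a "large"
# encoder; names and defaults here are illustrative assumptions.
import torch
import torch.nn as nn

class SubjectivityHead(nn.Module):
    def __init__(self, hidden_size=1024, num_labels=2, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, cls_hidden_state):
        # cls_hidden_state: (batch, hidden_size), the encoder's [CLS]
        # (or pooled) output; returns (batch, 2) logits over OBJ/SUBJ.
        return self.classifier(self.dropout(cls_hidden_state))
```
          </p>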
          <p>In the monolingual experiments, we concentrated on tuning models on individual language-specific
corpora. For English, we experimented with RoBERTa-large [18], which benefits from a
more robust pre-training procedure, and DeBERTaV3-base [19], which incorporates advancements such
as disentangled attention. The analysis of subjectivity in Arabic utilized Arabic-pretrained
models such as AraBERT v2 [20], ARBERTv2 and MARBERTv2 [21], all specifically pre-trained on large
Arabic corpora, in addition to the broadly multilingual model XLM-RoBERTa large [22]. For German,
we tried German BERT large [23], GottBERT [24] (trained on historical and modern German), German
ELECTRA large [23] (with replaced token detection as its pre-training objective) as well as XLM-RoBERTa
large. Our experiments on Italian were performed with the Italian-tuned transformers UmBERTo
Commoncrawl cased2 and BERT-Base Italian XXL cased3, alongside XLM-RoBERTa large. This selection
sought to benefit not only from language-specific optimization but also from strong multilingual
representations in monolingual training.</p>
          <p>In the multilingual setting, our approach consisted of training models on an aggregated dataset created
by concatenating the training data from all five main languages: Arabic, Bulgarian, English, German,
and Italian. This setup was designed so the models could learn shared, language-independent cues
for subjectivity. For this task, we used pre-trained models chosen specifically for their
multilingual capacity: XLM-RoBERTa large, InfoXLM large [25] (which includes
cross-lingual pre-training objectives such as Translation Language Modeling), mT5 base [26] (adapted
from its original sequence-to-sequence architecture for this classification task), mDeBERTaV3 base, and
mBERT cased [16].</p>
          <p>The zero-shot scenario directly measured the multilingual models’ generalization capability. Models
were first fine-tuned on the joint multilingual dataset (Arabic, Bulgarian, English, German and Italian).
We then tested the tuned models on the test sets of unseen languages (Greek, Polish, Ukrainian and
Romanian) without any fine-tuning in between. This configuration probes the extent
to which models can generalize the learned subjectivity properties across different linguistic
contexts based exclusively on the cross-lingual information gathered during pre-training
and multilingual fine-tuning. We mainly investigated the efficiency of zero-shot transfer using the
XLM-RoBERTa large, mDeBERTaV3 base and mBERT cased architectures.
1: https://huggingface.co/transformers/
2: https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1
3: https://huggingface.co/dbmdz/bert-base-italian-xxl-cased</p>
          <p>The fine-tuning procedure was largely the same throughout all experiments. We used the AdamW
optimizer [27], which provides improved weight decay regularization compared to the original Adam.
A linear learning rate warm-up and decay scheduler was adopted to stabilize training,
with the warm-up phase set to the first 10% of the total training steps. The loss function
was the standard cross-entropy loss, as natively implemented within the Hugging Face
AutoModelForSequenceClassification classes. For some models, when the class imbalance was high in the training
set of a language (e.g., RoBERTa-large for English), pre-calculated class weights were supplied to the
model, enabling the framework to internally adjust the loss computation to give more importance to
underrepresented classes. Models were first trained on the specified training sets, and
their performance was measured on the corresponding development sets using the macro F1-score.
The optimal checkpoint was then subjected to a subsequent, shorter phase of fine-tuning (3 epochs) on
a combined dataset comprising both the original training and development data, often with a
reduced learning rate. This final recalibrated model was then used to predict the test sets. All
experiments were performed with a fixed random seed (42) for reproducibility. Gradients were clipped to a max-norm of
1.0 to avoid exploding gradients during backpropagation. Hyperparameter settings for
the main models used in each experiment are detailed in Table 2. All experiments were conducted in a
Kaggle environment with an NVIDIA Tesla P100 GPU, which provided the computational
resources necessary to fine-tune large-scale models.</p>
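          <p>The shared recipe can be sketched as follows (a minimal illustration, not our exact training script: a stand-in linear layer replaces the Transformer encoder, and the toy labels, step counts, and learning rate are assumptions; the real runs fine-tune the encoders named above):</p>
          <p>
```python
# Sketch of the shared fine-tuning recipe: class-weighted cross-entropy,
# AdamW, linear warm-up over the first 10% of steps with linear decay,
# and gradient clipping at max-norm 1.0.
import torch
import torch.nn as nn

def class_weights(labels, num_classes=2):
    """Inverse-frequency weights so underrepresented classes count more."""
    counts = torch.bincount(torch.tensor(labels), minlength=num_classes).float()
    return counts.sum() / (num_classes * counts)

def warmup_factor(step, total_steps, warmup_frac=0.10):
    """Linear warm-up, then linear decay, of the learning-rate multiplier."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return step / warmup
    return max(0.0, (total_steps - step) / (total_steps - warmup))

model = nn.Linear(16, 2)                   # stand-in for the encoder + head
weights = class_weights([0, 0, 0, 1])      # imbalanced toy label list
criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda s: warmup_factor(s, total_steps=100))

# One illustrative training step: forward, backward, clip, update.
logits = model(torch.randn(4, 16))
loss = criterion(logits, torch.tensor([0, 0, 0, 1]))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
```
          </p>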
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Result Analysis</title>
      <p>This section provides a comprehensive experimental analysis of our models’ performance on the task
of subjectivity classification in the monolingual, multilingual and zero-shot settings. The
macro-averaged F1 is the official evaluation metric. Table 3 presents the recall, precision and F1-score of the
Transformer-based models on the corresponding test sets.</p>
      <p>In the monolingual setting, for English, RoBERTa-large (0.7948 F1) outperformed DeBERTaV3-base.
In Arabic, MARBERT-v2 (0.5885 F1) achieved the best performance, closely followed by XLM-RoBERTa
large (0.5747 F1), outperforming AraBERTv2 and ARBERTv2. The German ELECTRA large (0.8520
F1) yielded the best performance for German. Its success can be attributed to its pre-training on a
massive German-only corpus combined with the efficient "replaced token detection" objective, allowing
it to capture language-specific nuances of subjectivity more effectively than generalist multilingual
models. Conversely, in Italian, the multilingual XLM-RoBERTa large (0.8356 F1) slightly outperformed
the Italian-specific BERT-Base XXL Cased (0.8309 F1). This indicates that the sheer scale and diversity
of XLM-RoBERTa’s pre-training can sometimes create representations robust enough to compete with,
or even exceed, those of large models trained on a single language’s corpus.</p>
      <p>In the multilingual setting, where models were trained on data from all five languages, InfoXLM-large
(0.7114 F1) and mDeBERTaV3-base (0.7109 F1) performed best. The strong performance of InfoXLM is
particularly noteworthy; its pre-training includes cross-lingual objectives like Translation Language
Modeling (TLM), which explicitly forces the model to align sentence representations across languages.
This is a powerful advantage for learning a language-independent concept like subjectivity from a
mixed corpus. XLM-RoBERTa large also achieved strong results (0.6993 F1), leveraging its vast multilingual
vocabulary, while the older mBERT-cased and sequence-to-sequence mT5-base models were notably
behind.</p>
      <p>The zero-shot setting tested generalization to unseen languages. Here, the architectural strengths and
pre-training data of the models become paramount. XLM-RoBERTa large excelled on Ukrainian (0.6730
F1) and Romanian (0.8040 F1). This success is likely due to its massive pre-training on 100 languages from
CommonCrawl, which includes both Slavic (related to Ukrainian) and Romance (related to Romanian)
languages. This linguistic proximity allows the model to effectively transfer learned subjectivity patterns
to unseen but related languages. In contrast, mDeBERTaV3-base proved superior for Greek (0.4945 F1)
and Polish (0.5737 F1). Its architectural improvements, such as disentangled attention, may allow it to
capture more abstract syntactic and semantic cues of subjectivity that generalize better across more
distant language families. These results underscore that successful zero-shot transfer depends not only
on multilingual exposure but also on the specific relationship between the source and target languages.</p>
      <p>While specialized monolingual models can achieve peak performance in their native language, our
findings consistently show that large, robustly pre-trained multilingual models like
XLM-RoBERTa-large and mDeBERTaV3-base are top performers across all settings. Their strength lies in their ability
to learn universal linguistic patterns. XLM-RoBERTa’s advantage comes from its vast, 100-language
pre-training corpus, making it a powerful baseline for cross-lingual transfer. Meanwhile, models like
InfoXLM and mDeBERTaV3 leverage more advanced pre-training objectives (TLM) or architectural
designs (disentangled attention) to further enhance this generalization capability.</p>
      <sec id="sec-5-1">
        <title>5.1. Inference Time and Resource Constraints</title>
        <p>To assess the practical viability of our approach, particularly in zero-shot settings with potential resource
constraints, we measured the inference time of our top-performing zero-shot models. Inference was
performed on a single NVIDIA Tesla P100 GPU. We measured the total time required to process the
entire test set for each unseen language, and the results are presented in Table 4.</p>
        <p>The analysis highlights a clear trade-off between model performance and computational cost. The
larger model, XLM-RoBERTa-large, while achieving superior F1-scores on Ukrainian and Romanian, is
more than twice as slow as the mDeBERTaV3-base model. Specifically, XLM-RoBERTa-large required
approximately 13-16 ms per sentence, whereas the base-sized mDeBERTaV3-base was significantly
faster at around 6-7 ms per sentence. This disparity is a direct result of model size and complexity;
larger models demand more GPU memory and compute cycles, leading to higher latency. This is a
critical factor for deployment in production systems where low latency is often a firm requirement.
For scenarios with tight resource constraints, a smaller model like mDeBERTaV3-base could be a more
pragmatic choice, despite a potential drop in performance for certain languages.</p>
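        <p>The per-sentence figures above are total wall-clock time over a test set divided by the number of sentences, which can be sketched as follows (our own helper; the predict function here is a hypothetical stand-in for a forward pass through the fine-tuned model on the GPU):</p>
        <p>
```python
# Sketch of how per-sentence latency is derived: wall-clock time over the
# whole test set divided by the sentence count. `predict` is a stand-in
# for running one sentence through the fine-tuned model.
import time

def mean_latency_ms(sentences, predict):
    """Average wall-clock inference time per sentence, in milliseconds."""
    start = time.perf_counter()
    for sentence in sentences:
        predict(sentence)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(sentences)
```
        </p>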
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Error Analysis</title>
      <p>We conducted both quantitative and qualitative error analyses of the models to gain a deeper
understanding of the reasoning behind model behaviour and identify potential avenues for improvement in
each model.</p>
      <sec id="sec-6-1">
        <title>6.1. Quantitative Analysis</title>
        <p>Figure 2 shows an aggregation of confusion matrices for our best models on the individual monolingual
sets, the best multilingual model, and the best zero-shot model for each of the previously unseen target
languages. In the monolingual settings, models such as German-ELECTRA-large (German) or
RoBERTa-large (English) demonstrated good performance with less biased confusion between the Objective (OBJ)
and Subjective (SUBJ) classes. A very modest misclassification bias of SUBJ as OBJ was found for English
RoBERTa-large (25 False Negatives for SUBJ vs 25 False Positives for OBJ). For Arabic, MARBERT-v2 had
more problems identifying the subjective sentences, with 171 SUBJ instances classified as OBJ versus
193 OBJ instances classified as SUBJ. In Italian, XLM-RoBERTa-large also showed a slight asymmetry,
with more SUBJ instances misclassified as OBJ.</p>
        <p>Our top-performing multilingual model, InfoXLM-large, showed a clear asymmetry: it misclassified
SUBJ instances as OBJ far more often than the reverse (339 False Negatives for SUBJ versus 89 for
OBJ), suggesting that the model leaned towards the objective class when separating SUBJ from
OBJ. This implies that, when trained on varied linguistic inputs, the model takes a more conservative
position, defaulting to an objective reading for ambiguous sentences.</p>
        <p>Error rates were generally higher in the zero-shot cases. For Greek, mDeBERTaV3-base made more
classification errors involving instances of SUBJ misclassified as OBJ. There was a strong bias to predict
OBJ in Polish under mDeBERTaV3-base, where 121 SUBJ sentences were
categorized as OBJ, while there were only 2 errors of OBJ predicted as SUBJ. Ukrainian and Romanian,
evaluated with XLM-RoBERTa-large, also showed a tendency to misclassify subjective statements as
objective, albeit less pronounced than in Polish.</p>
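        <p>The asymmetries discussed above are read directly off gold/prediction pair counts, which can be computed as in this small sketch (the toy labels here are illustrative, not the actual model outputs):</p>
        <p>
```python
# Sketch of the per-class error counts behind the confusion matrices in
# Figure 2. The (gold, predicted) pair counts expose asymmetries such as
# SUBJ being misread as OBJ more often than the reverse.
from collections import Counter

def confusion_counts(gold, pred):
    """Count (gold, predicted) label pairs over a test set."""
    return Counter(zip(gold, pred))

# Toy example: ("SUBJ", "OBJ") counts SUBJ sentences misread as OBJ,
# the bias pattern we observed in the zero-shot Polish setting.
gold = ["SUBJ", "SUBJ", "OBJ", "OBJ", "SUBJ"]
pred = ["OBJ",  "SUBJ", "OBJ", "SUBJ", "OBJ"]
counts = confusion_counts(gold, pred)
```
        </p>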
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Qualitative Analysis</title>
        <p>Figure 3 shows prediction examples and their corresponding actual labels. Simple
statements of fact (example 1, English) and unambiguously opinionated sentences (example 5, German)
were accurately discriminated across settings.</p>
        <p>Mistakes often occur with mild or subtle notes of subjectivity. Consider the zero-shot Greek example
in instance 12, which conveys a subjective hope but was classified as OBJ by mDeBERTaV3-base. While
the true label is also OBJ, this case may reflect annotation noise or highlight the model’s difficulty
in handling such nuanced expressions. The model’s strong bias toward OBJ in zero-shot Polish is evident,
although it correctly identifies example 14. Short and possibly biased phrases, such as
example 10, were correctly recognized as OBJ by InfoXLM-large in the multilingual setting, likely
owing to its contextual training. Finally, sentences that use rhetorical questions or carry hidden
subjectivity, such as the Ukrainian example 16, which is annotated as OBJ but may bear some subjective
weight, demonstrate that such difficult cases remain an issue for state-of-the-art models. These findings
suggest that future research should focus on enhancing models’ ability to exploit subtle linguistic
cues for subjectivity, particularly in cross-lingual and low-resource settings, and on addressing the model
biases towards majority classes observed in some zero-shot transfers.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we present our contribution to the CLEF 2025 CheckThat! Lab Task 1: Subjectivity.
We conducted an extensive comparison of monolingual, multilingual, and zero-shot Transformer-based
models. Our results reveal a clear trade-off: while language-specific models (e.g., German
ELECTRA large) excel on monolingual tasks by capturing local linguistic nuances, robust multilingual
models show superior versatility and transferability. The strong performance of XLM-RoBERTa-large
and mDeBERTaV3-base in multilingual and zero-shot settings stems from their underlying design.
We attribute XLM-RoBERTa’s success to its massive 100-language pre-training data, which fosters
generalization to related languages (e.g., Romanian, Ukrainian). Meanwhile, mDeBERTaV3’s advanced
architecture appears to help in transferring more abstract patterns to less related languages (e.g., Greek,
Polish). Error analysis confirmed that challenges with subtle subjectivity and majority-class bias persist,
especially in zero-shot scenarios. Our future work will focus on developing more effective fine-tuning
techniques, addressing data imbalance, and improving cross-lingual generalization. Furthermore, we
plan to explore model optimization techniques, such as knowledge distillation and quantization, to better
balance the trade-off between predictive performance and the computational efficiency highlighted in
our analysis.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by Southeast University, Bangladesh.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly for grammar and spelling checks.
After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.
on subjectivity in news article, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes
of CLEF 2025 - Conference and Labs of the Evaluation Forum, CLEF 2025, Madrid, Spain, 2025.
[7] F. Antici, F. Ruggeri, A. Galassi, K. Korre, A. Muti, A. Bardi, A. Fedotova, A. Barrón-Cedeño, A
corpus for sentence-level subjectivity detection on English news articles, in: N. Calzolari, M.-Y. Kan,
V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference
on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA
and ICCL, Torino, Italia, 2024, pp. 273–285. URL: https://aclanthology.org/2024.lrec-main.25/.
[8] R. Suwaileh, M. Hasanain, F. Hubail, W. Zaghouani, F. Alam, ThatiAR: Subjectivity detection in
Arabic news sentences, arXiv preprint arXiv:2406.05559 (2024).
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,
Attention is all you need, Advances in neural information processing systems 30 (2017).
[10] J. Wiebe, R. Bruce, T. P. O’Hara, Development and use of a gold-standard data set for subjectivity
classifications, in: Proceedings of the 37th annual meeting of the Association for Computational
Linguistics, 1999, pp. 246–253.
[11] T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff,
S. Patwardhan, OpinionFinder: A system for subjectivity analysis, in: Proceedings of HLT/EMNLP
2005 interactive demonstrations, 2005, pp. 34–35.
[12] A. Al Hamoud, A. Hoenig, K. Roy, Sentence subjectivity analysis of a political and ideological
debate dataset using LSTM and BiLSTM with attention and GRU models, Journal of King Saud
University-Computer and Information Sciences 34 (2022) 7974–7987.
[13] R. Satapathy, S. R. Pardeshi, E. Cambria, Polarity and subjectivity detection with multitask learning
and BERT embedding, Future Internet 14 (2022) 191.
[14] G. Pachov, D. Dimitrov, I. Koychev, P. Nakov, GPachov at CheckThat! 2023: A diverse multi-approach
ensemble for subjectivity detection in news articles, arXiv preprint arXiv:2309.06844 (2023).
[15] S. Gruman, L. Kosseim, CLaC at CheckThat! 2024: A zero-shot model for check-worthiness and
subjectivity classification, in: Faggioli et al. (2024).
[16] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, in: Proceedings of the 2019 conference of the North American chapter
of the association for computational linguistics: human language technologies, volume 1 (long
and short papers), 2019, pp. 4171–4186.
[17] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao,
S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language
processing, in: Q. Liu, D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, Association for
Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6/.
doi:10.18653/v1/2020.emnlp-demos.6.
[18] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[19] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with
gradient-disentangled embedding sharing, arXiv preprint arXiv:2111.09543 (2021).
[20] W. Antoun, F. Baly, H. Hajj, AraBERT: Transformer-based model for Arabic language
understanding, in: H. Al-Khalifa, W. Magdy, K. Darwish, T. Elsayed, H. Mubarak (Eds.), Proceedings of the 4th
Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive
Language Detection, European Language Resources Association, Marseille, France, 2020, pp. 9–15.
URL: https://aclanthology.org/2020.osact-1.2/.
[21] M. Abdul-Mageed, A. Elmadany, E. M. B. Nagoudi, ARBERT &amp; MARBERT: Deep bidirectional
transformers for Arabic, in: Proceedings of the 59th Annual Meeting of the Association for
Computational Linguistics and the 11th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp.
7088–7105. URL: https://aclanthology.org/2021.acl-long.551. doi:10.18653/v1/2021.acl-long.
551.
[22] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, arXiv
preprint arXiv:1911.02116 (2019).
[23] B. Chan, S. Schweter, T. Möller, German’s next language model, arXiv preprint arXiv:2010.10906
(2020).
[24] R. Scheible, F. Thomczyk, P. Tippmann, V. Jaravine, M. Boeker, GottBERT: A pure German language
model, arXiv preprint arXiv:2012.02110 (2020).
[25] Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X.-L. Mao, H. Huang, M. Zhou,
InfoXLM: An information-theoretic framework for cross-lingual language model pre-training, in:
K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell,
T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, Association
for Computational Linguistics, Online, 2021, pp. 3576–3588. URL: https://aclanthology.org/2021.
naacl-main.280/. doi:10.18653/v1/2021.naacl-main.280.
[26] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, mT5: A
massively multilingual pre-trained text-to-text transformer, arXiv preprint arXiv:2010.11934
(2020).
[27] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101
(2017).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Blake</surname>
          </string-name>
          , et al.,
          <article-title>News in a digital age: Comparing the presentation of news information over time and across media platforms</article-title>
          ,
          <source>Rand Corporation</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. L. M.</given-names>
            <surname>Jeronimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Campelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>da Costa Melo</surname>
          </string-name>
          ,
          <article-title>Fake news classification based on subjective language</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Information Integration and Web-based Applications &amp; Services</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>On the subjectivity and intersubjectivity of language</article-title>
          ,
          <source>Communication and Linguistics Studies</source>
          <volume>6</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>The clef-2025 checkthat! lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in: C.
          <string-name>
            <surname>Hauff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Jannach</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <string-name>
            <surname>Nardini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Pinelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schellhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Setty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundriyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Venktesh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! Lab: Subjectivity, fact-checking, claim normalization, and retrieval</article-title>
          , in: J.
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nawrocka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ivasiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Razvan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mihail</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2025 CheckThat! lab task 1 on subjectivity in news articles</article-title>
          , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          , CLEF 2025, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>