<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Arcturus at CheckThat! 2025: DeBERTa-v3-Base for Multilingual Subjectivity Detection in News Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rahul Jambulkar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aditya Aditya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sukomal Pal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Indian Institute of Technology (BHU)</institution>
          ,
          <addr-line>Varanasi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Subjectivity detection in text is a crucial auxiliary task within the fact-checking pipeline, helping to flag content that may require additional scrutiny. In this study, we evaluate the DeBERTa-v3-Base transformer for multilingual subjectivity analysis under the CLEF CheckThat! 2025 Task 1 framework. We explore three modeling strategies: monolingual fine-tuning across five languages (Arabic, Bulgarian, English, German, Italian), multilingual training using pooled training data, and zero-shot transfer, where models trained on resource-rich languages are directly tested on unseen languages such as Polish and Romanian. Across all configurations, we incorporate focal loss to mitigate significant class imbalances. Our model achieved notable performance: ranked 6th out of 23 in English, 6th out of 14 in Italian, and 10th out of 15 in the multilingual setting; in zero-shot scenarios, it placed 5th for Polish and 10th for Romanian. Comprehensive evaluation of training protocols, error distribution, and language-specific challenges demonstrates the promise of DeBERTa-v3-Base for subjectivity detection across diverse linguistic contexts and highlights pathways for stronger cross-lingual adaptation.</p>
      </abstract>
      <kwd-group>
        <kwd>DeBERTa-v3</kwd>
        <kwd>subjectivity detection</kwd>
        <kwd>CheckThat! 2025</kwd>
        <kwd>multilingual NLP</kwd>
        <kwd>focal loss</kwd>
        <kwd>zero-shot learning</kwd>
        <kwd>transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In today’s digitally driven news ecosystem, readers are increasingly exposed to subjective information
presented as factual reporting. Subjectivity detection—the task of identifying whether a given text
expresses an opinion, belief, or sentiment, rather than stating an objective fact—has thus become
a critical component in combating misinformation and assessing media credibility [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The CLEF
CheckThat! 2025 Lab, specifically Task 1 on multilingual subjectivity detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], offers a timely
benchmark for this challenge, providing datasets across five languages: Arabic, Bulgarian, English,
German, and Italian.
      </p>
      <p>The task of distinguishing subjective content from objective statements presents unique challenges in
multilingual contexts. While English-centric approaches have shown promising results, the linguistic
diversity across languages introduces complexities related to morphological richness, syntactic variations,
and cultural context-dependent expressions of subjectivity. These challenges are further compounded
by significant class imbalances in real-world datasets, where objective statements often outnumber
subjective ones by substantial margins.</p>
      <p>
        To address these challenges, we selected Microsoft’s DeBERTa-v3-Base transformer model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
DeBERTa (Decoding-enhanced BERT with disentangled attention) improves upon previous architectures
like BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and RoBERTa [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by incorporating a disentangled attention mechanism and an enhanced
mask decoder for pre-training. These architectural innovations allow DeBERTa to more effectively
capture syntactic and semantic nuances, which is particularly advantageous for subjectivity detection,
where subtle cues often differentiate opinion from fact.
      </p>
      <p>Our contribution lies in demonstrating the effectiveness of DeBERTa-v3-Base for multilingual
subjectivity detection through comprehensive experimentation across monolingual, multilingual, and
zero-shot settings. We address class imbalance through a focal loss implementation and provide a detailed
analysis of model performance across linguistically diverse datasets. This work represents the first
systematic application of DeBERTa-v3-Base to the CheckThat! subjectivity detection task, offering
insights into both its capabilities and limitations for cross-lingual transfer learning.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Evolution of Subjectivity Detection</title>
        <p>
          Subjectivity detection has evolved significantly from early lexicon-based approaches to sophisticated
neural architectures. Initial systems relied on manually crafted features and lexical resources like
OpinionFinder [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which struggled with domain adaptation and cross-lingual generalization. The
introduction of statistical machine learning methods improved performance but remained limited by
feature engineering requirements.
        </p>
        <p>
          The neural revolution transformed subjectivity detection through the adoption of recurrent neural
networks and convolutional architectures [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]. These models demonstrated improved capability in
capturing contextual dependencies and learning task-specific representations. However, they remained
limited in their ability to model long-range dependencies and to transfer knowledge across languages.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Transformer-Based Approaches</title>
        <p>
          The emergence of transformer architectures marked a paradigm shift in subjectivity detection. BERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
and its variants demonstrated unprecedented performance improvements by leveraging bidirectional
context and large-scale pre-training. Multilingual BERT (mBERT) extended these capabilities to
cross-lingual settings, though with some performance degradation compared to monolingual models.
        </p>
        <p>
          XLM-R [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] addressed multilingual limitations by training on 100 languages with improved
cross-lingual alignment. Recent work by Barnes et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] demonstrated the effectiveness of cross-lingual
projection for domain adaptation in sentiment analysis, providing foundations for multilingual
subjectivity detection approaches.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. CheckThat! Lab Evolution</title>
        <p>
          The CLEF CheckThat! Lab has established itself as a premier benchmark for subjectivity detection
across multiple years. The 2023 edition [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] introduced significant methodological innovations, with
top-performing systems employing diverse strategies. The DWReCO team [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] utilized GPT-3 for
style-based data augmentation, addressing class imbalance through synthetic data generation. The
Gpachov team [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] demonstrated the effectiveness of ensemble approaches combining multiple
fine-tuned multilingual transformers.
        </p>
        <p>The 2024 edition witnessed further advancement with HYBRINFOX [14] introducing hybrid
approaches that combined RoBERTa with VAGO’s vagueness scores. This work highlighted the potential
of integrating traditional NLP tools with modern transformers, though translation-based approaches
introduced noise in multilingual settings. Nullpointer et al. [15] conducted comprehensive transfer
learning analysis across multiple languages, finding that sentiment-aware BERT variants showed
superior performance in cross-lingual scenarios.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. DeBERTa Architectures</title>
        <p>
          DeBERTa-v3 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] represents a significant advancement over previous transformer architectures through
its disentangled attention mechanism and Scale-invariant Fine-Tuning (SiFT). Unlike traditional
self-attention that combines content and position information, DeBERTa’s disentangled approach processes
these components separately, enabling more nuanced understanding of linguistic structures.
        </p>
        <p>While multilingual variants of DeBERTa (mDeBERTa-v3) have been developed [16], their application
to subjectivity detection in multilingual settings remains underexplored. Previous CheckThat!
submissions have not systematically evaluated DeBERTa-v3-Base’s performance across the linguistic diversity
present in the task datasets.</p>
        <p>[Table 1: train, development, and test sample counts per language, with the SUBJ:OBJ ratio of the training set.]</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Class Imbalance in Subjectivity Detection</title>
        <p>Class imbalance presents a persistent challenge in subjectivity detection, particularly in news-based
datasets where objective reporting typically predominates. Focal loss [17] has emerged as an effective
solution for handling extreme class imbalances by down-weighting the contribution of well-classified
examples. However, its application to multilingual subjectivity detection with transformer models
requires careful hyperparameter tuning and language-specific adaptation.</p>
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Research Gap and Contribution</title>
        <p>Our work addresses several gaps in existing research. First, no previous study has systematically
evaluated DeBERTa-v3-Base’s performance on multilingual subjectivity detection within the CheckThat!
framework. Second, most existing approaches either focus on heavily multilingual models with high
computational costs or employ translation-based methods that introduce noise. Our approach leverages
a computationally efficient model with strong English performance while demonstrating its cross-lingual
capabilities through direct fine-tuning on target languages.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and Preprocessing</title>
      <p>The official dataset for CheckThat! 2025 Task 1 comprises news claims annotated as either subjective
(SUBJ) or objective (OBJ). Data was provided for five languages: Arabic (AR), Bulgarian (BG), English
(EN), German (DE), and Italian (IT), each with separate training and development sets. The dataset
exhibits significant linguistic diversity, ranging from morphologically rich languages like Arabic and
German to Romance languages like Italian.</p>
      <p>A critical characteristic of the dataset is the substantial class imbalance across languages. Table 1
presents comprehensive statistics revealing the extent of this imbalance. The English dataset presents
the most extreme case with a SUBJ:OBJ ratio of 1:10.87, while Italian shows the opposite trend with a
ratio of 3.22:1. This variation reflects different journalistic traditions and annotation practices across
languages.</p>
      <p>Our preprocessing pipeline was designed to maintain consistency across languages while respecting
language-specific characteristics. For Latin-script languages (English, German, Italian), we applied
standard lowercasing and punctuation normalization. Arabic text required specialized preprocessing
including diacritic removal and text direction normalization. Bulgarian, using Cyrillic script, received
script-specific normalization to handle encoding variations.</p>
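        <p>As a concrete illustration, the normalization steps above can be sketched in Python. The function and the exact Arabic diacritic set are our illustrative assumptions; the paper does not enumerate the specific characters removed.</p>

```python
import re
import unicodedata

# Arabic combining marks (tanwin, harakat, shadda, sukun, superscript alef);
# an illustrative set -- the exact diacritics removed are not specified.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def preprocess(text: str, lang: str) -> str:
    """Minimal sketch of the language-aware normalization pipeline."""
    text = unicodedata.normalize("NFC", text)       # harmonize encoding variants
    if lang in {"en", "de", "it"}:                  # Latin-script languages
        text = text.lower()                         # standard lowercasing
        text = re.sub(r"\s+", " ", text).strip()    # whitespace normalization
    elif lang == "ar":
        text = ARABIC_DIACRITICS.sub("", text)      # diacritic removal
    # Bulgarian (Cyrillic) receives only the NFC normalization above
    return text
```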
      <p>Tokenization employed the DeBERTaV2Tokenizer with SentencePiece [18] subword segmentation,
maintaining vocabulary consistency across languages despite the English-centric pre-training. Input
sequences were truncated to 128 tokens, balancing computational efficiency with content preservation.
Label encoding mapped SUBJ to 1 and OBJ to 0, with attention masks generated for variable-length
sequences.</p>
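        <p>The encoding logic described above (truncation to 128 tokens, SUBJ→1/OBJ→0 labels, attention masks for variable-length input) can be sketched as follows. Token ids are passed in directly to keep the sketch self-contained; in the actual system they come from the DeBERTaV2Tokenizer, and the pad id shown is a placeholder.</p>

```python
MAX_LEN = 128
LABEL2ID = {"OBJ": 0, "SUBJ": 1}
PAD_ID = 0  # placeholder; the real pad id comes from the tokenizer vocabulary

def encode(token_ids: list, label: str, max_len: int = MAX_LEN):
    """Truncate/pad a token-id sequence and build its attention mask."""
    ids = list(token_ids[:max_len])     # truncate to the 128-token budget
    mask = [1] * len(ids)               # 1 = real token, attended to
    pad = max_len - len(ids)
    ids += [PAD_ID] * pad               # pad to fixed length
    mask += [0] * pad                   # 0 = padding, ignored by attention
    return ids, mask, LABEL2ID[label]
```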
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Model Architecture</title>
        <p>Our approach centers on the microsoft/DeBERTa-v3-Base model, comprising approximately 184 million
parameters. The model’s architecture incorporates several key innovations that make it particularly
suitable for subjectivity detection. The disentangled attention mechanism separates content and position
representations, enabling more nuanced understanding of linguistic structures crucial for detecting
subtle subjective cues.</p>
        <p>We augmented the base model with a custom classification head designed for binary classification.
This head consists of a linear projection layer mapping the pooled output to a 512-dimensional
intermediate representation, followed by ReLU activation and dropout regularization. A final linear layer
produces logits for the binary classification task. Rather than relying solely on the [CLS] token, we
employed mean pooling over the last hidden states to capture more comprehensive sentence-level
representations.</p>
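        <p>A minimal NumPy sketch of this head: masked mean pooling over the last hidden states, a 768→512 projection with ReLU and (inverted) dropout, and a final 512→2 projection. The 768-dimensional hidden size matches DeBERTa-v3-Base; the initialization scale and dropout rate are illustrative assumptions.</p>

```python
import numpy as np

HIDDEN, INTER, N_CLASSES = 768, 512, 2   # DeBERTa-v3-Base hidden size; head dims
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.02, (HIDDEN, INTER)); b1 = np.zeros(INTER)
W2 = rng.normal(0.0, 0.02, (INTER, N_CLASSES)); b2 = np.zeros(N_CLASSES)

def classify(hidden_states, attention_mask, train=False, p_drop=0.1):
    """hidden_states: (seq_len, 768) last-layer states; attention_mask: (seq_len,)."""
    m = attention_mask[:, None].astype(float)
    pooled = (hidden_states * m).sum(axis=0) / m.sum()  # mean pool over real tokens only
    h = np.maximum(0.0, pooled @ W1 + b1)               # linear + ReLU
    if train:                                           # inverted dropout (training only)
        h *= (rng.random(INTER) >= p_drop) / (1.0 - p_drop)
    return h @ W2 + b2                                  # logits for OBJ / SUBJ
```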
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Disentangled Attention Mechanism</title>
        <p>The disentangled attention mechanism represents a fundamental departure from traditional transformer
attention. The attention score between tokens i and j is computed as:</p>
        <p>A_{i,j} = H_i H_j^T + H_i P_{j|i}^T + P_{i|j} H_j^T (1)
where H_i denotes the content embedding of token i and P_{i|j} denotes the relative position embedding of
token i with respect to token j. This separation allows the model to independently learn content-based
and position-based attention patterns, crucial for understanding subjective expressions that often depend
on subtle positional relationships between words.</p>
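        <p>The three terms of the score (content-to-content, content-to-position, position-to-content) can be made concrete with a small NumPy sketch. This is a simplification under our own assumptions: random matrices stand in for the learned projections, relative positions are indexed directly rather than bucketed, and the 1/√(3d) scaling follows the DeBERTa paper.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
seq, d = 4, 8
H = rng.normal(size=(seq, d))             # content representations H_i
P = rng.normal(size=(2 * seq - 1, d))     # relative-position table, delta in [-(seq-1), seq-1]
rel = np.arange(seq)[:, None] - np.arange(seq)[None, :] + seq - 1  # shifted index of delta(i, j)

c2c = H @ H.T                              # content-to-content:  H_i . H_j
c2p = np.einsum("id,ijd->ij", H, P[rel])   # content-to-position: H_i . P_{j|i}
p2c = np.einsum("ijd,jd->ij", P[rel.T], H) # position-to-content: P_{i|j} . H_j
A = (c2c + c2p + p2c) / np.sqrt(3 * d)     # pre-softmax disentangled scores
```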
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Training Configurations and Experimental Design</title>
        <p>We conducted experiments across three distinct scenarios to comprehensively evaluate model
performance:
• Monolingual Setting: Individual models were trained for each language using only language-specific
data. This approach maximizes language-specific adaptation but requires separate model
maintenance.
• Multilingual Setting: A single model was trained on combined data from all five languages,
then evaluated on each language’s test set. This approach tests the model’s ability to learn shared
representations across languages.
• Zero-shot Cross-lingual Setting: For each target language L, we trained models on combined
data from all other languages (All − L), then evaluated on L without any target-language
fine-tuning. This setting evaluates pure cross-lingual transfer capabilities.</p>
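        <p>Under an illustrative representation of the per-language data as a dictionary of example lists, the three configurations reduce to simple pooling operations (function and key names here are our own):</p>

```python
LANGS = ["ar", "bg", "de", "en", "it"]   # languages with training data
ZERO_SHOT = ["pl", "ro", "uk", "el"]     # unseen evaluation languages

def training_pools(data):
    """data: language code -> list of (text, label) examples.
    Returns the training pool used in each experimental configuration."""
    pools = {f"mono_{l}": data[l] for l in LANGS}            # monolingual
    pools["multi"] = [ex for l in LANGS for ex in data[l]]   # pooled multilingual
    for target in LANGS + ZERO_SHOT:                         # zero-shot: All - target
        pools[f"zeroshot_{target}"] = [ex for l in LANGS
                                       if l != target for ex in data[l]]
    return pools
```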
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Focal Loss Implementation</title>
        <p>The extreme class imbalances necessitated moving beyond standard cross-entropy loss. We implemented
focal loss [17] to address this challenge:</p>
        <p>FL() = −  (1 − ) log()
where  represents the model’s estimated probability for the ground-truth class,  is the focusing
parameter (set to 2.0), and   provides class-specific weighting. The   values were computed inversely
proportional to class frequencies in each training set, ensuring balanced learning across classes.
(1)
(2)</p>
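        <p>A minimal NumPy sketch of this loss follows, with α values inversely proportional to class frequency as described; normalizing the weights to sum to 1 is our assumption.</p>

```python
import numpy as np

def class_alphas(labels):
    """Per-class weights inversely proportional to class frequency (normalized)."""
    counts = np.bincount(np.asarray(labels), minlength=2).astype(float)
    inv = 1.0 / counts
    return inv / inv.sum()

def focal_loss(probs, labels, alphas, gamma=2.0):
    """probs: (n, 2) predicted class probabilities; labels: (n,) in {0, 1}."""
    labels = np.asarray(labels)
    p_t = probs[np.arange(len(labels)), labels]  # probability of ground-truth class
    a_t = alphas[labels]                         # class-specific weight alpha_t
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

        <p>With γ = 2, a well-classified example (p_t = 0.9) is down-weighted by (1 − 0.9)² = 0.01 relative to cross-entropy, so the gradient signal concentrates on rare, hard examples.</p>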
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Training Procedure</title>
        <p>The training procedure implemented careful optimization strategies to handle the multilingual and
class-imbalanced nature of the task. We initialized the model with pre-trained DeBERTa-v3-Base
weights and implemented mixed-precision training using FP16 to improve computational efficiency.</p>
        <p>Class weights for focal loss were computed dynamically based on the training set composition for
each experimental setting. For multilingual training, weights reflected the combined class distribution
across all languages. Early stopping was implemented based on macro F1-score on validation sets, with
a patience of 1 epoch to prevent overfitting while maintaining training efficiency.</p>
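        <p>The early-stopping rule (macro F1 monitored on the validation set, patience of 1 epoch) reduces to a few lines; the driver below is a sketch with the per-epoch training and checkpoint saving stubbed out.</p>

```python
def train_with_early_stopping(run_epoch, max_epochs=10, patience=1):
    """run_epoch(epoch) trains one epoch and returns validation macro F1.
    Stops once the score fails to improve for `patience` consecutive epochs."""
    best_f1, best_epoch, bad_epochs = -1.0, -1, 0
    for epoch in range(max_epochs):
        f1 = run_epoch(epoch)
        if f1 > best_f1:
            best_f1, best_epoch, bad_epochs = f1, epoch, 0
            # model checkpoint would be saved here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                    # stop: no improvement within patience
    return best_epoch, best_f1
```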
        <p>The complete training loop incorporated gradient scaling for mixed-precision training, with careful
handling of gradient accumulation and learning rate scheduling. Model checkpoints were saved based
on validation performance, ensuring reproducibility and optimal model selection.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <sec id="sec-5-1">
        <title>5.1. Overall Performance Summary</title>
        <p>Our system demonstrated competitive performance across multiple evaluation tracks, achieving notable
success in several linguistic contexts. The results reveal interesting patterns in model performance that
reflect both the strengths and limitations of our approach.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Monolingual Performance Analysis</title>
        <p>The monolingual results reveal significant variation in performance across languages, reflecting both
linguistic complexity and dataset characteristics. Our strongest performance was achieved in English
and Italian, where we secured 6th place rankings out of 23 and 14 teams respectively.</p>
        <p>In English, despite the extreme class imbalance (1:10.87 SUBJ:OBJ ratio), our focal loss implementation
enabled effective learning. The model achieved an F1-score of 0.75, demonstrating that
DeBERTa-v3-Base’s pre-training on English provides a strong foundation for subjectivity detection. The performance
gap of only 0.06 from the top-performing team (msmadi) indicates competitive results.</p>
        <p>Italian performance (F1=0.73) was particularly noteworthy given the reverse class imbalance (3.22:1
SUBJ:OBJ ratio). The model successfully adapted to the predominance of subjective content in Italian
news data, suggesting effective transfer learning capabilities from English to Romance languages.</p>
        <p>German presented greater challenges, with performance dropping to F1=0.71 (rank 12/15). The
larger performance gap (-0.14) from the top team (smollab) indicates difficulties in handling German’s
morphological complexity and compound word structures. The model’s attention mechanism may
struggle with the compositional nature of German subjective expressions.</p>
        <p>Arabic showed the most significant performance degradation (F1=0.54), despite reasonable ranking
(7/13). The substantial performance gap (-0.15) from the top team (aelboua) suggests fundamental
challenges in cross-lingual transfer from English to Arabic, including script differences, morphological
richness, and cultural context-dependent expressions of subjectivity.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Multilingual Training Analysis</title>
        <p>The multilingual track results provide insights into the model’s ability to learn shared representations
across languages. With an F1-score of 0.65 and rank 10/15, performance was moderate but reveals
important limitations in cross-lingual alignment.</p>
        <p>The performance degradation compared to monolingual settings (particularly English and Italian)
suggests that joint training introduces interference between languages. The model may struggle to find
universal features for subjectivity detection that generalize across the linguistic diversity present in the
dataset.</p>
        <p>The 0.10 performance gap from the top team (Bharatdeep_Hazarika) indicates that our approach,
while competitive, did not achieve the level of cross-lingual alignment necessary for optimal
multilingual performance. This suggests that DeBERTa-v3-Base’s English-centric pre-training may limit its
effectiveness in truly multilingual scenarios.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Zero-shot Cross-lingual Transfer</title>
        <p>The zero-shot results provide valuable insights into the model’s cross-lingual generalization capabilities.
Performance varied significantly across target languages, reflecting linguistic similarity patterns and
the model’s pre-training biases.</p>
        <p>Romanian achieved the highest zero-shot performance (F1=0.74), suggesting strong transfer from
Romance languages in the training set, particularly Italian. The linguistic similarity between Romanian
and Italian, combined with shared Latin script, facilitated effective knowledge transfer.</p>
        <p>Polish performance (F1=0.63, rank 5/14) was particularly impressive, achieving top-5 ranking in
zero-shot transfer. This success likely stems from morphological similarities with German and shared
Indo-European linguistic features that enabled effective transfer learning.</p>
        <p>Ukrainian performance (F1=0.56) was moderate, reflecting the challenges of Cyrillic script and Slavic
morphology. While the model achieved reasonable results, the performance gap indicates limitations in
transferring knowledge across script boundaries.</p>
        <p>Greek presented the most significant challenges (F1=0.39), with the lowest performance across all
evaluation tracks. The combination of unique script, complex morphology, and limited linguistic
similarity to training languages severely constrained transfer learning effectiveness.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Comparative Analysis with State-of-the-Art</title>
        <p>Our results demonstrate competitive performance across multiple tracks while revealing specific
strengths and limitations. Compared to top-performing systems, our approach shows consistent but
not exceptional performance.</p>
        <p>The strength of our approach lies in its simplicity and computational efficiency. Unlike ensemble
methods or heavily multilingual models, our single DeBERTa-v3-Base model achieved competitive
results across diverse linguistic contexts. This efficiency makes our approach practical for
resource-constrained environments.</p>
        <p>However, our results also reveal limitations compared to specialized approaches. Top-performing
teams often employed ensemble methods, advanced data augmentation, or specialized multilingual
architectures that achieved superior performance. The performance gaps, particularly in Arabic and
Greek, indicate areas where our approach requires enhancement.</p>
        <p>The zero-shot results are particularly encouraging, demonstrating that English-centric pre-training
can still provide valuable cross-lingual capabilities when combined with effective fine-tuning strategies.
Our top-5 ranking in Polish zero-shot transfer suggests that the approach has merit for low-resource
language scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Error Analysis</title>
      <sec id="sec-6-1">
        <title>6.1. Systematic Error Patterns</title>
        <p>Our comprehensive error analysis reveals several systematic patterns that provide insights into model
limitations and potential improvements. We analyzed misclassifications across languages and identified
recurring error types that constrain performance.</p>
        <p>Sarcasm and Irony Detection: The model consistently struggled with sarcastic and ironic
expressions, particularly in German and English. For example, the German sentence "Das war ja ein großartiger
Tag für die Wirtschaft" (That was indeed a great day for the economy) was classified as objective despite
the ironic particle "ja" indicating sarcasm. This represents 18% of German misclassifications and 12% of
English errors.</p>
        <p>Context-Dependent Subjectivity: Many errors occurred with sentences that appear objective
in isolation but become subjective in context. The English sentence "The company’s decision was
announced yesterday" was classified as objective, but in context, it carried implicit criticism. This
pattern accounted for 23% of English false negatives and 19% of Italian errors.</p>
        <p>Morphological Complexity: Arabic and German frequently presented challenges due to their rich
morphological structures, which often encode subjective meaning. For instance, the Arabic phrase
alhukuma al-maz’uma (“the so-called government”) was misclassified as objective, despite the presence of
the subjective marker maz’uma (“so-called”). Such morphologically embedded indicators of subjectivity
accounted for 31% of the misclassifications in Arabic.</p>
        <p>Cultural Context Sensitivity: Cross-cultural expressions of subjectivity posed significant
challenges. Italian expressions like "come al solito" (as usual) carry implicit criticism in journalistic contexts
but were often classified as objective factual statements. This pattern was particularly problematic in
zero-shot transfer to Romanian, where similar Romance language expressions were misinterpreted.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Language-Specific Error Analysis</title>
        <p>English: Despite strong overall performance, English errors concentrated in three areas: subtle sarcasm
(12%), implicit bias in news reporting (20%), and ambiguous modal expressions (15%). The sentence "The
politician claimed to represent the people" was classified as objective despite the potentially subjective
verb "claimed."</p>
        <p>German: German errors were dominated by compound word subjectivity (28%) and modal particle
usage (22%). The compound "Scheinlösung" (apparent solution) carries inherent subjectivity that the
model failed to recognize. Modal particles like "wohl" and "ja" that indicate speaker attitude were
consistently misinterpreted.</p>
        <p>Arabic: Arabic presented unique challenges with 31% of errors involving morphological subjectivity
markers, 24% related to implicit cultural criticism, and 18% involving religious or political terminology
with subjective connotations. The model’s English-centric training poorly prepared it for Arabic’s rich
morphological expression of subjectivity.</p>
        <p>Italian: Italian errors concentrated on implicit criticism (26%), conditional mood misinterpretation
(21%), and cultural idiomatic expressions (18%). The subjunctive mood, which often carries subjective
meaning in Italian, was frequently misclassified as objective factual reporting.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Zero-shot Transfer Error Patterns</title>
        <p>Zero-shot transfer revealed specific patterns of cross-lingual error propagation. Romanian benefited
from Italian transfer but inherited Italian-specific biases about Romanian language subjectivity markers.
Polish transfer from German was partially successful but failed for culture-specific expressions.</p>
        <p>Greek and Ukrainian errors demonstrated the limitations of script-based transfer learning. Greek
errors (67% of total) involved misinterpretation of Greek-specific subjective markers that have no
equivalents in Latin-script training languages. Ukrainian errors (54% of total) reflected both script
challenges and cultural context misalignment.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations and Future Work</title>
      <sec id="sec-7-1">
        <title>7.1. Methodological Limitations</title>
        <p>Our approach exhibits several fundamental limitations that constrain its effectiveness and
generalizability. The reliance on English-centric pre-training creates inherent biases toward English linguistic
structures and cultural contexts. This limitation is particularly evident in languages with significantly
different morphological systems, such as Arabic, where the model struggles to capture language-specific
subjectivity markers.</p>
        <p>The single-model approach, while computationally efficient, limits the system’s ability to capture
language-specific nuances that specialized models might better handle. Unlike ensemble approaches
that can leverage complementary strengths, our unified architecture must balance competing linguistic
requirements across languages.</p>
        <p>The 128-token sequence limit, chosen for computational efficiency, may truncate important contextual
information necessary for accurate subjectivity detection. Many news articles contain subtle subjective
cues that emerge only through extended context, which our approach cannot capture.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Data and Bias Limitations</title>
        <p>The significant class imbalances across languages introduce systematic biases that our focal loss
implementation only partially addresses. The extreme English imbalance (1:10.87 SUBJ:OBJ) may cause
the model to develop conservative classification strategies that underestimate subjectivity in balanced
scenarios.</p>
        <p>Cultural bias represents a significant limitation, as our model inherits Western journalistic conventions
from its English pre-training. This bias may misinterpret culturally specific expressions of subjectivity
in non-Western contexts, particularly affecting Arabic performance where cultural context plays a
crucial role in subjective expression.</p>
        <p>The dataset’s focus on news articles limits generalizability to other domains where subjectivity
manifests differently. Social media, academic writing, or conversational contexts may require different
approaches to subjectivity detection that our news-trained model cannot handle effectively.</p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Architectural Limitations</title>
        <p>DeBERTa-v3-Base’s architecture, while advanced, still processes text sequentially and may miss complex
discourse-level patterns that span multiple sentences. Subjectivity often emerges through subtle
discourse markers that require document-level understanding beyond the model’s current capabilities.</p>
        <p>The model’s attention mechanism, despite its disentangled design, may not adequately capture
long-range dependencies crucial for contextual subjectivity detection. Important subjective cues may
be diluted across extended sequences, reducing the model’s sensitivity to subtle linguistic markers.</p>
      </sec>
      <sec id="sec-7-4">
        <title>7.4. Computational and Scalability Concerns</title>
        <p>While our approach is more efficient than heavily multilingual models, it still requires substantial
computational resources for training across multiple languages. The memory requirements for
fine-tuning on large multilingual datasets may limit adoption in resource-constrained environments.</p>
        <p>The need for language-specific fine-tuning for optimal performance constrains the model’s
scalability to new languages. Each additional language requires separate training and validation processes,
increasing computational costs and complexity.</p>
      </sec>
      <sec id="sec-7-5">
        <title>7.5. Ethical and Fairness Considerations</title>
        <p>Our model may perpetuate biases present in training data, particularly regarding cultural and
linguistic minorities. The English-centric pre-training may systematically underrepresent non-Western
perspectives on subjectivity, potentially leading to unfair performance disparities.</p>
        <p>The model’s subjective classification decisions could impact information credibility assessments,
potentially affecting how news from different linguistic communities is perceived and valued. This
concern is particularly relevant for low-resource languages, where model performance is degraded.</p>
      </sec>
      <sec id="sec-7-6">
        <title>7.6. Future Research Directions</title>
        <p>Several promising directions emerge from our analysis. Advanced multilingual pre-training approaches,
such as language-adaptive pre-training or cross-lingual alignment techniques, could address the
English-centric bias limitations. Incorporating explicit cultural context modeling might improve performance in
non-Western languages.</p>
        <p>Ensemble approaches combining multiple DeBERTa models with different linguistic specializations
could leverage our findings while addressing single-model limitations. Hierarchical attention
mechanisms that explicitly model discourse-level patterns might capture document-level subjectivity more
effectively.</p>
        <p>Data augmentation strategies, particularly for low-resource languages, could address class imbalance
and cultural bias issues. Techniques such as cross-lingual data augmentation or culturally-aware
synthetic data generation might improve model robustness across diverse linguistic contexts.</p>
        <p>The integration of external knowledge sources, such as cultural context databases or linguistic
resource lexicons, could provide additional signals for subjectivity detection in challenging scenarios.
This hybrid approach might address some of the cultural sensitivity limitations identified in our analysis.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>This work presents a comprehensive evaluation of DeBERTa-v3-Base for multilingual subjectivity
detection in the context of CLEF CheckThat! 2025 Task 1. Our systematic approach across monolingual,
multilingual, and zero-shot settings provides valuable insights into the capabilities and limitations of
transformer-based models for cross-lingual subjectivity detection.</p>
      <p>The results demonstrate that DeBERTa-v3-Base can achieve competitive performance across diverse
linguistic contexts, with particular strength in Romance languages and reasonable zero-shot transfer
capabilities. Our 6th place rankings in English and Italian monolingual tracks, combined with 5th place
in Polish zero-shot transfer, indicate the viability of our approach for practical applications.</p>
      <p>However, our analysis also reveals significant limitations, particularly in morphologically complex
languages like Arabic and culturally distant languages like Greek. The performance gaps in these
languages highlight the challenges of cross-lingual transfer learning and the need for more sophisticated
approaches to multilingual subjectivity detection.</p>
      <p>The focal loss implementation proved effective for handling class imbalances, though the extreme
imbalances in some languages (particularly English) continue to pose challenges. Our error analysis
provides detailed insights into failure modes, offering guidance for future research in this area.</p>
      <p>The comprehensive evaluation framework we employed—spanning monolingual, multilingual, and
zero-shot settings—provides a template for future work in multilingual subjectivity detection. The
systematic error analysis and limitation discussion offer practical insights for researchers working on
similar tasks.</p>
      <p>Future work should focus on addressing the cultural bias limitations identified in our analysis,
developing more sophisticated cross-lingual alignment techniques, and exploring hybrid approaches
that combine the efficiency of our method with the performance advantages of ensemble systems. The
integration of cultural context modeling and advanced multilingual pre-training represents particularly
promising directions for advancing the field.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>We affirm that all experimental design, implementation, training, evaluation, and analysis were
conducted manually by the authors. Generative AI tools such as ChatGPT or similar were not used for
writing technical content, generating code, or automating evaluations. We used Grammarly exclusively
for minor language corrections (grammar and punctuation). No parts of the scientific contributions
were generated using AI writing tools. The integrity and originality of the work have been maintained
in accordance with the CLEF 2025 submission guidelines.</p>
      <p>[14] E. Casanova, L. da Silva, Hybrinfox at CheckThat! 2024: Hybrid RoBERTa-VAGO for multilingual
subjectivity, in: CLEF 2024 Working Notes, volume 3660 of CEUR Workshop Proceedings, 2024, pp.
1–12. URL: https://arxiv.org/abs/2406.07160. arXiv:2406.07160, also published in CEUR-WS Vol. 3660.
[15] A. Singh, N. Sharma, Multilingual transfer for subjectivity detection: Nullpointer at CheckThat!
2024, in: CLEF 2024 Working Notes, volume 3660 of CEUR Workshop Proceedings, 2024, pp. 1–12.
URL: https://arxiv.org/abs/2406.07159. arXiv:2406.07159, also published in CEUR-WS Vol. 3660.
[16] T. Leistra, Subjectivity Classification Using DeBERTa and its Multilingual Variants, Master’s thesis,
Utrecht University, 2023. URL: https://studenttheses.uu.nl/handle/20.500.12932/45452.
[17] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, IEEE
Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 318–327. URL: https://arxiv.org/abs/1708.02002.
doi:10.1109/TPAMI.2018.2858826.
[18] T. Kudo, J. Richardson, SentencePiece: A simple and language independent subword tokenizer
and detokenizer for neural text processing, in: Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, 2018, pp. 66–71. URL: https://aclanthology.org/D18-2012/.
doi:10.18653/v1/D18-2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bruce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>O'Hara</surname>
          </string-name>
          ,
          <article-title>Development and use of a gold-standard data set for subjectivity classifications</article-title>
          ,
          <source>in: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>246</fpage>
          -
          <lpage>253</lpage>
          . URL: https://aclanthology.org/P99-1032/. doi:10.3115/1034678.1034733.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>CLEF</given-names>
            <surname>Organizers</surname>
          </string-name>
          , CheckThat! lab at CLEF 2025,
          <year>2025</year>
          . URL: https://checkthat.gitlab.io/clef2025/, accessed: 2025-06-30.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          ,
          <source>arXiv preprint arXiv:2111.09543</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2111.09543.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of NAACL-HLT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423/. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <source>arXiv preprint arXiv:1907.11692</source>
          (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Riloff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <article-title>Learning subjective nouns using extraction pattern bootstrapping</article-title>
          ,
          <source>in: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL</source>
          <year>2003</year>
          ,
          <year>2003</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          . URL: https://aclanthology.org/W03-0404/. doi:10.3115/1119176.1119180.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Document modeling with gated recurrent neural network for sentiment classification</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1422</fpage>
          -
          <lpage>1432</lpage>
          . URL: https://aclanthology.org/D15-1167/. doi:10.18653/v1/D15-1167.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Convolutional neural networks for sentence classification</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . URL: https://aclanthology.org/D14-1181/. doi:10.3115/v1/D14-1181.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <source>arXiv preprint arXiv:1911.02116</source>
          (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulte im Walde</surname>
          </string-name>
          ,
          <article-title>Projecting embeddings for domain adaptation: Joint modeling of sentiment analysis in diverse domains</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>818</fpage>
          -
          <lpage>829</lpage>
          . URL: https://aclanthology.org/P19-1078/. doi:10.18653/v1/P19-1078.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>CLEF</given-names>
            <surname>Organizers</surname>
          </string-name>
          , CheckThat! lab at CLEF 2023,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3492/, CEUR Workshop Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Schlicht</surname>
          </string-name>
          , et al., DWReCO at CheckThat! 2023:
          <article-title>Gpt-3 based data augmentation for subjectivity detection</article-title>
          ,
          <source>in: CLEF 2023 Working Notes</source>
          , volume
          <volume>3492</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . URL: https://arxiv.org/abs/2309.06846. arXiv:2309.06846, also available at CEUR-WS Vol. 3492.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pachov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Koychev</surname>
          </string-name>
          , Gpachov at CheckThat! 2023:
          <article-title>Ensemble models for multilingual subjectivity detection</article-title>
          ,
          <source>in: CLEF 2023 Working Notes</source>
          , volume
          <volume>3492</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . URL: https://arxiv.org/abs/2309.06844. arXiv:2309.06844, also published in CEUR-WS Vol. 3492.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>