<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TF-IDF to Transformers: Benchmarking Classical and XLNet Approaches for Hope Speech Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Usha Raman Adapa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sheshi Sree Sama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Texas Tech University</institution>
          ,
          <addr-line>Lubbock, TX</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Detecting hope in social media text is a vital task for understanding emotional resilience, supporting mental health monitoring, and fostering constructive online discourse. This paper explores both classical machine learning and transformer-based approaches (XLNet) for the automatic detection of hope or lack of hope in social media texts, including both binary and multiclass classification tasks. All experiments were conducted on English-language data. Subtask 1 addresses binary classification (hope vs. not hope), while Subtask 2 focuses on fine-grained multiclass categorization (generalized hope, realistic hope, unrealistic hope, not hope, and sarcasm). We initially experimented with traditional models such as Logistic Regression, SVM, and XGBoost using n-gram-enriched TF-IDF features. These models demonstrate competitive performance, highlighting their suitability for lightweight and resource-constrained scenarios. To further improve classification accuracy, we fine-tuned a pre-trained XLNet transformer using supervised training with class imbalance handling. The XLNet-based model achieves higher F1 scores in most classes and demonstrates superior generalization on the development set, reinforcing the value of transformer-based architectures in nuanced emotion classification tasks. Our findings offer a robust benchmark for hope speech classification and highlight trade-offs between efficiency and expressiveness in real-world NLP tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Hope Speech Detection</kwd>
        <kwd>Emotion Classification</kwd>
        <kwd>TF-IDF</kwd>
        <kwd>XLNet</kwd>
        <kwd>Transformer Models</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Sarcasm Detection</kwd>
        <kwd>Multiclass Classification</kwd>
        <kwd>Social Media Text Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Hope is a vital psychological resource that fosters resilience, emotional well-being, and
motivation. In recent years, the detection of hope speech in social media has emerged as a key task
in the broader field of computational affective analysis, with applications ranging from digital
mental health to social inclusion. Unlike traditional sentiment categories like joy or sadness,
hope often appears in nuanced, implicit, or even sarcastic expressions, making its automatic
identification especially challenging.</p>
      <p>
        Initial efforts to define and detect hope speech began with the HopeEDI project, which
proposed a multilingual dataset focused on Equality, Diversity, and Inclusion (EDI) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This
was followed by a series of shared tasks at IberLEF from 2021 to 2025, each expanding the
conceptual and technical scope of the task. For instance, the IberLEF 2023 HOPE task explored
multilingual hope detection across several languages [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while IberLEF 2024 emphasized both
optimistic discourse and expected or sarcastic expressions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Hope is a cognitive and emotional state characterized by the anticipation of positive outcomes,
even amidst uncertainty. In the context of social media discourse, it may be explicitly expressed
through motivational language or implicitly conveyed through expectations, desires, or even
sarcastic comments. Following the definition adopted in the PolyHope shared task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], hope is
defined as a spectrum of expressions: Generalized Hope (broad expressions of positivity);
Realistic Hope (goal-oriented, feasible expectations); Unrealistic Hope (improbable or
wishful sentiments); Not Hope (neutral or irrelevant expressions); and Sarcasm (hope-like
expressions intended ironically). This granularity reflects a growing recognition of the complex
ways hope is expressed and perceived in digital communication [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Prior work in this space
has shown that such categories are not only linguistically diverse but also culturally situated
and context-dependent [5].
      </p>
      <p>To address these challenges, a variety of modeling approaches have been explored—from
traditional machine learning with lexical features to transformer-based deep learning models [6].
This paper builds on these advancements, evaluating classical and transformer-based methods
(specifically XLNet) for both binary and multiclass hope speech detection in English-language
data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        The computational modeling of hope speech has evolved significantly over the past few years.
One of the earliest large-scale initiatives in this space was HopeEDI, which introduced a
multilingual dataset aimed at promoting equality, diversity, and inclusion through automated
detection of hopeful language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This work laid the foundation for further shared tasks that
expanded the focus from binary hope detection to fine-grained, multiclass categorization.
      </p>
      <p>
        The IberLEF 2023 shared task on multilingual hope detection explored variations across
languages and cultural contexts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while the 2024 iteration examined hope in both its motivational
and sarcastic dimensions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This continued focus on evolving challenges within the field is
also reflected in the organization of subsequent shared tasks, such as the upcoming IberLEF
2025 [7]. These studies emphasized the semantic complexity of hope-related discourse and the
importance of contextual modeling.
      </p>
      <p>
        Hope speech detection has progressed from binary classification to a nuanced, multiclass
paradigm informed by theoretical and empirical studies. The baseline system proposed by
Butt et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] defines hope across five distinct categories—generalized, realistic, unrealistic, not
hope, and sarcasm—based on psychological plausibility and linguistic framing. This definition
is grounded in affective computing research and reflects real-world complexities in emotional
communication. The original PolyHope dataset incorporated this classification framework,
annotating thousands of social media posts across English and Spanish using expert-curated
guidelines [8]. This provided a robust benchmark for evaluating a wide range of models, from
traditional machine learning to deep neural networks. Such distinctions were operationalized
through expert-labeled datasets and comprehensive annotation guidelines. This multi-class
framing was inspired by prior work such as PolyHope’s two-level detection system, which
separated coarse hope detection from nuanced subcategory classification [9].
      </p>
      <p>Transformer-based models have shown strong performance in this domain. For instance,
Sidorov et al. [6] analyzed the efficacy of BERT and XLNet on datasets involving both regret and
hope speech, demonstrating that transformers significantly outperform traditional classifiers in
handling figurative and subtle sentiment. Moreover, these models benefit from pretraining on
large corpora, allowing them to capture pragmatic and cultural cues often missed by
feature-based systems.</p>
      <p>Recent developments have also explored hope speech in specific social contexts. García-Baena
et al. [5] focused on the LGTB community, illustrating how hope manifests uniquely within
marginalized groups. This work, along with multilingual and inclusive modeling initiatives like
those described in Chakravarthi et al. [10], highlights the importance of culturally sensitive
hope speech detection systems.</p>
      <p>In summary, the current literature underscores two core needs: (1) robust classification
models that can handle nuanced categories of hope, and (2) comprehensive, annotated datasets
that reflect cultural, emotional, and linguistic diversity.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our methodology consists of five primary stages: data collection and preprocessing, feature
extraction, model selection and training, evaluation, and prediction and output generation. We
implemented two approaches—traditional machine learning classifiers and transformer-based
deep learning models (XLNet)—to classify text into binary and multiclass categories of hope.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Collection and Preprocessing</title>
        <p>The dataset used in this study comprises user-generated content from social media platforms.
These texts often contain informal language, emojis, hashtags, mentions, and URLs, necessitating
a robust preprocessing pipeline.</p>
        <p>Text normalization included converting emojis into textual representations using
emoji.demojize(), removing URLs, user mentions, and hashtags with regular expressions,
eliminating non-alphabetic characters and punctuation, and lowercasing all text for consistency.
Stopword removal was conducted using the NLTK library to reduce noise.</p>
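        <p>A minimal sketch of the normalization steps above. The toy stopword list stands in for NLTK's full English list, and the emoji.demojize() conversion from the emoji package is omitted; both are assumptions for illustration, not the exact pipeline.</p>

```python
import re

# Toy stopword set for illustration; the paper uses NLTK's full English list.
STOPWORDS = {"a", "an", "the", "is", "to", "for", "and", "of", "in"}

def preprocess(text: str) -> str:
    """Normalize a social-media post: strip URLs, mentions, hashtags,
    non-alphabetic characters; lowercase; drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # keep alphabetic characters only
    text = text.lower()                         # lowercase for consistency
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("Hoping for the best!! @user #hope https://t.co/xyz"))
```

        <p>On the sample post above, this yields "hoping best", keeping only informative lexical content.</p>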
        <p>For the transformer-based model, we used the XLNet tokenizer, which segments inputs into
context-aware subword tokens, preserving semantic integrity even for rare or misspelled words.
Unlike the traditional pipeline, stopword removal was applied only selectively for the transformer,
limiting clutter from non-informative tokens while preserving the context that XLNet's attention exploits.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature Extraction</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Traditional Models</title>
          <p>Text was vectorized using Term Frequency–Inverse Document Frequency (TF-IDF), a widely
used statistical measure that reflects word importance by down-weighting frequent terms across
documents. TF-IDF is calculated as:</p>
          <p>TF-IDF(t, d) = TF(t, d) × IDF(t), (1)</p>
          <p>IDF(t) = log(N / DF(t)), (2)</p>
          <p>where TF(t, d) is the term frequency of term t in document d, DF(t) is the number of documents containing term t, and N is the total number of documents.</p>
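        <p>The formula can be computed directly; the sketch below is a minimal pure-Python illustration (the experiments used n-gram-enriched TF-IDF, typically obtained via a vectorizer such as scikit-learn's TfidfVectorizer rather than hand-rolled code):</p>

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights following TF(t, d) * log(N / DF(t))."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)              # raw term frequency in this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

w = tfidf(["hope wins", "hope fades", "nothing here"])
# "hope" occurs in 2 of 3 docs, so it is down-weighted relative to "wins"
```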
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. XLNet Model</title>
          <p>For transformer-based models, feature extraction is handled internally. Tokenized inputs from
XLNetTokenizerFast were passed to the XLNet model, which generates dense contextualized
embeddings through permutation-based autoregressive pretraining. These embeddings encode
rich semantic information and were fine-tuned for the classification task.</p>
          <p>Label encoding was applied for both binary and multiclass targets to convert categorical
labels into numerical format. Separate encoders were used to ensure alignment across training
and prediction phases.</p>
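          <p>A minimal sketch of consistent label encoding across training and prediction; the mapping scheme below is illustrative (an equivalent role is played by, e.g., scikit-learn's LabelEncoder), and only the five class names come from the task definition.</p>

```python
# Fit one encoder on the five Subtask 2 categories and reuse it everywhere,
# so training and prediction phases stay aligned.
CLASSES = ["Generalized Hope", "Realistic Hope", "Unrealistic Hope",
           "Not Hope", "Sarcasm"]

label2id = {label: i for i, label in enumerate(sorted(CLASSES))}
id2label = {i: label for label, i in label2id.items()}

def encode(labels):
    return [label2id[l] for l in labels]

def decode(ids):
    return [id2label[i] for i in ids]

ids = encode(["Sarcasm", "Not Hope"])
assert decode(ids) == ["Sarcasm", "Not Hope"]  # round-trip stays aligned
```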
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Selection and Training</title>
        <p>We implemented both classical machine learning models and transformer-based deep learning
to evaluate performance across binary (Hope vs. Not Hope) and multiclass (Generalized Hope,
Realistic Hope, Unrealistic Hope, Not Hope, and Sarcasm) settings. Implementations were done
in Python using scikit-learn, XGBoost, and Hugging Face Transformers.</p>
        <sec id="sec-3-3-0">
          <title>3.3.1. Traditional Machine Learning Models</title>
          <p>• Logistic Regression (LR): A linear classifier that models the probability of a class using the logistic function:</p>
          <p>P(y = 1 | x) = 1 / (1 + e^(−w·x)), (3)</p>
          <p>where w are the model weights and x the input features. LR is efficient and interpretable.</p>
          <p>• Random Forest (RF): An ensemble of decision trees using bagging and majority voting. Each tree is trained on a bootstrap sample, and splits are chosen to maximize information gain or minimize Gini impurity. RF is robust to overfitting and noise.</p>
          <p>• XGBoost: An optimized gradient boosting technique that sequentially builds trees by minimizing a regularized loss function:</p>
          <p>ℒ(φ) = Σ_i l(ŷ_i, y_i) + Σ_k Ω(f_k), where Ω(f) = γT + (1/2) λ ‖w‖² (4)</p>
          <p>is a regularization term penalizing tree complexity. XGBoost is known for speed and high predictive accuracy.</p>
          <p>• Support Vector Machine (SVM): Finds the optimal hyperplane maximizing the margin between classes. Given training vectors x_i and labels y_i, the optimization is:</p>
          <p>min over w, b of (1/2) ‖w‖², s.t. y_i (w·x_i + b) ≥ 1. (5)</p>
          <p>Linear SVMs perform well in high-dimensional sparse feature spaces like TF-IDF.</p>
          <p>• Naive Bayes (NB): A probabilistic model using Bayes’ theorem assuming feature independence:</p>
          <p>P(y | x_1, …, x_n) ∝ P(y) ∏_{i=1}^{n} P(x_i | y).</p>
          <p>Multinomial NB is efficient and effective for text classification with sparse features.</p>
          <p>Hyperparameter tuning was conducted via stratified k-fold cross-validation. Grid search was applied for SVM and LR, and performance was evaluated based on weighted F1-score.</p>
        </sec>
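        <p>To make the Naive Bayes formulation concrete, the following is a minimal multinomial NB with Laplace smoothing on toy data. It is an illustrative sketch working in log space, not the scikit-learn pipeline used in the experiments.</p>

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes: log P(y) + sum_i log P(x_i | y),
    with Laplace (add-alpha) smoothing over the vocabulary."""
    vocab = set(t for d in docs for t in d.split())
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        token_counts[y].update(d.split())
    priors = {y: math.log(c / len(labels)) for y, c in class_counts.items()}
    likelihoods = {}
    for y in class_counts:
        total = sum(token_counts[y].values()) + alpha * len(vocab)
        likelihoods[y] = {t: math.log((token_counts[y][t] + alpha) / total)
                          for t in vocab}
    return priors, likelihoods

def predict_nb(model, doc):
    priors, likelihoods = model
    scores = {}
    for y in priors:
        # Out-of-vocabulary tokens contribute nothing to the score.
        scores[y] = priors[y] + sum(likelihoods[y].get(t, 0.0)
                                    for t in doc.split())
    return max(scores, key=scores.get)

model = train_nb(["hoping for better days", "what a disaster"],
                 ["hope", "not_hope"])
assert predict_nb(model, "better days ahead") == "hope"
```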
        <sec id="sec-3-3-1">
          <title>3.3.2. Transformer-Based XLNet Models</title>
          <p>We fine-tuned two models using the Hugging Face Transformers library:
• Binary XLNet Model: Implemented with XLNetForSequenceClassification using
a sigmoid output. Loss was computed using BCEWithLogitsLoss and class weighting.
• Multiclass XLNet Model: Configured with a softmax activation and</p>
          <p>CrossEntropyLoss for five-class classification.</p>
          <p>The XLNet architecture captures bidirectional dependencies by learning over permutations of
input tokens without masking, offering advantages over BERT for long or reordered sequences.
Both models were trained with the Adam optimizer, a linear learning rate scheduler, and
dropout to prevent overfitting.</p>
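          <p>To illustrate the class-imbalance handling, the sketch below computes inverse-frequency class weights in plain Python. The function name and toy label counts are illustrative; in the actual pipeline such weights would be turned into a tensor and passed to CrossEntropyLoss(weight=...) (or used with BCEWithLogitsLoss in the binary case).</p>

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes (e.g. Sarcasm) contribute more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

labels = ["Not Hope"] * 6 + ["Sarcasm"] * 2
w = inverse_frequency_weights(labels)
# Sarcasm gets weight 2.0, Not Hope about 0.667: the minority class
# is penalized more heavily when misclassified.
```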
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Metrics</title>
        <p>To evaluate model performance, we employed a comprehensive set of metrics that assess
accuracy, class-wise discrimination, and balance:</p>
        <p>• Accuracy (Acc):</p>
        <p>Accuracy = (TP + TN) / (TP + TN + FP + FN), (6)</p>
        <p>where TP is true positives, TN is true negatives, FP is false positives, and FN is false negatives. This metric reflects overall correctness but can be misleading for imbalanced datasets.</p>
        <p>• Precision, Recall, and F1-Score:</p>
        <p>Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1-Score = 2 × (Precision × Recall) / (Precision + Recall). (7)</p>
        <p>Precision measures exactness, recall measures completeness, and the F1-score balances the two.</p>
        <p>• Macro-Averaged Metrics: Calculated by averaging precision, recall, and F1-score across all classes equally, regardless of their frequency.</p>
        <p>• Weighted-Averaged Metrics: Calculated by weighting each class by its proportion in the dataset; more informative for imbalanced data.</p>
        <p>• Confusion Matrix: Provides a tabular visualization of predictions vs. actual labels, helping diagnose class-specific errors.</p>
        <p>• ROC Curve and AUC (Area Under the Curve): Useful for binary classification tasks to assess the trade-off between true positive rate and false positive rate.</p>
        <p>The weighted F1-score was the primary metric for model selection, due to its robustness to
class imbalance and balanced emphasis on both precision and recall.</p>
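        <p>These averages can be computed directly from predictions; the pure-Python sketch below (the function name f1_report is illustrative, and the experiments would typically use scikit-learn's classification_report) shows per-class F1 plus macro and weighted averaging:</p>

```python
from collections import Counter

def f1_report(y_true, y_pred):
    """Per-class F1, plus macro-averaged and support-weighted-averaged F1."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # Macro: every class counts equally; weighted: classes count by frequency.
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return per_class, macro, weighted
```

        <p>With balanced support the macro and weighted averages coincide; under class imbalance the weighted average tracks the dominant classes, which is why it served as the primary selection metric.</p>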
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Prediction and Output Generation</title>
        <p>The best models from each approach were used to generate predictions on the test set. Outputs
were stored as CSV files for both binary and multiclass classifications, enabling downstream
tasks like visualization, interpretability analysis, and system integration.</p>
        <p>This hybrid methodology enables a balance between explainability and contextual depth,
leveraging classical models for eficiency and XLNet for semantic richness.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section presents an in-depth evaluation of classical TF-IDF-based models and
transformer-based XLNet models for both binary (Hope vs. Not Hope) and multiclass (Generalized Hope,
Realistic Hope, Unrealistic Hope, Not Hope, Sarcasm) hope speech classification. Results are
presented across training, development (dev), and test sets, with a focus on generalization,
per-class performance, and model robustness.</p>
      <sec id="sec-4-1">
        <title>4.1. Binary Classification Results</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. Classical Models (TF-IDF)</title>
          <p>All classical models demonstrated excellent training-set performance (F1-scores &gt; 0.92) but
dropped noticeably on the dev set. This gap between train and dev performance is a classic
symptom of overfitting in sparse feature spaces such as TF-IDF, especially for semantically
nuanced text where surface features alone are insufficient. XGBoost and Random Forest emerged
as the most stable, with dev F1-scores around 0.78. SVM offered balanced performance with
minimal variance, while Logistic Regression and Naive Bayes performed slightly lower.</p>
          <p>The confusion matrix for SVM (the top-performing classical model) indicated a strong ability
to distinguish between Hope and Not Hope, with moderate false positives and negatives. Top
predictive features included “hoping”, “hopeful”, “wish”, and “believe”, emphasizing lexical
indicators of optimism. However, reliance on these terms occasionally led to false positives
when hope-related words were used in sarcastic or negated contexts (e.g., “hoping this turns
out to be another disaster”). ROC-AUC score for SVM was 0.86, showing solid discriminative
capacity.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Transformer-Based XLNet Model</title>
          <p>XLNet outperformed all classical models, achieving near-perfect training scores (F1 ≈ 0.994)
and generalizing better on the dev set (F1 ≈ 0.84). The loss curve illustrated stable learning.
Classical models showed strong training scores but lacked generalization. This performance gap
reduction suggests that XLNet’s permutation-based attention mechanism captures the contextual
nuances missed by linear models. In contrast to classical models’ reliance on isolated n-grams,
XLNet effectively disambiguated complex constructs like sarcasm and double negatives through
contextual embeddings. Interestingly, even XLNet struggled with samples where the expression
of hope was highly metaphorical or embedded in indirect sentiment, which underscores the
need for models trained on richer linguistic constructs or augmented with external knowledge.
Confusion matrix analysis revealed strong balance between true positives and true negatives,
with relatively fewer misclassifications.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multiclass Classification Results</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. Classical Models (TF-IDF)</title>
          <p>Among classical models, XGBoost led with a dev set accuracy of 0.69 and weighted F1-score
of 0.67. Random Forest followed closely, while Logistic Regression and SVM were moderate
(accuracy ≈ 0.64). Naive Bayes was the weakest, especially for sparse and rare classes.</p>
          <p>F1-score breakdown indicated “Not Hope” was the easiest to classify (F1 ≈ 0.9), followed by
“Realistic Hope” (F1 ≈ 0.77). “Generalized Hope” was moderately captured, while “Sarcasm” and
“Unrealistic Hope” were the most challenging (F1 &lt; 0.5), due to semantic overlap and class
imbalance. This performance disparity is primarily attributed to two factors: (1) class imbalance,
which led the model to under-prioritize underrepresented classes during training; and (2)
semantic overlap between hope categories, especially when surface features (like “hope” or
“wish”) appeared in both genuine and sarcastic texts.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Transformer-Based XLNet Model</title>
          <p>The multiclass XLNet model exhibited a smoother training curve and achieved superior
generalization. Despite close-class confusion (e.g., Generalized Hope vs. Realistic Hope), the model
captured minority classes (e.g., Sarcasm) more accurately than classical models. XLNet
significantly outperformed all classical models on multiclass classification. It maintained a better
class-wise balance, especially for underrepresented labels. These results highlight that
transformer models, when properly fine-tuned and class-weighted, can internalize subtle semantic
and pragmatic cues that would otherwise be lost in feature-engineered models. Future work
could explore label smoothing or focal loss to further reduce confusion between close categories.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Result Error Analysis</title>
      <p>While both classical and transformer-based models yielded promising results in binary and
multiclass hope speech classification, several sources of misclassification and model limitations
were observed. This section analyzes the root causes of these errors and outlines targeted
strategies for improvement.</p>
      <sec id="sec-5-1">
        <title>5.1. Difficulties in Capturing Sarcasm and Unrealistic Hope</title>
        <p>The most significant challenge was accurately distinguishing Sarcasm and Unrealistic Hope,
particularly in the multiclass setting. Sarcasm, by nature, depends on tone, context, and
sometimes contradiction between literal words and intended meaning—elements often absent
from textual input alone. Even advanced models like XLNet struggled with sarcastic expressions
that contained hope-related words used ironically, such as “I just love hoping for miracles that
never happen.”</p>
        <p>The following examples are not drawn from the actual test set; they were constructed and
tested to illustrate representative misclassification patterns.</p>
        <p>Example 1: “Just waiting for a flying unicorn to fix everything. Fingers crossed.”
Predicted: Generalized Hope Actual: Sarcasm
Reason: The presence of hopeful phrases misled the model, but sarcasm was conveyed through
absurdity and emoji tone, which the model misinterpreted as genuine optimism.</p>
        <p>Example 2: “I know my lottery ticket will solve my student loans. Manifesting hard.”
Predicted: Realistic Hope Actual: Unrealistic Hope
Reason: The model classified this as realistic due to the serious context (loans), but failed to
infer the low plausibility and sarcasm embedded in "manifesting" and emoji use.</p>
        <p>Similarly, Unrealistic Hope was frequently confused with Generalized Hope or Not Hope.
This is because recognizing the implausibility of a hopeful statement (e.g., “I hope unicorns
fix climate change”) often requires world knowledge or commonsense reasoning, areas where
traditional classifiers and even transformers with limited pretraining context fall short.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Effect of Class Imbalance</title>
        <p>Another key source of error was the dataset’s inherent class imbalance. Classes such as Sarcasm
and Unrealistic Hope had fewer training samples compared to dominant classes like Not Hope
and Realistic Hope. This led to skewed model learning, favoring frequent labels and reducing
recall for minority categories. Although class-weighted loss functions (e.g.,
CrossEntropyLoss(weight=...)) were used, they only partially mitigated this imbalance.</p>
        <p>Example 3: “I hope scientists find a cure for being broke.”
Predicted: Generalized Hope Actual: Sarcasm
Reason: The model likely favored Generalized Hope due to class frequency and the presence of
the phrase "I hope", despite the humorous exaggeration signaling sarcasm.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Overfitting in Classical Models</title>
        <p>Classical models like XGBoost and Random Forest exhibited signs of overfitting. High F1-scores
on training data contrasted with lower development and test set performance, indicating that
these models may have memorized patterns rather than generalized them. This is typical in
high-dimensional feature spaces produced by TF-IDF vectorization, especially when using
shallow models on limited data.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Generalization Strength of XLNet</title>
        <p>In contrast, XLNet demonstrated better generalization across both tasks, maintaining a more
consistent gap between training and development scores. Its strength lies in its
permutation-based training objective and bidirectional context modeling, which enable it to capture subtle
dependencies missed by classical models. However, it still faced difficulty with low-resource
classes and irony due to the limited signal in the data itself.</p>
        <p>In summary, while XLNet significantly outperformed traditional models in nuanced hope
classification, it remains sensitive to dataset limitations, particularly class imbalance and subtle
semantic phenomena like sarcasm. Addressing these limitations through data augmentation,
specialized models, and smarter loss functions will be critical to further advancing hope speech
detection.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study presented a comparative analysis of classical machine learning models and XLNet
for binary and multiclass hope speech detection. While traditional models like XGBoost offered
strong baseline performance, they struggled with class imbalance and subtle categories such as
Sarcasm and Unrealistic Hope.</p>
      <p>XLNet consistently outperformed classical models across both tasks, demonstrating
better generalization and sensitivity to nuanced expressions. However, challenges like sarcasm
detection and underrepresented classes persist.</p>
      <p>Future work will explore advanced transformers, context integration, and augmentation
strategies to address these limitations. Overall, combining robust language models with balanced
datasets and targeted optimization proves key to accurate and reliable hope speech classification.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This research was conducted as part of the HopeEDI shared task at IberLEF 2025. The authors
would like to thank the organizers: Sabur Butt (Institute for the Future of Education, Tecnológico
de Monterrey, Mexico), Fazlourrahman Balouchzahi (Independent Researcher, Mexico), Maaz
Amjad (Texas Tech University, USA), Salud María Jiménez-Zafra (SINAI, Universidad de Jaén,
Spain), Hector G. Ceballos (IFE, Tecnológico de Monterrey, Mexico), and Grigori Sidorov (CIC,
Instituto Politécnico Nacional, Mexico), for providing valuable data and guidance throughout
the competition.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI’s GPT-4 to assist with grammar
and spelling checks, as well as for refining the structure and clarity of technical content. We
reviewed and edited all outputs carefully and take full responsibility for the final content of this
publication.</p>
    </sec>
    <sec id="sec-9">
      <title>References (continued)</title>
      <p>[5] D. García-Baena, M. García-Cumbreras, S. M. Jiménez-Zafra, J. A. García-Díaz, R. Valencia-García, Hope speech detection in Spanish: The LGTB case, Language Resources and Evaluation (2023) 1–31.</p>
      <p>[6] G. Sidorov, F. Balouchzahi, S. Butt, A. Gelbukh, Regret and hope on transformers: An analysis of transformers on regret and hope speech detection datasets, Applied Sciences 13 (2023) 3983.</p>
      <p>[7] J. Á. González-Barba, L. Chiruzzo, S. M. Jiménez-Zafra, Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org, 2025.</p>
      <p>[8] S. Butt, F. Balouchzahi, M. Amjad, S. M. Jiménez-Zafra, H. G. Ceballos, G. Sidorov, Overview of PolyHope at IberLEF 2025: Optimism, expectation or sarcasm?, Procesamiento del Lenguaje Natural (2025).</p>
      <p>[9] F. Balouchzahi, G. Sidorov, A. Gelbukh, PolyHope: Two-level hope speech detection from tweets, Expert Systems with Applications 225 (2023) 120078.</p>
      <p>[10] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, S. Cn, J. P. McCrae, M. García, S. M. Jiménez-Zafra, R. Valencia-García, P. Kumaresan, R. Ponnusamy, D. García-Baena, J. García-Díaz, Overview of the shared task on hope speech detection for equality, diversity, and inclusion, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, ACL 2022, pp. 378–388. https://doi.org/10.18653/v1/2022.ltedi-1.58.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Hopeedi: A multilingual hope speech detection dataset for equality, diversity, and inclusion</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media</source>
          , Association for Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>53</lpage>
          . https://aclanthology.org/2020.peoples-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Garcia-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Baena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Garcia-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          , Overview of hope at iberlef 2023:
          <article-title>Multilingual hope speech detection</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          )
          <fpage>371</fpage>
          -
          <lpage>381</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Baena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of hope at iberlef 2024: Approaching hope speech detection in social media from two perspectives, for equality, diversity and inclusion and as expectations</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>407</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Amjad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amjad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Ceballos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Optimism, expectation, or sarcasm? multi-class hope speech detection in spanish and english</article-title>
          ,
          <source>ResearchGate</source>
          ,
          <year>2025</year>
          . https://doi.org/10.13140/RG.2.2.19761.90724.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>