<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PLD at HOMO-LAT 2025: Enhancing Dialectal Sentiment Analysis through Contextual Retrieval and Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jose Olivert-Iserte</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Caballero-García-Alcaide</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucía Guijarro-Martínez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science and Engineering Department, Carlos III University of Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In the digital age, online social media platforms have transformed human communication, but they have also become venues for the propagation of hate speech, particularly targeting marginalized communities such as the LGBTQ+ population. In this context, Natural Language Processing (NLP) offers valuable tools for the automatic detection and classification of harmful content. The HOMO-LAT25 shared task, part of the IberLEF 2025 evaluation campaign, focuses on sentiment analysis toward the LGBTQ+ community in Latin American Spanish. It includes two subtasks: a multi-dialect track with consistent training and evaluation dialects, and a cross-dialect track, which challenges systems to generalize sentiment classification across unseen dialects. This study addresses the linguistic and cultural complexities inherent in dialectal variation, which significantly affect how sentiment is expressed and perceived. To tackle these challenges, several Transformer-based approaches using pre-trained models such as BERT and its domain-adapted variants are proposed. Additionally, several enhancements are introduced, including context-based techniques and Retrieval-Augmented Generation (RAG), to incorporate semantically relevant information during inference. These methods aim to improve robustness and accuracy in polarity detection, ultimately contributing to safer and more inclusive online spaces for LGBTQ+ communities.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech Detection</kwd>
        <kwd>NLP</kwd>
        <kwd>Text Classification</kwd>
        <kwd>RAG</kwd>
        <kwd>Context-Based Inference</kwd>
        <kwd>LLM</kwd>
        <kwd>BERT</kwd>
        <kwd>Prompting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the digital era, online social media has revolutionized how we communicate, interact, and share ideas.
Unfortunately, this transformation has also led to harmful applications of online spaces, such as the
propagation of hate speech. More specifically, the LGBTQ+ community frequently experiences such
aggression, facing constant targeting on these online platforms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In this context, Natural Language Processing (NLP) offers a powerful tool for the automatic detection
of online hate speech [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Shared tasks play a crucial role in this field by providing common datasets,
evaluation metrics, and a collaborative environment to foster advancements. The Iberian Languages
Evaluation Forum (IberLEF) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a shared evaluation campaign for NLP systems which focuses on
Spanish and Iberian languages. Its aim is to promote the research of text processing, understanding,
and generation, fostering innovation and advancing the state-of-the-art in NLP for these language
communities through the organization of competitive shared tasks.
      </p>
      <p>
        Within the framework of its 2025 edition, the HOMO-LAT25 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] shared task is dedicated to studying
societal perceptions of the LGBTQ+ community through language on social media, as its predecessors
[
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], focusing on polarity detection (sentiment analysis) in Latin American Spanish. It requires systems
to classify posts as having positive, negative, or neutral sentiment towards the LGBTQ+ community
or related keywords across two distinct settings: a multi-dialect track (Subtask 1), where training and
evaluation dialects are consistent, and a cross-dialect track (Subtask 2), which introduces the complexity
of differing dialects between training and evaluation, testing model generalization. A primary objective,
central to the work, is to address the complexities caused by such linguistic diversity. These dialectal
variations significantly influence how sentiment, particularly negative and potentially phobic language,
is expressed and perceived, posing a challenge for developing systems sensitive to local slang and
culturally specific expressions. Thus, the core challenge is to accurately identify targeted sentiment
robustly across diverse linguistic forms, especially in the cross-dialect setting.
      </p>
      <p>Successfully addressing this challenge through the development of models capable of robust and
precise polarity detection is intended to foster a safer online environment for Latin American LGBTQ+
individuals.</p>
      <p>To address these challenges, this work proposes several approaches based on Transformer
architectures, specifically leveraging pre-trained models such as BERT and its domain-adapted variants for
social media. In addition, innovative techniques are explored to enhance model predictions, including
Retrieval-Augmented Generation (RAG) and context-based strategies that incorporate semantically
relevant information during inference to improve robustness and accuracy, particularly in the presence
of dialectal variation. These approaches and enhancements not only demonstrated strong empirical
performance but also led to the system achieving the top position in the HOMO-LAT25 shared task,
validating the effectiveness of the proposed methodologies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <sec id="sec-2-1">
        <title>2.1. Transformers</title>
        <p>
          Transformers [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] have marked a revolutionary advancement within the field of NLP and other areas of
Artificial Intelligence. This architectural paradigm has proven to be exceptionally effective for tasks
demanding sequence comprehension, including machine translation, text generation, classification,
and sentiment analysis. In contrast to prior models such as Recurrent Neural Networks (RNNs) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ],
Transformers employ an attention mechanism that captures long-range dependencies more efficiently,
enabling them to outperform earlier approaches across a diverse array of tasks.
        </p>
        <p>
          The fundamental architecture of Transformers primarily consists of two main components: an
encoder and a decoder. The encoder processes an input sequence, transforming it into a rich feature
representation. Subsequently, the decoder uses this representation to generate an output sequence. Both
the encoder and decoder blocks are constructed from layers incorporating attention mechanisms and
fully connected feed-forward neural networks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Central to this design is the attention mechanism [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
It empowers the model to selectively focus on different segments of the input sequence by enabling each
element to relate to others through calculated attention weights. These weights determine the relative
importance of each part of the sequence, thus enhancing the model’s capacity to capture long-range
dependencies and consider the global context of the input.
        </p>
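        <p>The role of the attention weights can be illustrated with a minimal sketch. It assumes the scaled dot-product formulation commonly associated with Transformers and, for brevity, omits the learned query, key, and value projections as well as multi-head splitting; the function name and the toy input are illustrative only.</p>
        <preformat><![CDATA[
```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Return (output, weights) for single-head scaled dot-product attention."""
    d_k = queries.shape[-1]
    # Similarity of every query position with every key position.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis: each row sums to 1.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ values, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # toy sequence: 4 tokens, 8 features each
output, weights = scaled_dot_product_attention(x, x, x)
```
]]></preformat>
        <p>Each row of the weight matrix describes how strongly one position attends to every other position, which is how the global context of the input enters each output vector.</p>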
        <p>
          This architecture has been indispensable to virtually all significant advancements in AI in recent
years. Despite their success, Transformer-based models also come with limitations [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Firstly, they
demand substantial computational resources for both training and inference. Secondly, the complexity
of these models often impedes their interpretability, introducing challenges for applications where
transparency is a critical requirement.
        </p>
      </sec>
      <sec>
        <title>2.2. LLMs</title>
        <p>
          Large Language Models (LLMs) have emerged as one of the most significant innovations in the field
of Artificial Intelligence in recent years. These models are fundamentally built upon Transformer-based
architectures, designed to capture complex, long-range relationships within data, as previously
mentioned [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Furthermore, LLMs are trained on vast text corpora, through which they acquire an
implicit understanding of linguistic patterns, context, and semantic relationships.
        </p>
        <p>
          The capabilities of Large Language Models have significantly impacted various NLP tasks, sentiment
analysis being one among them [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Traditional sentiment analysis methods often relied on feature
engineering, lexicons, or simpler machine learning models, which could struggle with the complexities
of language, such as emotions, attitudes, and implicit opinions [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          LLMs, with their deep understanding of language acquired from pre-training on massive datasets,
offer several advantages for sentiment analysis. They can often perform sentiment classification with
high accuracy even in zero-shot or few-shot settings, meaning they can determine sentiment without
requiring task-specific training data, or with very little [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This is particularly beneficial for domains
or languages where labeled sentiment data is scarce.
        </p>
        <p>
          However, the use of LLMs in sentiment analysis also presents disadvantages. As in their general
use, they can inherit biases present in their training data, potentially
leading to skewed sentiment predictions for certain demographic groups or topics [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Nonetheless,
the integration of LLMs has significantly advanced the capabilities and accuracy of sentiment analysis
systems, enabling a more nuanced and context-aware understanding of opinions expressed in text.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Aspect-Based Sentiment Analysis</title>
        <p>
          Beyond determining the overall polarity of a text, Aspect-Based Sentiment Analysis (ABSA) provides a
more fine-grained analytical approach [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The core objective of ABSA is to identify specific aspects,
defined as attributes or components of an entity or topic mentioned within the text, and subsequently
ascertain the sentiment expressed towards each of these distinct aspects [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          This process typically involves deconstructing the text into its constituent pieces, identifying the
core elements or aspects under examination, and subsequently analyzing the sentiment expressed
towards each of these [
          <xref ref-type="bibr" rid="ref18">18, 19</xref>
          ]. The capacity of LLMs to understand complex linguistic structures
and contextual relationships has notably enhanced the performance of ABSA systems, particularly
in accurately identifying subtle aspect mentions and their associated sentiment. Thus, ABSA offers
a detailed and structured understanding of opinions, moving beyond general sentiment scores. In
the context of tasks like HOMO-LAT25, this approach could further identify the specific themes or
topics within pertinent discussions that drive particular sentiments, offering deeper insights into the
characteristics of problematic or supportive language.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Retrieval-Augmented Generation (RAG)</title>
        <p>Retrieval-Augmented Generation (RAG) has become a key strategy to enhance the relevance, adaptability,
and factual grounding of LLM outputs across diverse domains [20, 21]. In sentiment analysis, particularly
for nuanced or domain-specific contexts such as social media discourse, RAG helps address limitations
in the model’s internal knowledge by retrieving semantically relevant examples or documents from
curated repositories before generation. This retrieved context is then incorporated into the input prompt,
grounding the model’s predictions in up-to-date and task-specific data [22].</p>
        <p>RAG systems typically consist of a knowledge base, embedding models for retrieval, vector databases
(e.g., FAISS, Pinecone), and a retrieval pipeline [23, 22]. In the case of sentiment classification, such
systems can retrieve similar past posts labeled with sentiment classes, user feedback, or even related
community guidelines, providing the model with more informed context. Fine-tuning embeddings on
sentiment-labeled data and applying prompt engineering strategies further improve retrieval accuracy
and classification performance.</p>
        <p>Recent innovations, such as hybrid keyword-semantic retrieval and multimodal RAG approaches,
enable more context-aware outputs, particularly useful in emotionally complex or dialectally diverse
settings [24, 21]. Moreover, integrating feedback mechanisms allows the system to iteratively refine its
retrieval and classification pipeline, improving both consistency and robustness [25].</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.5. Related Work</title>
        <p>Multiple studies have addressed the challenge of identifying various forms of online hate, with part of
this work specifically focusing on content targeting the LGBTQ+ community. Early approaches often
relied on lexicon-based methods and traditional machine learning classifiers [26]. However, recent
research has increasingly leveraged deep learning, particularly Transformer-based models, to better
capture contextual nuances.</p>
        <p>For instance, [27] presented work on detecting homophobia and transphobia in YouTube comments
across English and Tamil, introducing a new dataset and benchmarking various machine learning models.
Their study highlighted the challenges and successes in recognizing such content across different
linguistic contexts. Similarly, [28] focused on identifying LGBT+-directed hate speech in tweets using
multi-class and multi-label approaches with models like BERT and RoBERTa. They specifically addressed
practical challenges such as data imbalance through preprocessing and oversampling techniques. Further
exploring Transformer capabilities, [29] employed a GPT-2 model for recognizing homophobic and
transphobic content in social media comments across five languages, including Spanish, noting variance
in detection ease across languages. Addressing the complexities of multilingual and low-resource
scenarios, [30] investigated the detection of homophobia and transphobia in code-mixed social media
text. They proposed data augmentation techniques, such as pseudo labeling through transliteration, to
improve model performance, demonstrating strategies to tackle resource scarcity in this domain. These
studies collectively underscore the shift towards sophisticated neural models and the ongoing efforts to
tackle the nuanced and evolving nature of online homophobia, often with a multilingual perspective.</p>
        <p>For processing Spanish-language social media text, robust pre-trained language models are crucial.
RoBERTuito [31] has emerged as a significant resource in this regard. Developed by pre-training a
RoBERTa architecture on a large corpus of Spanish tweets (over 500 million), RoBERTuito is specifically
tailored to understand the informal language, slang, and idiosyncrasies prevalent in social media
conversations in Spanish. Its strong performance on various downstream tasks, including sentiment
analysis and offensive language identification in Spanish, has made it a widely adopted baseline and a
foundational model for researchers working with Spanish social media data, often serving as a starting
point for fine-tuning in tasks related to sentiment and hate speech detection.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>This section provides a detailed description of the data made available by the organizers for the tasks
introduced in Section 1. A total of four datasets were provided for training and evaluating the systems
developed for the competition.</p>
      <p>The datasets summarized in Table 1 correspond to the training and development partitions used for
Track 1, which focuses on multi-dialect sentiment analysis as outlined in Section 1. These datasets
were released at separate times and contain user-generated content labeled with sentiment and dialect
metadata. Each instance includes five features: id, country, keyword, post content and label. Specifically,
four dialects are represented in both sets: Argentinean, Colombian, Chilean, and Mexican Spanish. The
dialects remain fixed in Track 1, which allows models to specialize in these regional variations.</p>
      <p>The challenge escalates in complexity when transitioning to Track 2, the cross-dialect setting, where
the distribution of dialects in the test data includes variations not present in the training corpus. This
divergence introduces a domain adaptation issue, further complicating model generalization.</p>
      <p>Figure 1 illustrates the class distribution across the training and development sets. As shown, there is
a marked imbalance, with the positive (POS) class being significantly underrepresented. This imbalance
poses well-known difficulties for machine learning models, such as a tendency to overfit on the majority
classes and poor generalization for minority ones, which directly impacts performance in real-world
applications.</p>
      <p>Figure 2 shows the distribution of text lengths across the dataset. As shown, in both the training
and development sets, 95% of the texts contain fewer than 300 words, with only a small number of
instances exceeding this length. This observation is particularly relevant for determining an appropriate
maximum token limit that the models can efficiently handle during training and inference.</p>
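      <p>A length cutoff of this kind can be estimated with a short sketch such as the following. It is illustrative only: the helper name and the toy corpus are hypothetical, lengths are counted in whitespace-separated words, and the token counts produced by a model tokenizer would generally be higher.</p>
      <preformat><![CDATA[
```python
def length_percentile(texts, percentile=95):
    """Word count at or below which `percentile` percent of the texts fall (nearest rank)."""
    lengths = sorted(len(text.split()) for text in texts)
    # Integer ceiling of percentile * n / 100, kept within the valid rank range.
    rank = max(1, -(-percentile * len(lengths) // 100))
    return lengths[rank - 1]

# Toy corpus (hypothetical): 19 short posts and one long outlier.
posts = ["short post about a topic"] * 19 + ["word " * 500]
cutoff = length_percentile(posts, 95)
```
]]></preformat>
      <p>In this toy corpus the single long outlier does not influence the 95th-percentile cutoff, which mirrors the reasoning used to choose a maximum sequence length.</p>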
      <p>In addition to the training and development sets, two separate test sets were provided for evaluation.
The first test set, corresponding to Track 1 (multi-dialect), comprises 3,588 instances. The second test
set, associated with Track 2 (cross-dialect), contains 5,475 instances.</p>
      <p>It is worth noting that the number of instances in the test sets is approximately equal to the total
number of training and development instances. This parity in size ensures a balanced evaluation
scenario; however, it also necessitates that models exhibit strong generalization capabilities, as the
evaluation phase involves datasets of comparable complexity and volume to those used during training.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section outlines the methodology adopted to tackle the tracks presented in the HOMO-LAT25
challenge. It begins by detailing the preprocessing procedures applied to the original dataset, followed
by the data augmentation strategies implemented. Subsequently, the context extraction component is
described, followed by an overview of the various approaches explored to address the problem.</p>
      <sec id="sec-4-1">
        <title>4.1. Data Processing and Augmentation</title>
        <p>To begin with, as the texts originated from Reddit posts, each entry in the original dataset
was cleaned. This preprocessing step involved the removal of superfluous and redundant elements
such as URLs, user mentions, markdown syntax, and extra whitespaces. Additionally, all text was
converted to lowercase in order to minimize vocabulary size.</p>
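        <p>The cleaning step can be sketched as follows. The regular expressions are assumptions chosen for illustration, since the exact patterns used are not specified; the function name and the sample string are hypothetical.</p>
        <preformat><![CDATA[
```python
import re

def clean_post(text):
    """Remove URLs, user mentions, markdown syntax and extra whitespace, then lowercase."""
    # Markdown links: keep the visible label, drop the target.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Bare URLs.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Reddit-style mentions such as u/name or /u/name, and @name.
    text = re.sub(r"(?<!\w)(?:/?u/|@)[\w-]+", " ", text)
    # Common markdown markers: emphasis, headings, block quotes, inline code.
    text = re.sub(r"[*_`#>~]+", " ", text)
    # Collapse runs of whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

raw = "**Mira** esto:   https://example.com/post gracias u/some_user"
cleaned = clean_post(raw)
```
]]></preformat>
        <p>The order of the substitutions matters in this sketch: links and URLs are handled before mentions and markdown markers so that fragments of a URL are not mistaken for a mention.</p>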
        <p>Subsequently, all texts were translated into English. This decision was motivated by two key
considerations:</p>
        <list list-type="bullet">
          <list-item>
            <p>Pre-trained models: The search for open-source pre-trained models revealed that the majority,
particularly those demonstrating superior performance, were trained predominantly on English-language
data. Even multilingual models were largely biased toward English, with Spanish
comprising only a minor portion of the training data.</p>
          </list-item>
          <list-item>
            <p>Spanish dialectal variation: Given that the second track includes Spanish dialects in the test
set that are not represented in the training data, translating all texts into English prior to model
training and inference served as an effective strategy to mitigate dialectal discrepancies.</p>
          </list-item>
        </list>
        <p>Therefore, the hypothesis is that leveraging high-quality English pre-trained models, combined with
the mitigation of dialectal variation through translation, would significantly enhance model performance
across both tracks, particularly in Track 2.</p>
        <p>Following the preprocessing stage, online data augmentation was implemented due to the limited
number of training examples. First, each text was paraphrased twice in sequence, with the goal of
ensuring that the second paraphrase differed as much as possible from both the original and the first
paraphrased version. Note that all paraphrased texts were also generated in English, based on the
previously translated data.</p>
        <p>During training, each input instance was randomly selected from a set comprising the original text
and its two paraphrased versions. This strategy ensured that the model was exposed to slightly varied
data in each epoch, thereby enhancing generalization and reducing the risk of overfitting.</p>
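        <p>A minimal sketch of this online selection is given below. It assumes uniform sampling among the three versions, which the description above does not state explicitly; the toy strings and the helper name are hypothetical.</p>
        <preformat><![CDATA[
```python
import random

def sample_training_text(variants, rng):
    """Pick one of the available versions of a training text uniformly at random."""
    return rng.choice(variants)

# Each entry holds the translated original plus its two paraphrases (toy strings).
dataset = [
    {"variants": ["original a", "paraphrase a1", "paraphrase a2"], "label": "NEG"},
    {"variants": ["original b", "paraphrase b1", "paraphrase b2"], "label": "NEU"},
]

rng = random.Random(0)
for epoch in range(3):
    # A fresh draw per epoch, so the model may see a different version each time.
    batch = [(sample_training_text(item["variants"], rng), item["label"]) for item in dataset]
```
]]></preformat>
        <p>Because the draw is repeated at every epoch, the label of each instance stays fixed while its surface form varies.</p>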
        <p>It is worth noting that both the translation of the texts and the generation of paraphrases were
performed using a Large Language Model, specifically GPT-4o-mini.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Context Engine</title>
        <p>Most of the methodologies presented in the following section incorporate a novel component referred
to as the Context Engine. As the name implies, this module is designed to retrieve semantically relevant
contextual information to support the prediction of a given post. Due to the inherent complexity
and subjectivity of the task at hand, as well as challenges such as class imbalance and limited data
availability, providing the model with contextual examples prior to inference has proven to be a robust
and effective strategy.</p>
        <p>The approaches that leverage the Context Engine will be detailed in the subsequent section. In general
terms, the component retrieves the top-<italic>k</italic> most semantically similar posts from the training dataset
for each test instance. For example, when <italic>k</italic> = 1, the methodology retrieves the most semantically
similar post from each class (NEG, NEU, and POS) with respect to the post being predicted. As will be
demonstrated, this strategy not only enhances model performance but also highlights the importance of
incorporating relevant and class-specific context in tasks that are highly subjective. To this end, cosine
similarity is employed as the similarity metric. Cosine similarity is widely adopted in natural language
processing tasks for quantifying the degree of similarity between high-dimensional vectors [32].</p>
        <p>In this setting, similarity computations are not performed directly on raw text, but rather on
embedding vectors, dense numerical representations of the input text. These embeddings are generated using
the SentenceTransformers library [33], which provides state-of-the-art pre-trained models optimized
for various semantic similarity tasks, including semantic search and paraphrase identification.</p>
        <p>To improve efficiency and reduce computational overhead, the embeddings corresponding to all
posts in the training corpus are precomputed and stored locally. The same embedding model is applied
consistently to both the training data and the posts being evaluated during inference, ensuring semantic
alignment and vector compatibility. When a new post is presented for classification, it is encoded into
a 384-dimensional embedding. Cosine similarity is then computed between this embedding and all
precomputed training embeddings, and the top-<italic>k</italic> most similar posts from each class are retrieved. These
selected examples are subsequently used to provide context during the inference process.</p>
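        <p>The retrieval step can be sketched as follows. Random vectors stand in for the precomputed 384-dimensional sentence embeddings so that the example is self-contained; in practice they would come from the embedding model. The function name and the toy data are hypothetical.</p>
        <preformat><![CDATA[
```python
import numpy as np

def top_k_per_class(query, train_embeddings, train_labels, k=1):
    """Return, for each class, the indices of the k training posts most similar to the query."""
    # Cosine similarity equals the dot product of L2-normalised vectors.
    train_norm = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    similarities = train_norm @ query_norm
    retrieved = {}
    for label in ("NEG", "NEU", "POS"):
        class_idx = np.flatnonzero(train_labels == label)
        # Sort this class by descending similarity and keep the first k.
        order = np.argsort(-similarities[class_idx])[:k]
        retrieved[label] = class_idx[order].tolist()
    return retrieved

# Toy stand-ins for precomputed 384-dimensional sentence embeddings.
rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(30, 384))
train_labels = np.array(["NEG", "NEU", "POS"] * 10)
query = train_embeddings[4] + 0.01 * rng.normal(size=384)  # close to training post 4 (NEU)

context = top_k_per_class(query, train_embeddings, train_labels, k=2)
```
]]></preformat>
        <p>Retrieving per class, rather than taking a single global ranking, guarantees that every class contributes examples even when one class dominates the training data.</p>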
        <p>To illustrate the functionality of the Context Engine with a real-world example, consider the following
post to be classified:
“being gay and moving to the middle east is not a very bright idea much
less contacting someone through an app knowing that it is illegal i do
not justify their medieval customs but you dont have to be very bright to
realize that they handed themselves on a silver platter to the authorities.”
When applying the methodology with <italic>k</italic> = 1, the Context Engine retrieves one semantically similar
post from each class. For instance, from the POS class, the retrieved post is: “nowadays yes the country
is promoted as a gay friendly destination attracting tourists from that community and dollars come in.”
From the NEG class: “i wouldnt use tinder even if i were straight in the middle east now imagine being
gay there.” And from the NEU class: “I don’t understand why homosexuals consciously go to these places
that hate them. What is worse, it makes them illegal just for being homosexual. I hope no gay person
approaches the Middle East from now on.” These contextual examples are then used in different ways
to add contextual information to the approaches and models used.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Approaches</title>
        <p>This section presents a comprehensive analysis of the approaches evaluated during the HOMO-LAT25
challenge. In total, four distinct methodologies are described. As an initial step, a baseline model is
introduced to establish a performance reference using the preprocessed dataset. This model serves to
demonstrate the fundamental capabilities of a standard architecture when applied to the task.</p>
        <p>Subsequently, three alternative approaches are proposed, each of which outperforms the baseline
in terms of predictive performance. These methods incorporate advanced techniques from the field
of NLP and are further enhanced through the integration of novel strategies, including the use of
contextual information and ensemble mechanisms. The specific design and implementation details of
these enhancements are discussed in the following subsections.</p>
        <sec id="sec-4-3-1">
          <title>4.3.1. Baseline Models</title>
          <p>To establish an initial performance benchmark, two well-established machine learning algorithms were
implemented as baseline models: Random Forest (RF) and Support Vector Machine (SVM). These models
were selected due to their proven effectiveness in classification tasks, their relative interpretability, and
their ability to provide competitive performance without the need for extensive hyperparameter tuning
or high computational overhead. Their robustness and simplicity make them suitable for exploratory
analyses and for serving as reference points in comparative evaluations.</p>
          <p>The task at hand involves the detection of homophobic comments in Reddit posts, specifically within
the Latin American sociolinguistic context. This introduces several challenges related to language
variation, informal syntax, code-switching, and the presence of culturally nuanced expressions. While
these characteristics complicate the task for traditional models, they also underscore the value of
establishing a strong baseline to assess the complexity of the dataset before applying more advanced
neural or context-aware architectures.</p>
          <p>In this setting, both RF and SVM were trained on vectorized textual inputs derived from embedding
representations of the Reddit posts. These embeddings capture semantic and syntactic features of the
text, facilitating the application of classical classifiers that rely on fixed-length input vectors. Although
these models do not inherently exploit sequential information beyond the word-level features captured
by the embeddings, they offer a useful perspective on the discriminative power of the raw data.</p>
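          <p>A baseline of this kind can be sketched with scikit-learn as shown below. The synthetic clustered features stand in for real post embeddings, and the hyperparameters (number of trees, RBF kernel) are assumptions made for illustration rather than the settings used in the experiments.</p>
          <preformat><![CDATA[
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
labels = np.array(["NEG", "NEU", "POS"] * 20)
# Toy fixed-length "embeddings": one cluster centre per class.
centres = {"NEG": -1.0, "NEU": 0.0, "POS": 1.0}
features = np.stack([rng.normal(loc=centres[label], scale=0.3, size=16) for label in labels])

models = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(kernel="rbf"),
}
predictions = {}
for name, model in models.items():
    # First 45 vectors for training, remaining 15 held out.
    model.fit(features[:45], labels[:45])
    predictions[name] = model.predict(features[45:])
```
]]></preformat>
          <p>Both classifiers consume the same fixed-length vectors, so any difference in their predictions reflects the classifier rather than the text representation.</p>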
          <p>Preliminary evaluation using these models helped to gauge the inherent difficulty of the classification
task. It also revealed some of the limitations of traditional approaches when applied to imbalanced and
noisy datasets, as is often the case with user-generated content in online platforms. Specifically, the class
imbalance between homophobic and non-homophobic comments adversely affected the performance
of both classifiers, resulting in high accuracy but suboptimal recall for the minority class, which is
a critical metric in hate speech detection tasks. These initial findings served as a foundation for the
development and evaluation of more advanced models discussed in subsequent sections.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. LLM Prompting</title>
          <p>The following approach leverages the GPT-4o-mini model to perform sentiment classification based on
a tailored natural language prompt. This stage directly follows the execution of the Context Engine
(see Section 4.2), which is responsible for enriching the input instance with class-specific contextual
information retrieved from the training corpus. The integration of contextual examples into the prompt
is key to addressing the subjectivity of the classification task and improving model generalization,
particularly in imbalanced or data-scarce environments.</p>
          <p>The constructed prompt follows a standardized template designed to align the model’s response with
the sentiment classification task. Specifically, the prompt frames the model as an assistant whose sole
responsibility is to determine the polarity, Positive (POS), Negative (NEG), or Neutral (NEU), of a Reddit
post in relation to a given keyword. The complete format of the prompt is shown below:</p>
          <p>You are an assistant tasked with classifying Reddit posts based on the
sentiment they express towards a given Keyword. There are three possible
classes: Positive (POS), Negative (NEG), or Neutral (NEU). Classify this
Reddit post as POS, NEG, or NEU based on its sentiment or polarity towards
this given keyword. Keyword: [keyword] Post to classify: [target post]
Return only the label. To give you more context, I will provide three
lists containing the most similar texts to the post to be classified.
Examples of positive posts: [list of k POS posts] Examples of negative
posts: [list of k NEG posts] Examples of neutral posts: [list of k NEU
posts] Remember, return only the label: POS, NEG, or NEU.</p>
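          <p>As an illustration, the template could be filled programmatically along the following lines. This is a minimal sketch and not the implementation used in the submitted system: the names TEMPLATE, format_examples and build_prompt, the numbering of the example lists, and the toy posts are all assumptions introduced here.</p>
          <preformat><![CDATA[
```python
# Prompt template copied from the text; the bracketed slots become format fields.
TEMPLATE = (
    "You are an assistant tasked with classifying Reddit posts based on the "
    "sentiment they express towards a given Keyword. There are three possible "
    "classes: Positive (POS), Negative (NEG), or Neutral (NEU). Classify this "
    "Reddit post as POS, NEG, or NEU based on its sentiment or polarity towards "
    "this given keyword. Keyword: {keyword} Post to classify: {post} "
    "Return only the label. To give you more context, I will provide three "
    "lists containing the most similar texts to the post to be classified. "
    "Examples of positive posts: {pos} Examples of negative posts: {neg} "
    "Examples of neutral posts: {neu} "
    "Remember, return only the label: POS, NEG, or NEU."
)


def format_examples(posts):
    # Hypothetical rendering choice: number the k retrieved posts of one class.
    return " ".join(f"({i}) {post}" for i, post in enumerate(posts, start=1))


def build_prompt(keyword, post, context):
    """Fill the template; `context` maps POS/NEG/NEU to lists of retrieved posts."""
    return TEMPLATE.format(
        keyword=keyword,
        post=post,
        pos=format_examples(context["POS"]),
        neg=format_examples(context["NEG"]),
        neu=format_examples(context["NEU"]),
    )


# Toy inputs (invented for illustration, not taken from the task data).
prompt = build_prompt(
    keyword="pride",
    post="The pride parade downtown was wonderful this year.",
    context={
        "POS": ["Loved the pride march.", "Pride month is great."],
        "NEG": ["Pride events are annoying.", "I dislike pride parades."],
        "NEU": ["The pride parade starts at noon.", "Pride is in June."],
    },
)
```
]]></preformat>
          <p>The resulting string would then be sent to the LLM as a single message, and the returned text would be read as the predicted label.</p>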
          <p>This approach ensures that the model is exposed not only to the raw post but also to semantically
aligned examples from each sentiment class. These examples are the output of the Context Engine, as
explained in Section 4.2.</p>
          <p>Once the tailored prompt is constructed, it is passed to the GPT-4o-mini model for inference. The
model then outputs a single label, POS, NEG, or NEU, based on its interpretation of the post’s sentiment
with respect to the provided keyword and contextual examples. As illustrated in Figure 3, this output
constitutes the final classification result of the entire system.</p>
          <p>By embedding semantic context into the prompt, this approach significantly enhances the model’s
ability to handle subtle sentiment variations, particularly in complex or ambiguous posts. As will be
demonstrated in the evaluation section, this methodology consistently outperforms baseline models,
underscoring the critical role of contextual prompting in sentiment classification tasks involving social
media data.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. BERT Finetuning</title>
          <p>This approach involves fine-tuning a BERT-based model. Specifically, an open-source text classification
model available on Hugging Face, “cardiffnlp/twitter-roberta-base-sentiment-latest” [34], is selected.
This is a RoBERTa-base model trained on 124 million tweets from January 2018 to December 2021 and
fine-tuned for sentiment analysis. Although other models were also tested, this
one consistently yielded superior results, likely due to the linguistic and stylistic similarities between
tweets and Reddit posts.</p>
          <p>Given that the task at hand is aspect-based sentiment analysis, a modification to the input structure
was made to ensure the model focuses on the relevant keyword within each post. During tokenization,
each input instance is constructed by concatenating the text and the corresponding keyword, separated
by a designated separator token, as follows:</p>
          <p>[’&lt;s&gt;’, TEXT-TOKENS, ’&lt;/s&gt;’, ’&lt;/s&gt;’, KEYWORD-TOKENS, ’&lt;/s&gt;’]</p>
          <p>This structure encourages the model to attend more closely to the text tokens that overlap with the
keyword tokens during prediction.</p>
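          <p>The layout above can be sketched in plain Python as follows. The helper build_pair_tokens and the toy tokens are hypothetical and only mirror the described structure. In practice, RoBERTa tokenizers in the Hugging Face library are expected to produce the same arrangement when the text and the keyword are passed as a sentence pair, although that call is not shown in the source.</p>
          <preformat><![CDATA[
```python
def build_pair_tokens(text_tokens, keyword_tokens, bos="<s>", sep="</s>"):
    """Return [bos, text..., sep, sep, keyword..., sep], the pair layout described above."""
    return [bos, *text_tokens, sep, sep, *keyword_tokens, sep]


# Toy, already-tokenized input (real inputs would be subword tokens).
tokens = build_pair_tokens(["they", "are", "proud"], ["proud"])
print(tokens)
```
]]></preformat>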
          <p>The pre-trained model is fine-tuned using the training data, systematically exploring various
hyperparameter configurations and training strategies to identify the setup that delivered the best performance.
To further enhance the robustness and generalization of the model, an ensemble technique has been
applied to this approach. As described in Section 4.1, the strategy involves generating paraphrases of
each input post and performing inference not only on the original text but also on its two paraphrased
variants. Each of these three versions is independently processed through the model, yielding three
predicted sentiment labels. The final classification decision is then determined by majority voting among
these predictions. In instances where the three predictions yield distinct labels, the final prediction is
assigned to the label with the highest associated probability.</p>
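          <p>The voting rule can be sketched as follows, assuming each of the three text variants yields a predicted label together with its probability. The function name ensemble_vote and the example values are illustrative assumptions.</p>
          <preformat><![CDATA[
```python
from collections import Counter


def ensemble_vote(predictions):
    """Majority vote over (label, probability) pairs, one pair per text variant.

    If no label reaches a strict majority (for three variants: all labels
    differ), fall back to the label with the highest probability.
    """
    counts = Counter(label for label, _ in predictions)
    top_label, top_count = counts.most_common(1)[0]
    if 2 * top_count > len(predictions):
        return top_label
    return max(predictions, key=lambda pair: pair[1])[0]


# Original post plus two paraphrases (toy probabilities).
agree = ensemble_vote([("NEG", 0.60), ("NEG", 0.55), ("POS", 0.90)])
split = ensemble_vote([("NEG", 0.40), ("NEU", 0.50), ("POS", 0.70)])
```
]]></preformat>
          <p>In the first call two variants agree, so the majority label is returned even though the dissenting prediction is more confident. In the second call all three labels differ, so the most confident prediction decides.</p>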
          <p>This ensemble method helps to mitigate the impact of potential ambiguities or artifacts present in a
single formulation of the post, effectively smoothing out variability introduced by lexical or syntactic
differences. As will be shown in the evaluation section, this strategy leads to improved performance
and greater stability compared to relying solely on the original post for inference.</p>
        </sec>
        <sec id="sec-4-3-4">
          <title>4.3.4. Context-Based BERT</title>
          <p>This approach also leverages the Context Engine. However, in contrast to the LLM Prompting strategy,
which injects these examples directly into a natural language prompt, this method integrates the
contextual information into the architecture of a fine-tuned transformer-based classifier through a
specialized attention mechanism. Specifically, it employs the
“cardiffnlp/twitter-roberta-base-sentiment-latest” model, mentioned in the BERT Finetuning approach, as the backbone and enhances it with
class-specific multi-head attention modules.</p>
          <p>The proposed model, referred to as “SentimentClassifierWithMultiAttention”, extends a pre-trained
RoBERTa encoder with three parallel multi-head attention blocks, each responsible for integrating
contextual signals from a different sentiment class: Positive (POS), Neutral (NEU), and Negative (NEG).
These blocks are crucial for tailoring the representation of the input post based on its semantic alignment
with prototypical examples from each sentiment class.</p>
          <p>During inference, the following sequence of operations is executed:
1. The input post is encoded using the RoBERTa backbone, and the output corresponding to the
[CLS] token is extracted as the global representation of the post, denoted as $h_{cls} \in \mathbb{R}^{d}$, where $d$ is the hidden size of the encoder.
2. Using the Context Engine, the $k$ nearest posts from each class are retrieved. These posts are also
encoded via RoBERTa to obtain their respective [CLS] embeddings:
• $H_{pos} \in \mathbb{R}^{k \times d}$
• $H_{neu} \in \mathbb{R}^{k \times d}$
• $H_{neg} \in \mathbb{R}^{k \times d}$
3. For each class $c \in \{pos, neu, neg\}$, a multi-head attention module is applied, where:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
Here, $Q = h_{cls}$, $K = H_c$, and $V = h_{cls}$, with $d_k$ the key dimension, making the attention mechanism focus on how much
the representation of the input post is aligned with the embeddings of the respective class. This
design ensures that the attention weights capture how the semantics of the post correlate with
prototypical sentiment examples, but the value vector remains grounded in the input post’s own
representation.
4. The outputs of the three attention modules $a_{pos}$, $a_{neu}$, and $a_{neg}$ represent contextualized versions
of $h_{cls}$, modulated by each sentiment class.
5. These four vectors are concatenated to form a unified representation:
$$h_{combined} = [h_{cls}; a_{pos}; a_{neg}; a_{neu}] \in \mathbb{R}^{4d}$$
6. This combined vector is then passed through a deep feedforward network with dropout and
nonlinear activations. Finally, a linear classifier projects it into a 3-dimensional output corresponding
to the sentiment classes.</p>
          <p>This architecture allows the model to not only consider the content of the input post but also to
explicitly attend to semantically similar posts from each class. By modeling interactions between
the input and class-specific exemplars via attention, the system gains a structured inductive bias
that enhances robustness, particularly in the presence of ambiguous or borderline cases. As
with the LLM Prompting approach, the selection of contextual posts is powered by cosine similarity
over SentenceTransformer embeddings. This shared mechanism ensures consistency in the semantic
grounding across architectures while allowing each model to exploit context in a way best suited to its
inference paradigm. In addition, to further enhance the robustness and generalization of the model, the
same ensemble technique applied in the BERT Finetuning approach (Section 4.3.3) has been applied to this
approach.</p>
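          <p>The retrieval step shared by both context-based approaches can be sketched as below. This is a simplified, dependency-free illustration: the two-dimensional vectors stand in for SentenceTransformer embeddings, and the names cosine and retrieve_context, as well as the toy corpus, are assumptions rather than the actual Context Engine code.</p>
          <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_context(query_vec, corpus, k=2):
    """Return, for each class, the k training texts most similar to the query.

    corpus: iterable of (text, label, embedding) triples from the training set.
    """
    by_class = {}
    for text, label, embedding in corpus:
        by_class.setdefault(label, []).append((cosine(query_vec, embedding), text))
    return {
        label: [text for _, text in sorted(scored, key=lambda s: s[0], reverse=True)[:k]]
        for label, scored in by_class.items()
    }


# Toy embeddings (placeholders for real sentence embeddings).
corpus = [
    ("pos a", "POS", [1.0, 0.0]),
    ("pos b", "POS", [0.0, 1.0]),
    ("neg a", "NEG", [0.9, 0.1]),
    ("neg b", "NEG", [-1.0, 0.0]),
    ("neu a", "NEU", [0.5, 0.5]),
]
context = retrieve_context([1.0, 0.1], corpus, k=1)
```
]]></preformat>
          <p>Retrieving the same number of posts per class, regardless of how frequent each class is in the training data, is what keeps the supplied context class-balanced.</p>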
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>This section presents the evaluation methodology applied to all the approaches described in Section 4.3,
along with a discussion of the corresponding results. It is important to highlight that the evaluation
procedure for the baseline models differs from that employed for the subsequent approaches. Specifically,
for the baselines a train-test split was conducted using the training dataset provided to the participants at the outset of
the competition. In addition, an exhaustive grid search was conducted over the candidate hyperparameter
configurations, with the objective of identifying the best settings for each model in terms of macro
F1-score. This preliminary evaluation of the baseline models served to establish a reference
point and to assess the initial difficulty of the task. The final reported results correspond to the official
evaluation conducted during the HOMO-LAT25 competition, encompassing both Track 1 and Track 2.</p>
      <p>Despite their general applicability, both baseline models exhibited limited effectiveness across both
tracks, primarily due to the significant class imbalance present in the dataset. As shown in Tables 2
and 3, both Random Forest and SVM struggled to correctly classify instances from the minority class
(POS), with an F1-score of 0.00. This result is indicative of the models’ bias toward the majority classes,
particularly the NEU class, which dominates the dataset.</p>
      <p>The Random Forest model achieved an overall accuracy of 58%, with a macro-average F1-score of
0.33. The SVM model slightly improved the overall accuracy to 59% and the macro-average F1-score to
0.39. However, the zero recall and precision for the POS class in both cases suggest that these models
are not suitable for handling highly imbalanced and subjective tasks without further enhancements
such as resampling strategies, class weighting, or the integration of contextual information.</p>
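      <p>The gap between accuracy and macro F1-score can be made concrete with a small, self-contained computation. The labels below are invented toy data, not the task data or the reported results. They only show how a class that is never predicted contributes an F1 of zero and pulls the macro average well below the accuracy.</p>
      <preformat><![CDATA[
```python
def macro_f1(y_true, y_pred, labels=("POS", "NEG", "NEU")):
    """Return (macro-averaged F1, per-class F1 dict) for the given labels."""
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        per_class[label] = 2 * tp / denom if denom else 0.0
    return sum(per_class.values()) / len(labels), per_class


# Toy imbalanced sample: 6 NEU, 3 NEG, 1 POS; the classifier never predicts POS.
y_true = ["NEU"] * 6 + ["NEG"] * 3 + ["POS"]
y_pred = ["NEU"] * 6 + ["NEG", "NEG", "NEU"] + ["NEU"]

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
macro, per_class = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={macro:.2f} per_class={per_class}")
```
]]></preformat>
      <p>In this toy sample the accuracy is 0.80 while the macro F1-score is roughly 0.55, because the POS class scores 0.0.</p>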
      <p>These results confirm the limitations of conventional machine learning algorithms in scenarios
characterized by strong class imbalance and semantic complexity. The inability of these models to detect
any instance of the POS class highlights the need for more advanced techniques that can incorporate
external contextual knowledge and exploit semantic similarities between posts. As will be shown
in subsequent sections, the proposed Context Engine addresses these shortcomings by enriching the
input space with relevant and class-balanced contextual examples, significantly improving performance
across all classes.</p>
      <p>The remainder of this section presents and analyzes the results obtained on the test set provided by
the organizers of HOMO-LAT25. To obtain robust and generalizable results, it is essential to explore how
different hyperparameter configurations affect model performance. Testing a wide range of settings
makes it possible to better understand the sensitivity of the models to various training dynamics and to identify
the best combinations for this specific task. This is particularly relevant in a challenging
context such as sentiment classification of social media posts in the Latin American space, where
linguistic variability and data imbalance introduce additional complexity.</p>
      <p>Each approach was trained and evaluated under different combinations of these parameters, allowing
a thorough analysis of their impact on model effectiveness. The results of these experiments
and the final configuration are discussed in the following subsections. Table 4 lists the parameters
that were tested in the different approaches. For context, max_len is the maximum
number of tokens that the model accepts as input, d_aug is a binary variable that controls
whether the data augmentation techniques explained in Section 4.1 are used, and k
is the number of posts of each class retrieved by the Context Engine.</p>
      <p>A total of five different submissions were made for the task, and the results for both tracks are
summarized in Table 5. Each submission corresponds to a distinct approach, which is described as
follows:
• BERT: Fine-tuned BERT model with ensemble strategy, trained on 90% of the dataset and validated
on the remaining 10% to select the best-performing model during training. The final configuration
of this approach is: max_len = 256, batch_size = 16, epochs = 10, learning_rate = 2e-5, weight_decay
= 0.01 and d_aug = 1.
• BERT-FULL: Same as the BERT approach, but trained on the full dataset. The selected model
corresponds to the epoch that yielded the best results in validation during prior tuning. The final
configuration is the same as in the BERT approach.
• LLM: Predictions generated using a prompting-based LLM approach.
• CONTEXT: Context-enhanced BERT model, trained on 90% of the data and evaluated on the
remaining 10%. The final configuration of this approach is: max_len = 256, batch_size = 16, epochs
= 10, learning_rate = 1e-5, weight_decay = 0.01, d_aug = 1 and k = 2.
• CONTEXT-ENSEMBLE: Context-enhanced BERT model with ensemble strategy, trained on
90% of the data and validated on the remaining 10%. The final configuration is the same as in
the CONTEXT approach.</p>
      <p>For Track 1, the best-performing approach was the LLM prompting method, achieving a macro
F1-score of approximately 0.53. Both BERT-based fine-tuning approaches produced slightly lower
yet comparable scores. In contrast, the context-based models obtained the lowest results, with
F1-scores close to 0.50. In Track 2, the highest macro F1-score was achieved by the BERT-FULL approach,
reaching approximately 0.51. Interestingly, although LLM prompting was the best approach in Track 1,
it performed worse than both BERT-based methods in this track. Once again, the context-based models
recorded the lowest performance, although the gap was smaller compared to Track 1.</p>
      <p>A few additional observations can be made from the results:
• The BERT-FULL model, which was trained on the full dataset, outperformed the BERT model
trained on a reduced dataset in Track 2. However, the opposite occurred in Track 1, though the
difference in scores is minimal and likely not statistically significant.
• The use of ensemble strategies consistently led to improved performance across both BERT and
context-based models, highlighting the benefit of aggregating predictions over relying on a single
instance.</p>
      <p>Additionally, Table 6 presents the results obtained by all participants in the HOMO-LAT25 challenge
for both tracks. The data used to generate this table were sourced from the official results released by
the organizers, which are available at the following link.</p>
      <p>The most noteworthy observation from Table 6 is the substantial performance gap observed in
Track 2, where all five of the submitted approaches outperformed those of the other participants. In
particular, the top-performing method surpassed the next best result by nearly 0.03 in macro F1-score.</p>
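      <table-wrap>
        <label>Table 6</label>
        <caption>
          <p>Macro F1-scores of all HOMO-LAT25 participants in Track 1 and Track 2.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Owner (Track 1)</th><th>Macro F1 (Track 1)</th><th>Owner (Track 2)</th><th>Macro F1 (Track 2)</th></tr>
          </thead>
          <tbody>
            <tr><td>Ours (LLM)</td><td>0.5296</td><td>Ours (BERT-FULL)</td><td>0.5086</td></tr>
            <tr><td>Ours (BERT)</td><td>0.5284</td><td>Ours (BERT)</td><td>0.5051</td></tr>
            <tr><td>user1</td><td>0.5261</td><td>Ours (LLM)</td><td>0.4989</td></tr>
            <tr><td>user1</td><td>0.5261</td><td>Ours (CONTEXT-ENSEMBLE)</td><td>0.4961</td></tr>
            <tr><td>Ours (BERT-FULL)</td><td>0.5260</td><td>Ours (CONTEXT)</td><td>0.4870</td></tr>
            <tr><td>user2</td><td>0.5137</td><td>user1</td><td>0.4803</td></tr>
            <tr><td>user2</td><td>0.5137</td><td>user1</td><td>0.4803</td></tr>
            <tr><td>user2</td><td>0.5104</td><td>user2</td><td>0.4639</td></tr>
            <tr><td>Ours (CONTEXT-ENSEMBLE)</td><td>0.5085</td><td>user2</td><td>0.4639</td></tr>
            <tr><td>Ours (CONTEXT)</td><td>0.4946</td><td>user2</td><td>0.4639</td></tr>
            <tr><td>user1</td><td>0.4820</td><td>user1</td><td>0.4467</td></tr>
            <tr><td>user3</td><td>0.4661</td><td>user4</td><td>0.4388</td></tr>
            <tr><td>user4</td><td>0.4360</td><td>user5</td><td>0.4054</td></tr>
            <tr><td>user5</td><td>0.4301</td><td>user5</td><td>0.4054</td></tr>
            <tr><td>user5</td><td>0.4301</td><td>user3</td><td>0.3622</td></tr>
            <tr><td>user3</td><td>0.3793</td><td>user6</td><td>0.2588</td></tr>
            <tr><td>user6</td><td>0.2592</td><td></td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>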
This performance advantage is likely attributable to the strategy of translating all texts into English.
Given that Track 2 features Spanish dialects in the test set that differ from those in the training data, a
decline in performance was expected due to the increased linguistic variability. However, translating
the texts into English effectively reduces dialectal variation and linguistic noise. This translation
step contributed to greater textual homogeneity, thereby improving the model’s ability to generalize
across dialects and ultimately enhancing performance in this more challenging track.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>This paper presented several Transformer-based approaches to tackle the HOMO-LAT25 shared task,
focusing on sentiment analysis towards the LGBTQ+ community in Latin American Spanish, with a
particular emphasis on handling dialectal variations. The core strategies involved translating input texts
to English to leverage powerful pre-trained models and mitigate dialectal discrepancies, employing
a novel Context Engine to inject semantically relevant examples during inference, and utilizing data
augmentation and ensemble techniques to enhance robustness.</p>
      <p>The findings demonstrate the efficacy of these methods. Notably, the strategy of translating all texts
to English proved highly successful for the cross-dialect track (Track 2), where all five of the submitted
models outperformed other participants, with the top-performing BERT-FULL model achieving a
significant lead. This underscores the benefit of standardizing linguistic input when dealing with
diverse, low-resource dialects. For the multi-dialect track (Track 1), LLM prompting enriched with
the Context Engine yielded the best results among the submissions, highlighting the power of large
models when provided with carefully curated contextual information. Fine-tuned BERT models also
showed strong, competitive performance across both tracks. The Context Engine, by providing
class-specific similar examples, consistently aided models in navigating the subjective nature of sentiment
classification.</p>
      <p>These results not only highlight the promise of Transformer-based models and LLMs in tackling
complex sentiment analysis tasks in low-resource and linguistically diverse settings, but also reflect a
broader insight: while LLMs are powerful and increasingly indispensable tools in NLP, their effectiveness
depends heavily on how they are integrated into task-specific pipelines. Relying solely on their
pretrained knowledge can lead to suboptimal outcomes, especially in domains where context, cultural
nuance, or dialectal variation plays a central role. As such, future systems should not only adopt
LLMs but also incorporate strategic components, such as contextual retrieval mechanisms, translation
pipelines, or task-aware augmentation, to fully unlock their potential. A thoughtful preliminary analysis
remains essential to tailor these models to the unique demands of each task.</p>
      <p>For future work, several avenues warrant exploration. First, while English translation was effective,
investigating multilingual models specifically pre-trained or fine-tuned on diverse Spanish dialects
could offer a more direct approach, potentially preserving nuances lost in translation. Second, the
Context Engine could be enhanced by exploring more sophisticated retrieval mechanisms beyond
cosine similarity or by dynamically adjusting the number of retrieved examples (k) based on input
characteristics. Integrating true Retrieval-Augmented Generation with external knowledge bases
containing dialect-specific slang or cultural context could further improve performance. Third, a deeper
analysis into model interpretability is crucial, especially for socially sensitive tasks like hate speech
detection, to understand how models arrive at their predictions and to identify potential biases. Finally,
it is worth noting that the dataset provided for the shared task was limited in size. Expanding the
training data, both in quantity and dialectal coverage, would be highly beneficial, as it would allow for
more robust fine-tuning and better generalization across diverse linguistic scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially carried out with the support of grant PID2022-140554OB-C32, funded by
MCIN/AEI/10.13039/501100011033, and project SEGVAUTO 5Gen-CM (TEC-2024/ECO-277), funded
by the Community of Madrid. Additionally, this project has been partially supported by the strategic
research initiative of the Carlos III University of Madrid, titled “Digital Engineering and Language AI”.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 and Google Gemini in order to: Grammar
and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      <p>112249.
[19] I. A. Kandhro, F. Ali, M. Uddin, A. Kehar, S. Manickam, Exploring aspect-based sentiment analysis: an in-depth review of current methods and prospects for advancement, Knowledge and Information Systems 66 (2024) 3639–3669. URL: https://doi.org/10.1007/s10115-024-02104-8.
[20] M. Zeng, et al., What makes a good retriever for open-domain question answering?, Transactions of the Association for Computational Linguistics 12 (2024) 1–19.
[21] Goover, Advancements in retrieval-augmented generation, https://seo.goover.ai/report/202502/go-public-report-en-d8f4e00f-9e56-4651-bc66-274977ebd6d2-0-0.html, 2025.
[22] CelerData, Latest developments in retrieval-augmented generation, https://celerdata.com/glossary/latest-developments-in-retrieval-augmented-generation, 2025.
[23] Chitika, Rag for code generation: Automate coding with ai &amp; llms, https://www.chitika.com/rag-for-code-generation/, 2025.
[24] S. Solutions, Trends in active retrieval augmented generation: 2025 and beyond, https://www.signitysolutions.com/blog/trends-in-active-retrieval-augmented-generation, 2025.
[25] Y. Liu, Z. Wang, Y. Zhou, W. Zhang, Y. Y. Wang, Leveraging llm and user feedback to improve retrieval-augmented generation when question and answer domains shift, in: Proceedings of the 2024 Annual Meeting of the Association for Computational Linguistics (ACL), 2024. URL: https://openreview.net/forum?id=S034bjikkf.
[26] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Language Resources and Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/s10579-020-09502-8.
[27] B. R. Chakravarthi, Detection of homophobia and transphobia in YouTube comments, International Journal of Data Science and Analytics 18 (2024) 49–68. URL: https://doi.org/10.1007/s41060-023-00400-0. doi:10.1007/s41060-023-00400-0.
[28] M. G. Yigezu, O. Kolesnikova, G. Sidorov, A. F. Gelbukh, Transformer-based hate speech detection for multi-class and multi-label classification, in: IberLEF@SEPLN, 2023.
[29] J. J. Andrew, Judithjeyafreeda@ lt-edi-2023: Using gpt model for recognition of homophobia/transphobia detection from social media, in: Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion, 2023, pp. 78–82.
[30] B. R. Chakravarthi, A. Hande, R. Ponnusamy, P. K. Kumaresan, R. Priyadharshini, How can we detect Homophobia and Transphobia? Experiments in a multilingual code-mixed setting for social media governance, International Journal of Information Management Data Insights 2 (2022) 100119. URL: https://www.sciencedirect.com/science/article/pii/S2667096822000623. doi:10.1016/j.jjimei.2022.100119.
[31] J. M. Pérez, D. A. Furman, L. Alonso Alemany, F. M. Luque, RoBERTuito: a pre-trained language model for social media text in Spanish, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 7235–7243. URL: https://aclanthology.org/2022.lrec-1.785/.
[32] M. Farouk, Measuring sentences similarity: A survey, Indian Journal of Science and Technology (2019). doi:10.17485/ijst/2019/v12i25/143977.
[33] UKPLab, Sentencetransformers documentation, 2024. Available at https://www.sbert.net.
[34] D. Loureiro, F. Barbieri, L. Neves, L. Espinosa Anke, J. Camacho-collados, TimeLMs: Diachronic language models from Twitter, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 251–260. URL: https://aclanthology.org/2022.acl-demo.25. doi:10.18653/v1/2022.acl-demo.25.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ştefăniţă</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Buf</surname>
          </string-name>
          ,
          <article-title>Hate speech in social media and its effects on the lgbt community: A review of the current research</article-title>
          ,
          <year>2021</year>
          . doi:
          <pub-id pub-id-type="doi">10.21018/rjcpr.2021.1.322</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sureka</surname>
          </string-name>
          ,
          <article-title>Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on tumblr micro-blogging website</article-title> (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manrique</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-LAT at IberLEF 2025: Human-centric polarity detection in Online Messages Oriented to the Latin American-speaking lgbtq+ populaTion</article-title>
          ,
          <source>Procesamiento del lenguaje natural 75</source>
          (
          <year>2025</year>
          )
          <article-title>-</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at Iberlef 2023: Hate speech detection in Online Messages directed Towards the MEXican spanish speaking LGBTQ+ population</article-title>
          ,
          <source>Procesamiento del lenguaje natural 71</source>
          (
          <year>2023</year>
          )
          <fpage>361</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Andersen</surname>
          </string-name>
          , Scott Thomas Vásquez, ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alcántara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cesar</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at IberLEF 2024: Hate Speech Detection Towards the Mexican Spanish speaking LGBT+ Population</article-title>
          ,
          <source>Procesamiento del lenguaje natural 73</source>
          (
          <year>2024</year>
          )
          <fpage>393</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>2017-December</volume>
          , Neural Information Processing Systems Foundation,
          <year>2017</year>
          , pp.
          <fpage>5999</fpage>
          -
          <lpage>6009</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Grossberg</surname>
          </string-name>
          ,
          <article-title>Recurrent neural networks</article-title>
          ,
          <source>Scholarpedia</source>
          <volume>8</volume>
          (
          <year>2013</year>
          )
          <fpage>1888</fpage>
          . doi:10.4249/scholarpedia.1888
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bebis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Georgiopoulos</surname>
          </string-name>
          ,
          <article-title>Why network size is so important</article-title>
          ,
          <source>IEEE Potentials</source>
          <volume>13</volume>
          (
          <year>1994</year>
          )
          <fpage>27</fpage>
          -
          <lpage>31</lpage>
          . doi:10.1109/45.329294.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A review on the attention mechanism of deep learning</article-title>
          (
          <year>2021</year>
          ). doi:10.1016/j.neucom.2021.03.091.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sanford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Telgarsky</surname>
          </string-name>
          ,
          <article-title>Representational strengths and limitations of transformers</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saqib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mian</surname>
          </string-name>
          ,
          <article-title>A comprehensive overview of large language models</article-title>
          (
          <year>2023</year>
          ). doi:10.48550/arXiv.2307.06435.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis in the era of large language models: A reality check</article-title>
          ,
          <source>in: NAACL-HLT</source>
          ,
          <year>2023</year>
          . URL: https://api.semanticscholar.org/CorpusID:258866189.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <source>Sentiment Analysis: Mining Opinions, Sentiments, and Emotions</source>
          ,
          <year>2015</year>
          . doi:10.1017/CBO9781139084789.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fatemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>A comparative analysis of fine-tuned LLMs and few-shot learning of LLMs for financial sentiment analysis</article-title>
          ,
          <source>ArXiv abs/2312.08725</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:266210291.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hazarika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>108</fpage>
          -
          <lpage>132</lpage>
          . doi:10.1109/TAFFC.2020.3038167.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <article-title>Evaluating zero-shot multilingual aspect-based sentiment analysis with large language models</article-title>
          ,
          <source>ArXiv abs/2412.12564</source>
          (
          <year>2024</year>
          ). URL: https://api.semanticscholar.org/CorpusID:274789129.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wankhade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C. S.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <article-title>A survey on aspect base sentiment analysis methods and challenges</article-title>
          ,
          <source>Applied Soft Computing</source>
          <volume>167</volume>
          (
          <year>2024</year>
          )
          <fpage>112249</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1568494624010238. doi:https://doi.org/10.1016/j.asoc.2024.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>