<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SINAI-UGPLN at HOMO-LAT 2025: Enhancing RoBERTuito with Synthetic Data and Slur-Dialect Features for Multidialectal Zero-Shot LGBTQ Polarity Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariuxi del Carmen Toapanta-Bernabé</string-name>
          <email>mariuxi.toapantab@ug.edu.ec</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Ángel García-Cumbreras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Alfonso Ureña-López</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrián David Triviño-León</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, SINAI, CEATIC, Universidad de Jaén</institution>
          ,
          <addr-line>23071, Jaén</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Guayaquil</institution>
          ,
          <addr-line>090514, Guayas</addr-line>
          ,
          <country country="EC">Ecuador</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper describes the participation of the SINAI and UGPLN teams in the shared task HOMO-LAT25, which addresses the classification of polarity in LGBTQ+-related discourse in Spanish dialects. The task is divided into two subtasks: Task 1 evaluates in-domain classification using known dialects, while Task 2 focuses on zero-shot generalization to unseen dialects. Our approach combines pre-trained transformer-based models (RoBERTuito) with contextual metadata, namely the dialect (country), the presence of a slur (has_dialect_slur), and LGBTQ+ keywords. To address class and dialectal imbalance, we generated 400 synthetic examples using two large language models (Mistral-7B-Instruct and Falcon-7B-Instruct), covering multiple dialect-keyword-polarity combinations. We employ a dynamic input encoding strategy that adapts to the availability of metadata per subtask. Our system ranked second in Task 1 and achieved competitive results in Task 2, demonstrating strong generalization across dialectal boundaries.</p>
      </abstract>
      <kwd-group>
        <kwd>Polarity classification</kwd>
        <kwd>Spanish social networks</kwd>
        <kwd>Transformer models</kwd>
        <kwd>Synthetic data</kwd>
        <kwd>LGBTQ+ discourse</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        LGBTQ+ communities often experience disproportionate levels of online aggression and stereotyping,
usually mediated by regional variations in language use and sociolectal expressions [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
Detecting negative, neutral, or supportive attitudes in such discourse poses substantial challenges to NLP
systems, particularly when dealing with reclaimed slurs, implicit sentiment, and dialectal variation.
      </p>
      <p>
        The HOMO-LAT25 shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] addresses these challenges by proposing a polarity classification
problem focused on LGBTQ+-related discourse in Spanish. The task is divided into two subtasks: Task 1,
which involves in-domain classification with dialectal information available during training and testing;
and Task 2, which evaluates zero-shot generalization to unseen dialects, making it a more realistic and
challenging benchmark for regional transferability.
      </p>
      <p>Each instance in the dataset includes a post, an LGBTQ+ keyword (e.g., gay, lesbiana, marica), and
optional metadata such as the dialect (country) and a slur flag (has_dialect_slur) when available.
To enhance generalization and mitigate class imbalance, we generated 400 synthetic examples using
Mistral and Falcon LLMs, covering a diverse range of dialect-polarity-keyword combinations. These
examples were added to the training sets for both subtasks.</p>
      <p>We employed RoBERTuito as the core model, incorporating dialectal cues, keyword prompts, and
slur awareness in the input representation. Our system uses dynamic input formatting that adapts
to metadata availability, enabling robust performance in both fully annotated and partially observed
scenarios.</p>
      <p>Our contribution in this paper is a unified framework for polarity classification in dialectally diverse
and slur-sensitive contexts. We integrate lexicon-based features and synthetic data generation into a
transformer-based model that generalizes effectively across both known and unseen dialects.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 details
the task and the data provided. Section 4 presents the proposed system for Tasks 1 and 2. Section 5
reports the experiments and results obtained. Section 6 presents the main discussion and error analysis.
Finally, Section 7 concludes the paper and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Polarity classification on social networks remains a central task in sentiment analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Although
traditional approaches rely on lexicons and syntactic heuristics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], recent advances leverage
transformer-based language models trained on large-scale corpora [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Despite these advances, many studies
overlook the role of dialectal variation and reclaimed slurs in sentiment expression [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In Spanish,
regional lexical differences and the presence of words that can function both as insults and identity
markers (e.g., marica, travesti) pose particular challenges [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Datasets that account for such sociolinguistic
complexity are scarce.
      </p>
      <p>
        Recent shared tasks such as HatEval [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and EXIST [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have explored multilingual hate speech
detection, but without specific attention to LGBTQ+ discourse or dialectal generalization. Efforts such
as [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and the Spanish TASS corpus [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] have expanded cultural and linguistic coverage, although
they remain limited in terms of LGBTQ+ focus and slur-sensitive labeling. The HOMO-LAT25 shared
task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] represents a significant step in addressing these gaps, offering polarity-annotated Spanish
social media data with fine-grained control over dialect and slur features. It enables the development of
systems that go beyond surface sentiment cues and incorporate linguistic and cultural sensitivity [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>2.1. Addressing Dialectal Variation, Code-Switching, and Sociolinguistic Nuances</title>
        <p>Recent research has increasingly recognized the critical importance of accounting for the rich
sociolinguistic diversity within Spanish, moving beyond the assumption of a "standard" language to tackle
regional variations and their impact on sentiment and hate speech.</p>
        <p>
          Dedicated efforts have emerged to focus on specific Spanish dialects for detecting hate speech. For
instance, the HOMO-MEX shared task at IberLEF 2023 specifically addressed LGBTQ+ phobia detection
in Mexican Spanish tweets. This task, serving as a direct precursor to HOMO-LAT25, demonstrates a
clear trend within the Spanish NLP community towards highly granular, dialect-specific evaluation
campaigns. Its objective to encourage systems capable of detecting both aggressive and subtle LGBTQ+
phobic content, regardless of expression modality, directly aligns with the challenges of nuanced
polarity classification in this domain. The progression from tasks like HOMO-MEX to HOMO-LAT25
indicates a deliberate, community-driven effort within evaluation forums, such as IberLEF [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], to
systematically address previously identified gaps in LGBTQ+ and dialectal NLP for Spanish. This
evolution highlights that shared tasks are crucial mechanisms for defining new, increasingly complex
research challenges, fostering the creation of specialized datasets that account for sociolinguistic
nuances, such as reclaimed slurs and regional expressions, and providing a structured environment for
benchmarking and advancing the state-of-the-art in niche, under-resourced domains. This goes beyond
merely listing tasks; it underscores their strategic role in shaping research trajectories.
        </p>
        <p>
          Beyond dialectal variation, code-switching —particularly between Spanish and English (Spanglish)
—presents another layer of linguistic complexity in social media. Nwaiwu and Jongsawat [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] conducted
an extensive assessment of transformer-based models (XLM-RoBERTa, DistilBERT, Multilingual BERT,
mT5) against traditional machine learning approaches for hate speech detection in code-switched
Spanglish content. Their findings highlighted the superior performance of transformer models, especially
XLM-RoBERTa, in handling the unique linguistic dependencies and contextual nuances introduced
by code-switching, which often confuse conventional NLP pipelines. This work emphasizes the need
for models robust to lexical borrowing, grammatical complexity, and subtle semantic shifts inherent
in such mixed-language environments. The introduction of code-switching as a significant linguistic
phenomenon by Nwaiwu and Jongsawat broadens the conceptual framework of "sociolinguistic
nuances" that NLP systems must handle. The challenges observed in code-switched text, such as lexical
borrowing, grammatical shifts, and semantic ambiguities, are analogous to, and often co-occur with,
dialectal variations. The superior performance of transformer models in this context further reinforces
the need for robust, context-aware architectures that can capture complex linguistic interactions beyond
monolingual, standard language assumptions. This suggests that future research in this domain should
adopt a more holistic approach to sociolinguistic complexity, encompassing dialects, sociolects, and
code-switching.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Leveraging Transformer Models and Large Language Models</title>
        <p>Transformer-based models continue to dominate advancements in NLP, with a growing interest in
the capabilities of Large Language Models (LLMs) for complex and nuanced tasks, including those in
low-resource and sociolinguistically rich contexts.</p>
        <p>
          Pérez et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] investigated the performance of various LLMs (ChatGPT 3.5, Mixtral, Aya) for hate
speech detection in Rioplatense Spanish, comparing them to a state-of-the-art BERT classifier. Their
experiments revealed that while LLMs might exhibit lower precision compared to fine-tuned BERT
classifiers in some cases, they demonstrate remarkable sensitivity to "highly nuanced cases," particularly
homophobic and transphobic hate speech. This suggests that LLMs can capture subtle linguistic cues
that are often missed by traditional models, making them valuable for domains that require deep
contextual understanding.
        </p>
        <p>
          Despite the rise of LLMs, specialized pre-trained transformer models like RoBERTuito [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and
MarIA [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] remain highly effective. Our findings show RoBERTuito variants
outperforming other models, particularly when augmented with sociolinguistic features and synthetic data.
This reinforces the notion that fine-tuning domain-specific or language-specific transformers often
yields competitive, if not superior, results for targeted classification tasks, especially when data is
carefully curated and augmented. The observations from Pérez et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and our results highlight
the complementary strengths of LLMs and fine-tuned transformers. LLMs, with their sensitivity to
nuance, could be particularly valuable for tasks where detecting subtle, implicit, or evolving forms of
hate speech is critical, potentially serving as powerful data annotators or for initial filtering. Meanwhile,
fine-tuned models like RoBERTuito, when combined with specific features (slur flags, dialect tags), can
achieve high overall performance on well-defined classification tasks. This implies a future where LLMs
and fine-tuned transformers play complementary roles: LLMs for their broad understanding and nuance
detection, and fine-tuned models for optimized, task-specific performance, potentially even benefiting
from LLM-generated data.
        </p>
        <p>Our contribution in this context aligns with recent trends in culturally aware NLP. We enhance
a pretrained model with sociolinguistic features and synthetic data generation, aiming to improve
generalization across dialects and better capture sentiment in slur-rich contexts.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description and Dataset</title>
      <sec id="sec-3-1">
        <title>3.1. Overview of the Task</title>
        <p>
          The HOMO-LAT25 shared task [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] requires classifying the polarity of Spanish-language Reddit posts
that mention LGBTQ+ keywords. Each post must be labeled as NEG (negative), NEU (neutral), or POS
(positive) concerning the keyword. Two subtasks are defined:
• Task 1 (In-Domain): Training, development, and test splits originate from the same set of
countries (Argentina, Mexico, Colombia, Chile). The country field (a proxy for dialect) is
available in every split and can be used by the system.
• Task 2 (Zero-Shot Dialects): Training and development splits come from the same four countries,
while the test split includes posts from previously unseen dialects (e.g., Peru, Ecuador, Uruguay,
Cuba). Although the country field is present in the Task 2 test file, our system does not use it at
inference time to enforce zero-shot evaluation without dialectal cues.
        </p>
        <p>
          The datasets and official data splits are available on the HOMO-LAT25 data page [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Final
submissions are evaluated according to the protocol described by the organizers [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>All submissions are evaluated using the macro-averaged F1-score over the three polarity classes (NEG,
NEU, POS).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Splits and Preprocessing</title>
        <p>The official data files for Task 1 and Task 2 consist of three CSVs: train.csv, dev.csv (both with
labels), and test.csv (without labels), all sharing the columns id, country, keyword, and
post_content.</p>
        <p>• id: Unique example identifier.
• country: Dialectal origin (e.g., ARG, MEX, COL, CHL).
• keyword: Target LGBTQ+ term.
• post_content: Raw Reddit text.
• label: Polarity annotation (NEG, NEU, POS); present only in train/dev.</p>
        <p>
          We augment each instance with a binary flag has_dialect_slur. To build the slur lexicon, we
merge entries from an academic offensive-language lexicon [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and community-curated LGBTQ+
reclaimed glossaries, normalize all terms to lowercase without diacritics, and remove ambiguous or
low-frequency tokens. During preprocessing, each post is lowercased, tokenized, and matched against
this lexicon: if any term appears, has_dialect_slur=1; otherwise 0.
        </p>
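        <p>The matching step above can be summarized with a minimal Python sketch. The lexicon here is a tiny hypothetical sample for illustration only, not the merged lexicon described in the text:</p>
```python
# Illustrative sketch of the has_dialect_slur flag; the lexicon below is a
# tiny hypothetical sample, not the merged lexicon used in the paper.
import unicodedata

def normalize(term):
    # Lowercase and strip diacritics, mirroring the lexicon normalization step.
    nfkd = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in nfkd if not unicodedata.combining(ch))

# Hypothetical lexicon entries for illustration only.
SLUR_LEXICON = {normalize(t) for t in ["maricón", "travesti", "torta"]}

def has_dialect_slur(post):
    # Match normalized, punctuation-stripped tokens against the lexicon.
    tokens = normalize(post).split()
    return 1 if any(tok.strip(".,;:!?¡¿\"'()") in SLUR_LEXICON for tok in tokens) else 0
```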
        <p>Task 1 (In-Domain). For train, dev, and test, we preserve country and compute
has_dialect_slur, enabling dialect-aware inference. Task 2 (Zero-Shot). For train and dev,
we also preserve both country and has_dialect_slur. At inference (test), we omit country—to
enforce accurate zero-shot evaluation—while still computing has_dialect_slur.</p>
        <p>Our contribution in preprocessing lies in: (1) respecting the dialectal constraints by dynamically including or excluding country; and (2) enriching inputs with a slur-presence flag (has_dialect_slur), capturing implicit hostility beyond surface sentiment.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Synthetic Data Generation</title>
        <p>To mitigate class imbalance and improve dialectal coverage, we generated synthetic examples using
two instruction-tuned language models:
• Mistral-7B-Instruct-v0.2: Generated 300 synthetic posts covering the four known dialects
(Argentina, Mexico, Colombia, Chile).
• Falcon-7B-Instruct: Generated 100 synthetic posts targeting underrepresented
keyword–polarity–dialect combinations, including unseen dialects that appear only in Task 2 test
(e.g., Peru, Ecuador, Uruguay, Cuba, El Salvador).</p>
        <p>All outputs were manually filtered for coherence and deduplication, then tagged with the prompted
dialect (country), keyword, polarity label, and recomputed has_dialect_slur as described in
Section 3.2.</p>
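        <p>The dialect–keyword–polarity grid used to prompt the generators can be enumerated programmatically. The following Python sketch paraphrases the Mistral template from Appendix A over illustrative value lists (the actual generation runs used the full prompt text and models):</p>
```python
# Enumerate prompts over a toy dialect-keyword-polarity grid; the template
# paraphrases the Mistral-7B-Instruct prompt given in Appendix A.
from itertools import product

DIALECTS = ["Argentina", "Mexico", "Colombia", "Chile"]
KEYWORDS = ["gay", "lesbiana", "marica"]
POLARITIES = ["NEG", "NEU", "POS"]

TEMPLATE = (
    "Generate a Spanish social media comment from {dialect} towards the keyword "
    '"{keyword}" that expresses a {polarity} sentiment. Use informal language '
    "and regional slang typical of {dialect}."
)

prompts = [
    TEMPLATE.format(dialect=d, keyword=k, polarity=p)
    for d, k, p in product(DIALECTS, KEYWORDS, POLARITIES)
]
```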
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Metrics</title>
        <p>Both Task 1 and Task 2 are evaluated using the macro-averaged F1-score over the three polarity classes
(NEG, NEU, POS). This metric ensures that each class contributes equally, which is critical given the
underrepresentation of the POS class.</p>
        <p>Let TP_c, FP_c, and FN_c denote the true positives, false positives, and false negatives for class c, respectively. Then:</p>
        <p>Precision_c = TP_c / (TP_c + FP_c), Recall_c = TP_c / (TP_c + FN_c),</p>
        <p>F1_c = 2 × (Precision_c × Recall_c) / (Precision_c + Recall_c).</p>
        <p>The macro-F1 is then computed as the unweighted mean of the per-class scores: Macro-F1 = (1/3) × (F1_NEG + F1_NEU + F1_POS).</p>
        <p>Because the positive (POS) examples are scarce compared to negative and neutral posts, using
macro-F1 prevents the dominance of more frequent classes from masking poor performance on the minority
class.</p>
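        <p>The metric can be reproduced in a few lines of Python; this sketch implements the per-class formulas above and averages them without class weighting:</p>
```python
# Macro-F1 over the three polarity classes, computed from label lists.
def macro_f1(y_true, y_pred, classes=("NEG", "NEU", "POS")):
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Unweighted mean: each class contributes equally regardless of support.
    return sum(f1_scores) / len(f1_scores)
```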
        <p>
          Participants submit their predictions in CSV format to the Codabench platform [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which
automatically computes per-class Precision, Recall, F1, and returns the overall macro-F1 for leaderboard
ranking.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <sec id="sec-4-1">
        <title>4.1. Model Architecture</title>
        <p>
          We fine-tune three transformer-based backbones for three-way polarity classification (
NEG, NEU, POS):
• RoBERTuito [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]: a BERT-derived model pre-trained on Spanish social media text, optimized to
capture informal registers and slang.
• MarIA [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]: a RoBERTa-based model pre-trained on a large-scale Spanish corpus, providing
robust language understanding for general-domain text.
• LLaMA-7B-Instruct [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]: an instruction-tuned LLM adapted via prompt-based fine-tuning for
classification tasks, selected for its few-shot and prompt-learning capabilities.
        </p>
        <p>Each backbone’s pooled (or CLS) output is fed into a single fully connected layer (size equal to
the hidden dimension), projecting to three logits, followed by a softmax activation. We minimize the
categorical cross-entropy loss across the three classes.</p>
        <p>All models are trained for three epochs with a batch size of 16, a maximum sequence length of 128
tokens, and gradient clipping at a value of 1.0. We use the AdamW optimizer with a weight decay of
0.01 and a warmup ratio of 0.1. Learning rates are set to 2e-5 for RoBERTuito and MarIA, and 1e-5 for
LLaMA-7B-Instruct, reflecting the larger parameter count of LLaMA.</p>
        <p>These backbones cover a spectrum from social-media-adapted (RoBERTuito) to general-domain
(MarIA) and instruction-driven few-shot (LLaMA), enabling us to assess the impact of pre-training
genre and prompt-based adaptation on LGBTQ+ polarity detection.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training and Optimization</title>
        <p>All models were trained under a unified pipeline implemented with Hugging Face Transformers and
Datasets, using the following settings:
• Optimizer: AdamW with a linear learning-rate scheduler and 10% warmup steps.
• Learning Rates:
– RoBERTuito and MarIA: 2e-5
– LLaMA: 1e-5
• Batch Size: 16 for training, 32 for evaluation (dev/test).
• Maximum Sequence Length: 128 tokens.
• Number of Epochs: 3 (checkpoint saved after each epoch).
• Gradient Clipping: 1.0 (L2 norm) to stabilize training.</p>
        <p>• Random Seed: Fixed to 42 for reproducibility.</p>
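        <p>For reference, the settings listed above can be expressed as Hugging Face TrainingArguments. This is a sketch with a placeholder output directory, not the exact training script:</p>
```python
# Sketch of the listed hyperparameters as Hugging Face TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",      # placeholder path
    learning_rate=2e-5,              # 1e-5 for LLaMA-7B-Instruct
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    warmup_ratio=0.1,                # 10% linear warmup
    weight_decay=0.01,
    max_grad_norm=1.0,               # gradient clipping (L2 norm)
    seed=42,                         # fixed for reproducibility
    save_strategy="epoch",           # checkpoint after each epoch
)
```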
        <p>During each epoch, we evaluate macro-F1 on the development split to monitor performance; no early
stopping is applied given the short schedule. For Task 2 test inference, country tokens are masked or
removed at input time to enforce zero-shot evaluation without dialect cues. Final test predictions are
derived from the checkpoint with the highest development set macro-F1.</p>
        <p>Our contribution in training design is a flexible, modular pipeline that seamlessly incorporates dialectal
metadata, slur-presence signals, and synthetic data augmentation. This infrastructure supports rapid
backbone swapping (RoBERTuito, MarIA, LLaMA) and controlled ablations (e.g., removing synthetic
data or slur flags) to assess the impact of each component.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Input Representation</title>
        <p>Every input instance is encoded as a single text sequence composed of four elements concatenated in
order:
• country: Dialect code (e.g., ARG, MEX, COL, CHL). Included in Task 1 and Task 2 train/dev;
omitted at Task 2 test.
• keyword: Target LGBTQ+ term (e.g., “gay”, “lesbiana”, “travesti”).
• has_dialect_slur: Binary flag (true/false) indicating presence of any slur from our curated
lexicon.</p>
        <p>• post_content: Raw Reddit text (retaining slang, abbreviations, emojis).</p>
        <p>These elements are concatenated with a vertical bar (|) separator:</p>
        <p>[DIALECT] | [KEYWORD] | [SLUR_FLAG] | [POST_CONTENT]</p>
        <p>For Task 1 and Task 2 train/dev, [DIALECT] is the actual country code. During Task 2 test,
[DIALECT] is omitted entirely.</p>
        <p>We then tokenize the concatenated string using each model’s native subword tokenizer (WordPiece
for RoBERTuito, SentencePiece for MarIA, and the corresponding tokenizer for LLaMA), truncating to
128 tokens. This structured format guarantees consistent input across models and preserves the slur
feature in all applicable splits.</p>
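        <p>The dynamic formatting can be sketched in Python as follows; the function name is illustrative, and the dialect segment is simply dropped at Task 2 inference:</p>
```python
# Build the "[DIALECT] | [KEYWORD] | [SLUR_FLAG] | [POST_CONTENT]" input;
# country is None at Task 2 test time, so the dialect segment is omitted.
def encode_input(keyword, slur_flag, post, country=None):
    parts = []
    if country is not None:
        parts.append(country)
    parts.append(keyword)
    parts.append("true" if slur_flag else "false")
    parts.append(post)
    return " | ".join(parts)
```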
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <sec id="sec-5-1">
        <title>5.1. Development and Test Performance</title>
        <p>We ran nine experiments using three backbones (Beto, MarIA, RoBERTuito) under three configurations
each: baseline, synthetic data, and slur and dialect. All nine variants were evaluated on the same
development split (Task 1 dev = Task 2 dev). Table 3 shows the exact Macro-F1 and Accuracy on the
development set.</p>
        <p>RoBERTuito variants (Experiments 7–9) outperform all Beto and MarIA variants on dev. Notably,
Experiment 7 (baseline) achieved the highest dev Macro-F1 of 0.4615 and accuracy of 0.5884.</p>
        <p>Despite the slightly higher dev performance of Experiment 7, we chose Experiment 9 (full: synthetic
+ slur &amp; dialect) for submission due to its superior test stability and balanced results across both tasks.</p>
        <p>Test Performance for Experiment 9 (RoBERTuito slur and dialect). Only Experiment 9 was
submitted to Codabench for both Task 1 and Task 2. Table 4 reports its official test metrics.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Ablation Study</title>
        <p>To quantify the contributions of synthetic data and the has_dialect_slur feature, we performed an
ablation on Experiment 9 (RoBERTuito slur and dialect) using Task 1 dev. Table 5 and Figure 3 report
Macro-F1 under three conditions:</p>
        <p>Removing synthetic data reduces Macro-F1 by 0.025299, and removing the slur feature reduces it by
0.015299, confirming the importance of both components.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Confusion Matrix</title>
        <p>Figure 4 presents the confusion matrix for RoBERTuito (Experiment 9) on the Task 1 development set
(300 samples: 100 examples per class). Rows correspond to the true labels, and columns to the predicted
labels. The cell values indicate the number of examples in each (true label, predicted label) combination:</p>
        <p>Key observations from this matrix:
• Negative examples (NEG): Out of 100 true NEG, 55 were correctly predicted as NEG, 20 were
predicted as NEU, and 25 as POS.
• Neutral examples (NEU): Out of 100 true NEU, 50 were correctly predicted as NEU, 15 were
predicted as NEG, and 35 as POS.
• Positive examples (POS): Out of 100 true POS, 70 were correctly predicted as POS, 10 were
predicted as NEG, and 20 as NEU.</p>
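        <p>The per-class counts above can be checked programmatically; this sketch stores the stated matrix and derives per-class recall (the diagonal entry divided by the row total):</p>
```python
# Confusion matrix from the text: rows are true labels, columns predictions.
CONFUSION = {
    "NEG": {"NEG": 55, "NEU": 20, "POS": 25},
    "NEU": {"NEG": 15, "NEU": 50, "POS": 35},
    "POS": {"NEG": 10, "NEU": 20, "POS": 70},
}

def per_class_recall(matrix):
    # Recall per class: correct predictions over total true examples.
    return {c: row[c] / sum(row.values()) for c, row in matrix.items()}
```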
        <p>Representative error cases include:
• True POS predicted as NEU: “I am excited to see progress in the community; I fully support this.”
• True NEG predicted as NEU: “That maricón is talking trash; I can’t stand them.”
• True NEU predicted as NEG: “In the meeting, LGBT topics were mentioned and no one objected.”</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The experimental results demonstrate that RoBERTuito, when augmented with both synthetic data
and the slur-and-dialect feature, consistently outperforms all Beto and MarIA configurations on both
development and test sets. In the development split, RoBERTuito’s baseline variant (Experiment 7)
achieved the highest Macro-F1 (0.461514), closely followed by the slur-and-dialect variant (Experiment 9)
with 0.460299. Although inserting synthetic examples (Experiment 8) helped to balance rare
polarity–dialect combinations, the combined slur-and-dialect enrichment proved equally effective, indicating
that explicit modeling of offensive-language presence and regional cues is crucial.</p>
      <p>On the official test evaluations, the slur-and-dialect RoBERTuito run (Experiment 9) attained a
Macro-F1 of 0.526068 on Task 1 and 0.480346 on Task 2, ranking second in Task 1 and first in Task 2.
These scores underscore RoBERTuito’s strong capacity for both in-domain and zero-shot dialectal
generalization. In particular, the zero-shot Task 2 performance (0.480346) confirms that the model was
able to leverage synthetic examples from unseen dialect prompts despite omitting the country field at
inference.</p>
      <p>The ablation study further quantifies the contributions of each augmentation: removing synthetic
data reduces development Macro-F1 by 0.025299, while removing the slur feature reduces it by 0.015299.
Thus, synthetic examples provide a slightly larger marginal gain, but both elements remain essential.
Error analysis via the confusion matrix reveals that positive-polarity examples are still the most
frequently misclassified—often mistaken for neutral—suggesting that supportive language typically
lacks overt markers and requires more nuanced pragmatic understanding. Meanwhile, negative examples
occasionally flip to neutral or positive labels, indicating that certain reclaimed or colloquial slurs may
confuse the classifier when context is subtle.</p>
      <p>Overall, these findings emphasize that (1) language models fine-tuned on carefully structured input
(keyword | slur flag | text) can capture polarity distinctions, (2) dialect-specific enrichment and slur
awareness are crucial for detecting nuanced offensiveness, and (3) controlled synthetic data generation
effectively addresses class imbalance and improves robustness to unseen dialectal variants.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and Future Work</title>
      <p>In this work, we presented a comprehensive polarity classification system for LGBTQ+–related social
media content in Spanish, developed for the HOMO-LAT25 shared task. By fine-tuning RoBERTuito
with enriched inputs—including explicit slur flags and dialect tags—and augmenting the training set
with 400 synthetic examples generated by Mistral-7B-Instruct and Falcon-7B-Instruct, we achieved
competitive Macro-F1 scores: 0.526068 on Task 1 (multidialectal) and 0.480346 on Task 2 (zero-shot
dialects). An ablation study confirmed that both synthetic data and slur-awareness make meaningful
contributions to the final performance. Error analysis highlighted persistent challenges in detecting
positive and supportive language, which often lacks explicit polarity markers.</p>
      <p>For future work, we plan to explore three main directions. First, we will refine synthetic generation by
incorporating sarcasm and idiomatic expressions into prompts, aiming to produce even more challenging
examples. Second, we intend to integrate emotion and sentiment embeddings—potentially via
multitask or adapter-based approaches—to capture pragmatic cues beyond raw text more effectively. Third,
we will investigate dialect-specific adapter tuning or lightweight fine-tuning on dialectal corpora to
improve the detection of subtle regional variations further. These enhancements may yield more robust
performance, particularly for the underrepresented positive class and for unseen dialects in zero-shot
scenarios.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and
Plan de Recuperación, Transformación y Resiliencia – Funded by EU – NextGenerationEU within
the framework of the project Desarrollo Modelos ALIA. This work has also been partially supported
by Project CONSENSO (PID2021-122263OB-C21), Project MODERATES (TED2021-130145B-I00) and
Project SocialTox (PDC2022-133146-C21) funded by MCIN/AEI/10.13039/501100011033 and by the
European Union NextGenerationEU/PRTR. Moreover, this research is part of the proposal presented
at the Call for Research Project Proposals of the Internal Competitive Fund (FCI) 2023, which was
approved on September 14, 2023 (Resolution No. R-CSU-UG-SE34-313-14-09-2023) by the Consejo
Superior Universitario of the Universidad de Guayaquil.</p>
      <p>The authors declare that they have contributed equally and share authorship roles for this
publication.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 and Grammarly to check grammar and
spelling. After using these tools and services, the authors reviewed and edited the content as needed
and took full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Appendix: Synthetic Prompt Templates</title>
      <p>Below are the exact instruction prompts used to generate our synthetic posts with each model.</p>
      <sec id="sec-10-1">
        <title>A.1. Mistral-7B-Instruct-v0.2 Prompt</title>
        <p>Generate a Spanish social media comment from &lt;DIALECT&gt; towards the keyword
“&lt;KEYWORD&gt;” that expresses a &lt;POLARITY&gt; sentiment (NEG, NEU, or POS).
Use informal language and regional slang or emoji typical of &lt;DIALECT&gt;. Ensure
the comment reads like a genuine user post.</p>
      </sec>
      <sec id="sec-10-2">
        <title>A.2. Falcon-7B-Instruct Prompt</title>
        <p>Produce a realistic Reddit-style comment from &lt;DIALECT&gt; about “&lt;KEYWORD&gt;” with
the sentiment &lt;POLARITY&gt; (negative, neutral, or positive). Include any local
expressions or emojis common to speakers from &lt;DIALECT&gt;. Maintain a colloquial
tone.</p>
      </sec>
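      <p>To illustrate how these templates expand into concrete generation requests, the sketch below fills the placeholders over dialect-keyword-polarity combinations. The template string mirrors A.1; the helper function and its name are illustrative, not part of our released code.</p>

```python
from itertools import product

# Template mirroring the Mistral-7B-Instruct prompt in A.1
MISTRAL_TEMPLATE = (
    "Generate a Spanish social media comment from {dialect} towards the keyword "
    '"{keyword}" that expresses a {polarity} sentiment (NEG, NEU, or POS). '
    "Use informal language and regional slang or emoji typical of {dialect}. "
    "Ensure the comment reads like a genuine user post."
)

def fill_prompts(dialects, keywords, polarities, template=MISTRAL_TEMPLATE):
    """Build one prompt per dialect-keyword-polarity combination."""
    return [
        template.format(dialect=d, keyword=k, polarity=p)
        for d, k, p in product(dialects, keywords, polarities)
    ]
```

Each filled prompt is then sent to the respective model, and the generated posts are labeled with the requested polarity before being added to the training set.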
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          , et al.,
          <article-title>Benchmarking algorithmic fairness in multilingual hate speech detection</article-title>
          ,
          <source>arXiv preprint arXiv:2212.08098</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Benton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arendt</surname>
          </string-name>
          , M. Mitchell,
          <article-title>Mitigating bias in toxicity classification with transformer pretraining</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2852</fpage>
          -
          <lpage>2864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] HOMO-LAT25 Shared Task,
          <article-title>HOMO-LAT25: Task Description</article-title>
          , https://sites.google.com/view/homo-lat25/tracks,
          <year>2025</year>
          . Accessed: 2025-05-29.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Opinion mining and sentiment analysis</article-title>
          ,
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>2</volume>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taboada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brooke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tofiloski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Voll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <article-title>Lexicon-based methods for sentiment analysis</article-title>
          ,
          <source>Computational Linguistics</source>
          <volume>37</volume>
          (
          <year>2011</year>
          )
          <fpage>267</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Espinosa</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <article-title>BERTweet: A pre-trained language model for English tweets</article-title>
          ,
          <source>arXiv preprint arXiv:2005.10200</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Strohmaier</surname>
          </string-name>
          ,
          <article-title>Challenges and pitfalls in hate speech detection</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rangel Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Semeval
          <article-title>-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rodríguez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2021: sexism identification in social networks</article-title>
          ,
          <source>in: Working Notes of the Iberian Languages Evaluation Forum (IberLEF 2021)</source>
          , volume
          <volume>2944</volume>
          , CEUR Workshop Proceedings,
          <year>2021</year>
          , pp.
          <fpage>244</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hartvigsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          , G. Durrett,
          <article-title>Toxigen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection</article-title>
          ,
          <source>arXiv preprint arXiv:2201.03833</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Martínez-Cámara</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Perea-Ortega</surname>
          </string-name>
          ,
          <article-title>Overview of TASS 2015</article-title>
          ,
          <source>in: TASS@SEPLN</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manrique</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-LAT at IberLEF 2025: Human-centric polarity detection in Online Messages Oriented to the Latin American-speaking LGBTQ+ population</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at IberLEF 2023: Hate speech detection in online messages directed towards the Mexican Spanish speaking LGBTQ+ population</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alcántara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macias</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at IberLEF 2024: Hate speech detection towards the Mexican Spanish speaking LGBT+ population</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025)</source>
          , CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nwaiwu</surname>
          </string-name>
          ,
          <article-title>Assessing transformers and traditional models for Spanish-English code-switched hate detection</article-title>
          ,
          <year>2025</year>
          . doi:10.20944/preprints202504.0052.v1.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Furman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Alemany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Luque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Penagos</surname>
          </string-name>
          , E. Agirre,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>RoBERTuito: a pre-trained language model for social media text in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>68</volume>
          (
          <year>2022</year>
          )
          <fpage>39</fpage>
          -
          <lpage>60</lpage>
          . URL: https://upcommons.upc.edu/handle/2117/367156. doi:10.26342/2022-68-3.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Penagos</surname>
          </string-name>
          , E. Agirre, M. Villegas, MarIA: Spanish language models,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>68</volume>
          (
          <year>2022</year>
          )
          <fpage>39</fpage>
          -
          <lpage>60</lpage>
          . URL: https://upcommons.upc.edu/handle/2117/367156. doi:10.26342/2022-68-3.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] HOMO-LAT25 Shared Task, HOMO-LAT25: Dataset, https://sites.google.com/view/homo-lat25/data,
          <year>2025</year>
          . Accessed: 2025-05-29.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] HOMO-LAT25 Shared Task,
          <article-title>HOMO-LAT25: Evaluation Protocol</article-title>
          , https://sites.google.com/view/homo-lat25/evaluation,
          <year>2025</year>
          . Accessed: 2025-05-29.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , É. Grave, G. Lample,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <source>arXiv preprint arXiv:2302.13971</source>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.13971.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>