<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ITST at SatiSPeech-IberLEF 2025: Leveraging Transformers for Textual and Multimodal Satire Detection in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Andrés Paredes-Valverde</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María del Pilar Salas-Zárate</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tecnológico Nacional de México/I.T.S. Teziutlán</institution>
          ,
          <addr-line>Fracción l y ll SN, 73960 Teziutlán, Puebla</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>202</volume>
      <fpage>161</fpage>
      <lpage>164</lpage>
      <abstract>
        <p>This paper presents ITST's participation in the SatiSpeech 2025 shared task on satire detection in Spanish. We propose a modular and efficient pipeline that leverages pretrained transformer models to encode linguistic and prosodic features for binary satire classification. For Task 1 (text-only), we extract contextual sentence embeddings using the Spanish RoBERTa-BNE model and train a Support Vector Machine (SVM) for classification. For Task 2 (multimodal), we integrate acoustic information by concatenating Wav2Vec 2.0 audio embeddings with RoBERTa text features. Both models were evaluated on an internal validation split and performed strongly on the official leaderboard, achieving second place in Task 1 and fifth place in Task 2. Our results demonstrate that transformer-based embeddings, even when combined through simple early fusion, can deliver robust performance in both textual and multimodal satire detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Satire Speech Recognition</kwd>
        <kwd>Automatic Emotion Recognition</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformers</kwd>
        <kwd>SVM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Satire constitutes a complex and context-dependent form of communication, posing significant
challenges to traditional content classification systems. Unlike conventional humor, satire operates through
implicit critique, employing rhetorical devices such as irony, exaggeration, and parody. These strategies
often rely on a shared understanding between speaker and audience, making satire inherently
ambiguous and susceptible to misinterpretation. Accurately interpreting satire often requires understanding
sociocultural context and speaker intent. These layers of meaning are conveyed not only through textual
content but also via prosodic features in speech, including intonation, rhythm, and pitch variation. In
multimodal contexts—where text and audio interact—these interpretive demands increase, necessitating
models that can process both modalities in tandem to detect satire effectively [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        In complex multimodal scenarios where text, audio, and visual elements converge, interpretive
demands increase due to the interplay of linguistic and paralinguistic cues. As a result, satire detection
needs models capable of integrating and reasoning over multiple modalities simultaneously. Multimodal
deep learning, particularly models that jointly process textual and auditory signals, has emerged as a
promising approach to meet this challenge [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>Interest in automatic satire detection has grown, motivated by its potential to combat misinformation,
support content moderation, and enhance media analysis. On digital platforms, satirical content is
frequently misinterpreted as factual news, increasing the risk of misleading interpretations and the
spread of false narratives. Consequently, reliable satire detection systems are essential, particularly in
multilingual and culturally diverse environments.</p>
      <p>
        Previous research has primarily focused on text-based satire detection, with transformer-based
models achieving strong performance on datasets drawn from news articles and social media [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
However, satire is also prevalent in spoken media—such as television programs, podcasts, and online
videos—where vocal delivery plays a critical role. Despite this, benchmarks for multimodal satire
classification remain scarce. Related work on sarcasm and irony detection has demonstrated that
multimodal models, integrating text and audio features, consistently outperform text-only baselines [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        To address this gap, the SatiSpeech [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] shared task at IberLEF 2025 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduces a benchmark for
multimodal satire detection in Spanish. The task comprises two subtasks: (1) satire classification based
on text alone, and (2) multimodal classification using aligned text and audio segments.
      </p>
      <p>
        Our participation in the SatiSpeech at IberLEF 2025 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] shared task involved the development of
customized models for both tasks. To this end, we based our approach on the methodology used in
the EmoSpeech competition at IberLEF 2024 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which also included a multimodal task involving both
audio and text.
      </p>
      <p>
        For Task 1, we employed a text-only architecture using the RoBERTa-base-bne model for Spanish
(MarIA) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Sentence embeddings were extracted from the [CLS] token of the final hidden state and
used as input to a Support Vector Machine (SVM) classifier. We optimized model performance through
grid search over kernel types, regularization parameters, and gamma values.
      </p>
      <p>
        For Task 2 (multimodal classification), we extended this approach by incorporating audio-based
features derived from the Wav2Vec 2.0 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] model for Spanish. Specifically, we extracted embeddings
from the first hidden state vector of each audio sample and concatenated them with the corresponding
RoBERTa-based text embeddings. These multimodal vectors were then used to train a second SVM
classifier, with hyperparameters optimized through the same grid search strategy as in Task 1. This
combined representation allowed the model to capture both semantic content and prosodic cues critical
to satire detection.
      </p>
      <p>The remainder of this paper is organized as follows: Section 2 reviews previous work on satire
detection in both textual and multimodal contexts; Section 3 provides an overview of the shared task
and dataset; Section 4 describes our modeling approaches for both the unimodal and multimodal
configurations; Section 5 presents experimental results and performance comparisons; and Section 6
concludes with key findings and future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The task of satire detection has gained growing attention in natural language processing due to its
relevance for media analysis, misinformation detection, and content moderation. Satirical content,
while often humorous, relies on subtle rhetorical devices such as irony, parody, and exaggeration, which
present unique challenges for automatic classification systems.</p>
      <p>Early approaches to satire detection focused predominantly on handcrafted linguistic features,
including lexical and syntactic patterns, sentiment polarity mismatches, and stylistic cues [12, 13]. However,
these approaches struggled to generalize across domains and genres, particularly in multilingual and
culturally diverse settings.</p>
      <p>With the advent of transformer-based models, substantial improvements have been reported in satire
and humor detection tasks. Studies such as [14] demonstrate the effectiveness of models like BERT
in capturing contextual and pragmatic nuances of satirical texts, especially when trained on
domain-specific corpora. These models leverage self-attention mechanisms to represent implicit relationships
between tokens, which are critical for decoding irony and sarcasm.</p>
      <p>
        Despite these advances, most research has remained focused on the textual modality. Recent work
has begun to explore multimodal approaches that incorporate acoustic and visual information to
improve satire detection, particularly in spoken or audiovisual content. For example, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a
multimodal satire detection framework combining textual and visual cues, while [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] applied audio-text
fusion techniques for emotion and satire classification in Spanish. These studies have shown that
prosodic features such as intonation, rhythm, and speech tempo provide complementary information
for disambiguating satirical intent, especially when textual signals are ambiguous.
      </p>
      <p>Moreover, the integration of multimodal transformer encoders (e.g., Wav2Vec 2.0 for speech and
ViLT for vision) has opened new possibilities for cross-modal learning. Yet, challenges remain in
effectively fusing these heterogeneous representations, as naive concatenation may fail to capture
complex interdependencies between modalities.</p>
      <p>
        In this context, the SatiSpeech 2025 shared task [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] represents a significant step forward by providing
a benchmark dataset for multimodal satire detection in Spanish. It enables systematic evaluation of
models that jointly process text and speech, fostering the development of more robust and culturally
aware satire detection systems.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>The SatiSpeech Shared Task, organized as part of IberLEF 2025, targets the automatic detection of
satirical content in Spanish across both textual and multimodal (text + audio) inputs. The task reflects
the inherent complexity of satire, which relies on implicit rhetorical strategies and context, making it a
valuable but challenging target for computational methods in media analysis, misinformation detection,
and discourse understanding.</p>
      <p>The task is divided into two subtasks:
• Task 1: Text-Based Satire Detection. Participants are required to classify whether a given
transcript represents satire using only textual information, leveraging features such as lexical
choice, syntactic structure, and figurative language like irony or exaggeration.
• Task 2: Multimodal Satire Detection. This subtask introduces additional complexity by
requiring the integration of aligned speech and transcription data. Systems must incorporate both
semantic (textual) and prosodic (acoustic) features—such as pitch, rhythm, and emphasis—for
binary satire classification.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>Participants were provided with the SatirA dataset, a curated collection of Spanish-language speech
segments collected primarily from YouTube. Satirical examples were sourced from television and
web-based comedy programs like El Intermedio, Zapeando, Homo-Zapping, and El Mundo Today. Non-satirical
samples were drawn from journalistic content published by outlets including Antena 3 Noticias, El
Mundo, and BBC News. The dataset includes a variety of dialects and regional accents to encourage the
development of robust models that generalize across linguistic variation.</p>
        <p>Speech segments were automatically extracted using diarization tools, limited to a maximum length
of 25 seconds. Transcriptions were generated using Whisper ASR [15], and a semi-supervised labeling
process was used: automatic predictions were reviewed and corrected by expert annotators to ensure
high-quality annotations.</p>
        <p>The released training set contains approximately 25 hours of labeled content. Our modeling pipeline
uses the entire provided training data for model training. For evaluation and hyperparameter tuning,
we held out 10% of this dataset to create an internal validation set. Table 1 summarizes the distribution
of samples and transcript lengths across both splits.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>Distribution of samples and transcript lengths across the training and validation splits.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Split</th><th>Class</th></tr>
            </thead>
            <tbody>
              <tr><td>Train</td><td>Non-satirical</td></tr>
              <tr><td>Train</td><td>Satirical</td></tr>
              <tr><td>Validation</td><td>Non-satirical</td></tr>
              <tr><td>Validation</td><td>Satirical</td></tr>
            </tbody>
          </table>
        </table-wrap>
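        <p>As a rough illustration of how our internal 90/10 split and the per-class statistics summarized in Table 1 can be derived, we include the following sketch; the file and column names are assumptions and not part of the official data release.</p>
        <preformat>
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV with one row per speech segment: transcript text and binary satire label.
data = pd.read_csv("satispeech_train.csv")

# Stratified 90/10 split used for internal validation and hyperparameter tuning.
train_df, val_df = train_test_split(data, test_size=0.1,
                                    stratify=data["label"], random_state=42)

# Per-split, per-class sample counts and mean transcript length (word count), as in Table 1.
for name, split in [("Train", train_df), ("Validation", val_df)]:
    counts = split["label"].value_counts()
    lengths = split.groupby("label")["text"].apply(lambda s: s.str.split().str.len().mean())
    print(name, counts.to_dict(), lengths.round(1).to_dict())
</preformat>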
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section describes the approach we used for both tasks of the SatiSpeech Shared Task: text-based
satire detection (Task 1) and multimodal satire detection (Task 2). Instead of end-to-end neural
fine-tuning, we adopted a feature-based strategy that leverages pretrained transformer models for Spanish,
followed by traditional SVM classifiers. This design offered a balance between efficiency, interpretability,
and performance.</p>
      <p>Our system was developed in Python using PyTorch, Transformers, and scikit-learn. For
Task 1, we extracted sentence-level embeddings from a RoBERTa model trained on Spanish corpora.
For Task 2, we extended the pipeline by incorporating acoustic embeddings extracted from Wav2Vec
2.0. The final multimodal architecture is illustrated in Figure 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Text-Based Satire Detection (Task 1)</title>
        <p>For the first subtask, we used the pretrained PlanTL-GOB-ES/roberta-base-bne model to encode
Spanish-language transcriptions. Each text was tokenized to a maximum length of 512 tokens with
padding and truncation. From the final hidden layer of RoBERTa, we extracted the embedding
corresponding to the [CLS] token, yielding a 768-dimensional fixed-length sentence representation.</p>
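        <p>For illustration, a minimal sketch of this embedding extraction with the Hugging Face Transformers library is given below; batching and I/O details are simplified assumptions rather than our exact implementation.</p>
        <preformat>
import torch
from transformers import AutoTokenizer, AutoModel

# Spanish RoBERTa model used for Task 1.
MODEL_ID = "PlanTL-GOB-ES/roberta-base-bne"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def text_embedding(text: str) -> torch.Tensor:
    """Return the 768-dimensional [CLS] embedding from the final hidden layer."""
    inputs = tokenizer(text, max_length=512, padding="max_length",
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # First token of the last hidden state corresponds to the [CLS] (start-of-sequence) token.
    return outputs.last_hidden_state[0, 0]
</preformat>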
        <p>These embeddings were then used to train an SVM classifier. We conducted a grid search over kernel
types (rbf, poly), the regularization parameter C, and the kernel coefficient gamma. The classifier was trained
on the full dataset provided, using a 10% internal validation split to select the best hyperparameters.
Classification performance was evaluated using the macro-averaged F1-score to account for class
imbalance.</p>
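        <p>A possible scikit-learn realization of this selection procedure is sketched below, using a fixed 10% validation fold via PredefinedSplit and macro-averaged F1 as the selection criterion; the specific parameter values and file names are illustrative assumptions.</p>
        <preformat>
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split
from sklearn.svm import SVC

# X: matrix of 768-dim [CLS] embeddings, y: binary satire labels (hypothetical files).
X = np.load("train_text_embeddings.npy")
y = np.load("train_labels.npy")

# Mark 10% of the rows as the single validation fold (-1 means "always in the training set").
_, val_idx = train_test_split(np.arange(len(y)), test_size=0.1,
                              stratify=y, random_state=42)
fold = np.full(len(y), -1)
fold[val_idx] = 0

param_grid = {"kernel": ["rbf", "poly"],
              "C": [0.1, 1.0, 10.0],
              "gamma": ["scale", "auto"]}

# Macro-averaged F1 accounts for the class imbalance mentioned above.
search = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=PredefinedSplit(fold))
search.fit(X, y)
print(search.best_params_)
</preformat>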
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multimodal Satire Detection (Task 2)</title>
        <p>In the second subtask, we incorporated speech-based features into the model. We used the
facebook/wav2vec2-large-xlsr-53-spanish model (https://huggingface.co/facebook/wav2vec2-large-xlsr-53-spanish)
to extract a 1024-dimensional embedding from each audio clip. This embedding was obtained by computing
the first token vector from the final hidden layer of the model after processing the waveform at 16 kHz.</p>
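        <p>The following minimal sketch illustrates this extraction step, assuming 16 kHz mono waveforms loaded with librosa; preprocessing details beyond those described above are assumptions.</p>
        <preformat>
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)  # encoder only, ASR head is not needed
model.eval()

def audio_embedding(path: str) -> torch.Tensor:
    """Return a 1024-dimensional embedding for one audio clip."""
    waveform, _ = librosa.load(path, sr=16000)  # resample to 16 kHz mono
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # First frame vector of the final hidden layer, as used in our pipeline.
    return outputs.last_hidden_state[0, 0]
</preformat>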
        <p>We concatenated the 1024-dimensional audio embedding from Wav2Vec 2.0 with the 768-dimensional
RoBERTa text embedding, forming a unified 1792-dimensional multimodal feature vector. This fused
representation was then used as input to a second SVM classifier, also tuned using grid search with the
same hyperparameter configuration as in Task 1.</p>
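        <p>A sketch of the early-fusion step and the resulting Task 2 classifier follows, assuming the text and audio embeddings have already been precomputed and stored; the file names are hypothetical and the hyperparameters shown are the best configuration reported in Section 5.</p>
        <preformat>
import numpy as np
from sklearn.svm import SVC

def fuse(text_vecs: np.ndarray, audio_vecs: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate 768-dim text and 1024-dim audio embeddings
    row-wise into 1792-dim multimodal feature vectors."""
    return np.concatenate([text_vecs, audio_vecs], axis=1)

# Hypothetical precomputed embedding matrices (see the two extraction sketches above).
text_vecs = np.load("train_text_embeddings.npy")    # shape (n_samples, 768)
audio_vecs = np.load("train_audio_embeddings.npy")  # shape (n_samples, 1024)
labels = np.load("train_labels.npy")

X_train = fuse(text_vecs, audio_vecs)                # shape (n_samples, 1792)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")        # best configuration from Section 5
clf.fit(X_train, labels)
</preformat>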
        <p>This approach allowed us to integrate both semantic (textual) and prosodic (acoustic) signals relevant
to the detection of satire. By separating the embedding extraction and classification stages, the
architecture remains modular and interpretable. Furthermore, the pipeline is efficient to train, GPU-compatible,
and scalable to larger or multilingual datasets.</p>
        <p>As in Task 1, model selection was based on macro-averaged F1-score using the internal validation
split. The overall architecture remained consistent across tasks, with the only difference being the
additional acoustic branch in the multimodal setup.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>During development, we experimented with multiple configurations of the SVM classifier, varying the
kernel type (RBF vs. polynomial), regularization parameter (C), and the inclusion of normalization
techniques. Among these, the RBF kernel with C=1.0 and gamma='scale' yielded the best macro
F1 score. Other configurations showed reduced recall on the satirical class, indicating sensitivity to
hyperparameter choices.</p>
      <p>For Task 1, our RoBERTa-based model trained on Spanish transcriptions achieved a macro F1 score
of 0.9394 on the internal validation set. This highlights the strength of monolingual transformers like
RoBERTa-BNE in capturing linguistic markers of satire such as hyperbole, irony, and rhetorical shifts.</p>
      <p>For Task 2, we extended the model by integrating speech representations extracted via Wav2Vec
2.0. The resulting system achieved a slightly higher macro F1 of 0.9412. This marginal improvement
supports the hypothesis that acoustic signals such as intonation, rhythm, and pitch help reinforce
textual signals in satire detection. However, the performance gain remains modest, indicating that early
fusion through concatenation does not fully leverage the expressive richness of audio features.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we presented a lightweight yet effective pipeline for satire detection in Spanish,
developed as part of the SatiSpeech 2025 shared task. Our approach relies on pretrained transformer
models—RoBERTa-BNE for textual embeddings and Wav2Vec 2.0 for audio features—combined through
early fusion and classified using SVMs. The design is modular, reproducible, and computationally
efficient, enabling fast experimentation and high generalization capacity.</p>
      <p>Our models achieved competitive performance on both tasks, securing second place in Task 1 and
fifth place in Task 2 on the official leaderboard. The results confirm that monolingual language models
are highly effective in identifying satirical patterns in text, and that the inclusion of audio features
provides additional, though marginal, improvements.</p>
      <p>While our simple fusion method proved sufficient to outperform several end-to-end architectures,
future work will explore more advanced integration strategies, such as late fusion, attention-based
weighting, or modality-specific fine-tuning. Furthermore, domain adaptation techniques may help
reduce the gap between validation and test performance.</p>
      <p>Overall, our findings support the utility of transformer-based feature extraction for multimodal satire
detection and highlight the potential of hybrid pipelines combining pretrained models with traditional
classifiers.</p>
      <p>Future work will explore late fusion strategies where separate classifiers for text and audio are
combined via ensemble techniques. Attention-based fusion could dynamically weigh modalities depending
on input characteristics. Modality-specific fine-tuning may enhance performance by aligning internal
representations with domain-specific cues, while domain adaptation strategies such as adversarial
training or corpus alignment could reduce performance drops when transferring to out-of-domain
satire. We also plan to investigate the integration of Large Language Models (LLMs), such as LLaMA
variants and Qwen, to capture high-level pragmatic cues and sociocultural context that are often critical
in satirical expression. Recent studies have demonstrated the efficacy of LLMs in complex classification
tasks, such as hate speech detection, where nuanced intent and linguistic subtleties are critical [16, 17].</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We are grateful to the Tecnológico Nacional de México (TecNM, by its Spanish acronym) for supporting
this work. This research was also sponsored by the Secretariat of Science, Humanities, Technology and
Innovation (Secihti).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepL in order to check grammar and spelling.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>Cultural diferences in humor perception, usage, and implications</article-title>
          ,
          <source>Frontiers in Psychology</source>
          <volume>10</volume>
          (
          <year>2019</year>
          ). URL: https://api.semanticscholar.org/CorpusID:59307773.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dynel</surname>
          </string-name>
          ,
          <article-title>Beyond a joke: Types of conversational humour</article-title>
          ,
          <source>Language and Linguistics Compass</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>1284</fpage>
          -
          <lpage>1299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Baltrusaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ahuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          ,
          <article-title>Multimodal machine learning: A survey and taxonomy</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>41</volume>
          (
          <year>2019</year>
          )
          <fpage>423</fpage>
          -
          <lpage>443</lpage>
          . URL: https://doi.org/10.1109/TPAMI.2018.2798607. doi:10.1109/TPAMI.2018.2798607.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Rodríguez-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Spanish MEACorpus 2023: A multimodal speech-text corpus for emotion analysis in Spanish from natural environments</article-title>
          ,
          <source>Computer Standards &amp; Interfaces</source>
          <volume>90</volume>
          (
          <year>2024</year>
          )
          <fpage>103856</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0920548924000254. doi:10.1016/j.csi.2024.103856.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Compilation and evaluation of the spanish saticorpus 2021 for satire identification using linguistic features and transformers</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          <volume>8</volume>
          (
          <year>2022</year>
          )
          <fpage>1723</fpage>
          -
          <lpage>1736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Broniatowski</surname>
          </string-name>
          ,
          <article-title>A multi-modal method for satire detection using textual and visual cues</article-title>
          , in: G. Da San Martino, C. Brew,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Ciampaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leberknight</surname>
          </string-name>
          , P. Nakov (Eds.),
          <source>Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship</source>
          , Disinformation, and Propaganda,
          <source>International Committee on Computational Linguistics (ICCL)</source>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>38</lpage>
          . URL: https://aclanthology.org/2020.nlp4if-1.4/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bernal-Beltrán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          , Overview of SatiSPeech at IberLEF 2025:
          <article-title>Multimodal Audio-Text Satire Classification in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Rodríguez-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          , Overview of EmoSpeech at IberLEF 2024:
          <article-title>Multimodal speech-text emotion recognition in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>359</fpage>
          -
          <lpage>368</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Penagos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>Maria: Spanish language models</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>68</volume>
          (
          <year>2022</year>
          ). URL: https://upcommons.upc.edu/handle/2117/367156#.YyMTB4X9A-0. doi:10.26342/2022-68-3.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baevski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <article-title>wav2vec 2.0: A framework for self-supervised learning of speech representations</article-title>
          , ArXiv abs/2006.11477 (
          <year>2020</year>
          ). URL: https://api.semanticscholar.org/CorpusID:219966759.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>