<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>UKR at SatiSPeech-IberLEF 2025: Multimodal Satire Detection in Spanish with a BETO-Based Text Encoder and MFCC-Derived Audio Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anatoly Gladun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julia Rogushina</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo Martínez-Béjar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Facultad de Informática, Universidad de Murcia, Campus de Espinardo</institution>
          ,
          <addr-line>30100 Murcia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Software Systems of National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>40, Acad. Glushkov Avenue, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>International Research and Training Center of Information Technologies and Systems of National Academy of Sciences of Ukraine and Ministry of Education and Science of Ukraine</institution>
          ,
          <addr-line>40, Acad. Glushkov Avenue, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents the UKR team's participation in the SatiSpeech 2025 shared task, focused on the detection of satirical content in Spanish from both text-only and multimodal (text + audio) sources. We propose a supervised fine-tuning approach using the Spanish monolingual BERT model (BETO) for Task 1, and extend it with MFCC-based acoustic features in Task 2 to capture prosodic information. For classification, we adapt the final layers of the transformer model to integrate textual and audio inputs in a unified architecture. Our system ranked 6th in Task 1 and 9th in Task 2, with validation macro F1 scores of 0.9648 and 0.9699, respectively. The results demonstrate that while textual information carries most of the discriminative power, audio features offer complementary cues that slightly improve performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Satire Speech Recognition</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformers</kwd>
        <kwd>BERT</kwd>
        <kwd>MFCC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Satire is a nuanced and context-dependent form of expression that presents significant challenges for
traditional content classification systems. Unlike straightforward humor, it delivers critique indirectly
through rhetorical strategies such as irony, parody, and exaggeration. Understanding satire often
requires knowledge of cultural context and the speaker’s intent, which makes automated detection
more complex. The satirical meaning is influenced not only by the text itself but also by vocal features
like tone, pitch, and rhythm that shape how the message is communicated and received. As a result,
detecting satire in multimodal content requires models that can effectively combine both linguistic and
auditory information [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Interest in automatic satire detection has increased, driven by its potential to counter misinformation,
support content moderation, and enrich media analysis. On digital platforms, satire often closely
resembles genuine news, which heightens the risk of confusion and the dissemination of false narratives.
Therefore, developing reliable automated systems for satire detection is essential to promote accurate
interpretation of content, particularly in multilingual and culturally diverse contexts.</p>
      <p>
        Traditionally, satire detection has concentrated on textual analysis. Transformer-based language
models trained on news and social media datasets have demonstrated strong performance in this area
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, satire is also widespread in spoken formats such as television shows, podcasts, and video
sketches, where vocal delivery is central to its expression. Despite this, benchmarks for multimodal
satire classification remain scarce. Studies on sarcasm and irony detection have shown that models
integrating text, audio, and visual modalities significantly outperform text-only counterparts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
performance gap underscores the importance of moving beyond purely textual representations and
adopting multimodal approaches—particularly those that incorporate audio—to better capture the
complexities of satirical communication.
      </p>
      <p>Detecting satire in spoken or audiovisual content necessitates multimodal modeling due to the
inherent limitations of relying on text alone. Satirical intent is often conveyed not just by the words
themselves, but by how they are spoken. Prosodic features such as intonation, pitch variation, stress,
and rhythm play a crucial role in signaling irony, sarcasm, or exaggeration. These vocal cues can
fundamentally alter the meaning of an utterance. For instance, a deadpan or sarcastic tone can reverse
the literal interpretation of the words, which would likely be misunderstood by models that analyze
text in isolation. By incorporating audio, models can capture these non-verbal signals, leading to a more
accurate understanding of speaker intent and more reliable satire detection in real-world, multimodal
contexts.</p>
      <p>
        Moreover, recent research has demonstrated that audio features can enhance performance in various
NLP tasks, such as emotion recognition, and across different domains [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>
        To address the existing gap in multimodal satire detection, the SatiSpeech shared task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], part of
IberLEF 2025 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], introduces a new benchmark focused on Spanish-language satire. The task includes
two subtasks: (1) satire classification using text only, and (2) multimodal satire classification using
aligned text and audio segments.
      </p>
      <p>
        We participated in both subtasks of the SatiSpeech challenge by adapting our prior approach from the
EmoSpeech task at IberLEF 2024. For Subtask 1 (text-only classification), we fine-tuned BETO, a
BERT-based model pre-trained on Spanish corpora [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We extracted the [CLS] token from the final hidden
layer to produce fixed-size sentence embeddings, which were then used as input to a support vector
machine (SVM) classifier. We optimized the SVM through grid search over kernel types, regularization
parameters, and gamma values.
      </p>
      <p>For Subtask 2 (multimodal classification), we extended the architecture to integrate audio features.
We extracted Mel-frequency cepstral coefficients (MFCCs) from each audio sample using librosa,
resulting in 1D acoustic embeddings. These were concatenated with BETO’s contextual text embeddings
to form joint multimodal vectors. A custom classification head was built to accommodate the expanded
feature set, operating on top of a modified BERT model. Our implementation, structured as a subclass
of the BertForSequenceClassification class, combined pooled textual and acoustic representations
before the final classification layer. The model was trained using binary cross-entropy loss for satire
detection.</p>
      <p>This paper is organized as follows: Section 2 provides a summary of related work and Section
3 an overview of the shared task; Section 4 describes our modeling approaches for both unimodal and
multimodal configurations; Section 5 presents the experimental results and comparisons; and Section 6
concludes with insights and future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The automatic detection of figurative language—such as irony, sarcasm, and satire—has garnered
increasing attention in natural language processing due to its relevance in combating misinformation
and supporting content moderation. Early research focused primarily on text-based features, using
lexical, syntactic, and semantic cues to classify satirical or sarcastic content. Traditional machine
learning approaches applied handcrafted features (e.g., n-grams, sentiment polarity) with classifiers
such as SVMs or random forests. However, these methods often struggled with generalization across
domains due to the context-dependent nature of satire.</p>
      <p>
        Recent advancements in transformer-based language models, such as BERT and its monolingual
variants like BETO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], have significantly improved performance in satire and humor detection. These
models benefit from contextualized word embeddings and large-scale pretraining, allowing them to
better capture subtle linguistic signals such as hyperbole, metaphor, or irony [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In the multimodal domain, several studies have explored the integration of speech and visual data
for figurative language detection. The authors of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed that combining audio and visual features improves
sarcasm recognition compared to text-only approaches. Similar findings have been reported in emotion
recognition tasks, where speech prosody—including pitch, energy, and rhythm—was found to provide
complementary information to text. Mel-Frequency Cepstral Coefficients (MFCCs) [<xref ref-type="bibr" rid="ref9">9</xref>], as used in our
work, are widely adopted as a compact representation of speech signals due to their effectiveness in
modeling prosodic features.
      </p>
      <p>
        The SatiSpeech 2025 shared task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] directly addresses this gap by introducing a standardized dataset
and evaluation protocol for Spanish satire detection from both text and aligned audio. Our work
builds upon these prior studies by evaluating the effectiveness of a simple early fusion strategy using
BETO embeddings and MFCC features, and by highlighting current limitations and opportunities in
multimodal satire classification.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>The SatiSpeech shared task, part of IberLEF 2025, focuses on the automatic detection of satirical content
in Spanish using both text-only and multimodal (text + audio) inputs. This challenge stems from the
subtle and context-dependent nature of satire and its growing relevance in areas such as media analysis,
misinformation detection, and computational discourse understanding.</p>
      <p>The task is organized into two subtasks:
• Task 1: Text-Based Satire Detection. Participants must develop systems that classify whether
a given transcript is satirical, relying solely on textual features. These may include word choice,
syntactic patterns, and rhetorical devices like irony and exaggeration.
• Task 2: Multimodal Satire Detection. This subtask extends the problem by incorporating audio.
Participants receive aligned audio-transcript pairs and must combine linguistic and prosodic
cues—such as rhythm, intonation, and stress—for binary satire classification.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>We participated in the task using the SatirA dataset, a curated collection of Spanish-language audio
segments sourced from YouTube. The dataset features a diverse range of Spanish dialects and regional
varieties to promote linguistic diversity and reduce potential regional bias. Audio segments were
generated using automatic speaker diarization, filtering out clips longer than 25 seconds. Transcriptions
were produced using Whisper. The annotation process followed a semi-supervised approach: initial
automatic labels were refined by a team of three expert annotators to ensure high-quality ground truth.</p>
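        <p>The organizers' preprocessing pipeline is only partially specified (clips longer than 25 seconds are discarded and transcriptions are produced with Whisper). The sketch below illustrates just those two published steps using librosa and openai-whisper; the Whisper model size and the file handling are assumptions, not details from the task description.</p>
        <preformat>
import librosa
import whisper  # openai-whisper

MAX_DURATION_S = 25.0
asr = whisper.load_model("small")  # model size is an assumption; the task paper does not specify it

def transcribe_if_short(path: str) -> str | None:
    """Skip clips longer than 25 seconds; otherwise return a Whisper transcription."""
    waveform, sr = librosa.load(path, sr=None)
    if len(waveform) / sr > MAX_DURATION_S:
        return None
    return asr.transcribe(path, language="es")["text"]
        </preformat>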
        <p>The final dataset contains approximately 25 hours of labeled content and was split 80/20 into training
and validation sets. All submissions and evaluations were carried out through the Codalab platform.</p>
        <p>Table 1 summarizes the class distribution in both splits. There is a mild class imbalance, with
non-satirical samples being slightly more frequent and typically longer on average. Satirical transcripts
tend to show more variation in length. While these patterns may reflect real-world stylistic differences,
models should avoid exploiting them directly during classification.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Class distribution and average transcript length (with standard deviation) in the training and validation splits.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Set</th><th>Metric</th><th>Non-satirical</th><th>Satirical</th></tr>
            </thead>
            <tbody>
              <tr><td rowspan="3">Train</td><td>Samples</td><td>3168</td><td>2832</td></tr>
              <tr><td>Avg. Length</td><td>60.83</td><td>55.95</td></tr>
              <tr><td>Std. Dev.</td><td>12.01</td><td>17.60</td></tr>
              <tr><td rowspan="3">Validation</td><td>Samples</td><td>633</td><td>567</td></tr>
              <tr><td>Avg. Length</td><td>61.08</td><td>55.83</td></tr>
              <tr><td>Std. Dev.</td><td>12.23</td><td>17.15</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section describes the approach we used for both subtasks of the SatiSpeech Shared Task: text-based
satire detection (Task 1) and multimodal satire detection (Task 2). Instead of a traditional SVM-based
pipeline, we fine-tuned neural architectures built on top of the Spanish BERT model (BETO), using
custom classification heads tailored to each modality.</p>
      <p>Our system was developed in PyTorch using the HuggingFace Transformers library. For Task 1,
we relied exclusively on textual inputs. For Task 2, we extended this model by incorporating
MFCC-based acoustic features extracted from the corresponding audio segment. The full architecture for the
multimodal model is illustrated in Figure 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Text-Based Satire Detection (Task 1)</title>
        <p>For Task 1 (text-only satire detection), we fine-tuned the dccuchile/bert-base-spanish-wwm-uncased
model, commonly known as BETO. The transcriptions were preprocessed using standard tokenization,
with padding and truncation applied to ensure a consistent input length of 512 tokens. We extracted the
768-dimensional contextual embedding associated with the [CLS] token from the final hidden layer of
BETO to serve as a fixed-size representation of each utterance.</p>
        <p>This embedding was passed to a custom classification head consisting of a dropout layer followed by a
linear projection to the number of output labels. The entire model, including BETO and the classification
head, was fine-tuned using cross-entropy loss.</p>
        <p>Training was conducted over 10 epochs using a batch size of 16 and a learning rate of 2 × 10<sup>−5</sup>. We
selected the best model based on weighted F1-score on a validation set (10% of the training data).</p>
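        <p>As a minimal sketch of this setup (with placeholder dataset variables and output directory, not the team's released code), the snippet below fine-tunes dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed above; BertForSequenceClassification already implements the dropout-plus-linear classification head described in this section.</p>
        <preformat>
import numpy as np
from sklearn.metrics import f1_score
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "dccuchile/bert-base-spanish-wwm-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Padding and truncation to a fixed length of 512 tokens, as described above.
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

# train_ds / val_ds are datasets.Dataset objects with "text" and "label" columns (not shown).
train_ds = train_ds.map(tokenize, batched=True)
val_ds = val_ds.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1_weighted": f1_score(labels, preds, average="weighted")}

args = TrainingArguments(
    output_dir="beto-satire",          # placeholder path
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    eval_strategy="epoch",             # called evaluation_strategy in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,       # keep the checkpoint with the best weighted F1
    metric_for_best_model="f1_weighted",
)
Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds,
        compute_metrics=compute_metrics).train()
        </preformat>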
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multimodal Satire Detection (Task 2)</title>
        <p>For Task 2 (multimodal satire detection), we extended the textual model by integrating prosodic
information. We extracted Mel-Frequency Cepstral Coefficients (MFCCs) from each audio segment
using the librosa library. These features were averaged across the temporal axis to yield a
fixed-length acoustic vector of 40 dimensions.</p>
        <p>This MFCC vector was concatenated with the 768-dimensional BETO text embedding to form an
808-dimensional multimodal input. We then adapted the classification head to project from this larger
combined feature space.</p>
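        <p>A minimal sketch of this feature-extraction step is shown below; it assumes audio resampled to 16 kHz and a hypothetical file name, and cls_vec stands for the 768-dimensional [CLS] embedding produced by BETO as in Section 4.1.</p>
        <preformat>
import librosa
import numpy as np
import torch

def mfcc_vector(audio_path: str, n_mfcc: int = 40) -> np.ndarray:
    """Average MFCC frames over time to obtain a fixed-length 40-dim acoustic vector."""
    waveform, sr = librosa.load(audio_path, sr=16000)              # resampling rate is an assumption
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=n_mfcc)  # shape: (40, n_frames)
    return mfcc.mean(axis=1)                                       # shape: (40,)

# cls_vec: 768-dim [CLS] embedding from BETO (see Section 4.1).
audio_vec = torch.tensor(mfcc_vector("segment_0001.wav"), dtype=torch.float32)
multimodal_vec = torch.cat([cls_vec, audio_vec], dim=-1)           # 768 + 40 = 808 dimensions
        </preformat>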
        <p>The decision to combine BETO with MFCCs rather than using a full acoustic language model (e.g.,
Wav2Vec 2.0) was driven by both computational constraints and interpretability. MFCCs provide a
compact and well-understood representation of prosodic and timbral characteristics of speech, which
are highly relevant for capturing nuances of satirical delivery such as tone, rhythm, and emphasis.</p>
        <p>The multimodal classification head was custom-designed to handle the concatenated embeddings
efficiently. By treating the audio and text branches independently until the fusion point, the architecture
maintains modularity, making it adaptable for other modalities or downstream tasks.</p>
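        <p>The team's exact implementation has not been released; the sketch below only illustrates the description given here and in Section 1, namely a subclass of BertForSequenceClassification whose classifier operates on the concatenated 808-dimensional vector and is trained with cross-entropy loss.</p>
        <preformat>
import torch
import torch.nn as nn
from transformers import BertForSequenceClassification

class BetoMfccForSatire(BertForSequenceClassification):
    """Illustrative fusion head: pooled BETO output concatenated with a 40-dim MFCC vector."""

    def __init__(self, config, audio_dim: int = 40):
        super().__init__(config)
        # Replace the default text-only classifier with one over the fused 808-dim vector.
        self.classifier = nn.Linear(config.hidden_size + audio_dim, config.num_labels)

    def forward(self, input_ids=None, attention_mask=None, audio_features=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = self.dropout(outputs.pooler_output)             # (batch, 768)
        fused = torch.cat([pooled, audio_features], dim=-1)      # (batch, 808)
        logits = self.classifier(fused)
        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
        return {"loss": loss, "logits": logits}
        </preformat>
        <p>Such a model can be instantiated with BetoMfccForSatire.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased", num_labels=2); the new classifier weights are randomly initialized and learned during fine-tuning, while the text and audio branches remain independent until the fusion point.</p>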
        <p>All training was conducted using PyTorch and Hugging Face Transformers, with evaluation based
on weighted and macro F1-scores to account for mild class imbalance. The entire pipeline—from
preprocessing to final prediction—is reproducible and compatible with GPU acceleration, ensuring
scalability for larger datasets or multilingual adaptations.</p>
        <p>The rest of the architecture remained the same. The model was trained for 20 epochs using a learning
rate of 1 × 10<sup>−5</sup> and the same batch size. As in Task 1, we used cross-entropy loss and selected the best
model according to weighted F1-score on the validation split.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>For Task 1, the fine-tuned BETO model on Spanish satirical texts achieved a macro F1 score of 0.9648
on the validation split. This demonstrates that BETO, as a monolingual transformer pretrained on
Spanish, is highly effective at capturing linguistic features relevant to satire, such as irony, hyperbole,
and rhetorical structure.</p>
      <p>For Task 2, the addition of prosodic features through MFCC vectors led to a modest improvement,
with a macro F1 score of 0.9699. The gain, while small, confirms that acoustic signals—particularly
related to tone, rhythm, and emphasis—can reinforce textual cues in satire detection. However, the
improvement is incremental rather than transformative, suggesting that simple early fusion of features
(concatenation) may not fully exploit the expressive richness of the audio modality.</p>
      <p>As the gold labels for the test set were not publicly released, a 20% stratified split from the training
data was used as a held-out validation set to better understand model performance. Table 5 provides
the macro-averaged metrics for both tasks.</p>
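      <p>For reference, the held-out evaluation described above can be reproduced with scikit-learn; texts, labels, and val_preds below are placeholders for the transcripts, gold labels, and model predictions, and the class order is an assumption.</p>
      <preformat>
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report

# 20% stratified hold-out from the training data, mirroring the setup described above.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.20, stratify=labels, random_state=42)

# ... train on train_texts / train_labels, then predict val_preds on val_texts ...

print("macro F1:", f1_score(val_labels, val_preds, average="macro"))
print(classification_report(val_labels, val_preds, target_names=["no-satire", "satire"]))
      </preformat>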
      <p>A class-wise breakdown revealed that the models maintain extremely high performance across both
satire and no-satire classes. In particular:
• Task 1 showed excellent balance, with precision and recall values near 0.99 for both classes,
indicating that BETO is capable of generalizing from lexical and syntactic satire cues alone.
• Task 2 demonstrated that audio features marginally improved recall for satirical utterances.
This suggests the model learned prosodic patterns such as exaggerated intonation or rhythmic
anomalies common in satirical speech.
• However, the improvement was not drastic, which likely reflects the limitation of using MFCC
features and early fusion. Richer acoustic representations or attention-based fusion may be
needed to fully leverage the audio modality.</p>
      <p>In sum, both models achieved results well above the challenge baseline, confirming the validity of
our pipeline. The performance gap between our internal validation and leaderboard scores could be
explained by slight domain shift, noise in test set audio, or diferences in source domains.</p>
      <sec id="sec-5-1">
        <title>5.1. Error Analysis</title>
        <p>To better understand the limitations of our models, we conducted a qualitative analysis of several
misclassified examples from the validation set. We selected representative cases from both Task 1
(text-only) and Task 2 (multimodal) that illustrate common failure patterns.</p>
        <p>In Task 1, many errors occurred when the satirical content employed subtle rhetorical cues such as
irony, sarcasm, or absurd exaggeration that were not easily distinguishable from factual reporting. These
examples often relied on cultural or contextual background knowledge that the model was unlikely to
capture from text alone.</p>
        <p>In Task 2, despite the inclusion of prosodic features through MFCCs, some predictions still failed to
recognize satirical tone. This may be due to the limitations of using static acoustic embeddings and
early fusion strategies, which do not fully exploit temporal speech dynamics such as intonation or
speech rate.</p>
        <p>Table 6 presents a selection of misclassified samples with their full transcriptions and classification
results.</p>
        <table-wrap id="tab6">
          <label>Table 6</label>
          <caption>
            <p>Misclassified validation samples with their full transcriptions, predicted labels, and gold labels.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Task</th><th>Prediction</th><th>Gold Label</th><th>Transcription</th></tr>
            </thead>
            <tbody>
              <tr><td>Task 1</td><td>no-satire</td><td>satire</td><td>Este parque es súper importante para el movimiento del barrio, que además ahora se usa para los rodajes, que es otra fuente de ingresos. Pero claro, como lo usa la tele, ya nos lo van a cerrar.</td></tr>
              <tr><td>Task 1</td><td>no-satire</td><td>satire</td><td>El 5G no deja de sorprender. Ayer se pudo ver a una vecina del barrio que, tras vacunarse, sintonizaba los canales rusos con sólo tocarse la frente.</td></tr>
              <tr><td>Task 1</td><td>no-satire</td><td>satire</td><td>No nos responden. Luego tuve ocasión de, con los servicios secretos franceses, hacerme con una grabación de Macron escuchando la COPE mientras se duchaba.</td></tr>
              <tr><td>Task 2</td><td>no-satire</td><td>satire</td><td>Bueno, pues me temo que vamos a tener que dejar las fronteras abiertas otra vez. Parece que el virus ha pedido vacaciones.</td></tr>
              <tr><td>Task 2</td><td>satire</td><td>no-satire</td><td>Por su altura de 1 metro y 52 centímetros, nunca pensaron que sería capaz de trepar esa valla de seguridad. Pero lo logró. Y con elegancia.</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>This qualitative review reveals that improving satire detection may require enhanced modeling of
pragmatic and cultural cues, and more sophisticated multimodal fusion strategies that account for the
temporal structure of speech. Incorporating prosody-aware embeddings, pitch contours, or pre-trained
acoustic transformers such as Wav2Vec2 could help address these limitations in future work.</p>
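        <p>As an illustration of this direction, the sketch below mean-pools hidden states from a Wav2Vec 2.0 encoder as a drop-in replacement for the 40-dimensional MFCC vector; the multilingual XLSR-53 checkpoint is an assumption, since only Wav2Vec2 is mentioned generically above.</p>
        <preformat>
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-large-xlsr-53"   # assumed checkpoint; any Spanish-capable model would do
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
wav2vec = Wav2Vec2Model.from_pretrained(ckpt).eval()

@torch.no_grad()
def wav2vec_embedding(path: str) -> torch.Tensor:
    """Mean-pooled Wav2Vec 2.0 hidden states as an alternative acoustic representation."""
    waveform, _ = librosa.load(path, sr=16000)
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    hidden = wav2vec(**inputs).last_hidden_state   # (1, frames, 1024)
    return hidden.mean(dim=1).squeeze(0)           # (1024,)
        </preformat>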
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we presented the UKR team’s system for the SatiSpeech 2025 shared task, addressing
both textual and multimodal satire detection in Spanish. Our approach relied on fine-tuning the BETO
language model, with and without the integration of acoustic features derived from MFCCs. Despite
using a simple early fusion strategy for the multimodal task, our system achieved strong results, placing
6th in Task 1 and 9th in Task 2 on the official leaderboard.</p>
      <p>The validation results showed that BETO alone is highly capable of capturing the linguistic and
rhetorical markers of satire, with a macro F1 score of 0.9648. The addition of MFCC features led to
a modest gain (macro F1 of 0.9699), suggesting that prosodic cues—although helpful—were not fully
leveraged with the current feature representation. These findings underscore the need for more effective
fusion strategies to harness the full potential of multimodal features.</p>
      <p>Our system outperformed the baseline by a large margin, and remained competitive among teams
that adopted more complex multimodal architectures. As future work, we plan to explore richer audio
embeddings (e.g., Wav2Vec 2.0 fine-tuning), attention-based fusion layers, and the use of
prosody-specific features like pitch contours and speech rate, which may reveal subtler satire signals.</p>
      <p>Overall, our results confirm the effectiveness of Transformer-based models in satire detection and
highlight the potential of integrating speech features to enhance detection in multimodal communication
settings. Additionally, we aim to experiment with LLMs, given their demonstrated effectiveness in
various classification tasks within domains such as hate speech and satire detection [<xref ref-type="bibr" rid="ref10">10</xref>, <xref ref-type="bibr" rid="ref11">11</xref>]. Their
capacity for contextual understanding and generalization could further enhance performance in nuanced
tasks like SatiSpeech.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepL for grammar and spelling checking.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>Cultural differences in humor perception, usage, and implications</article-title>
          ,
          <source>Frontiers in Psychology</source>
          <volume>10</volume>
          (
          <year>2019</year>
          ). URL: https://api.semanticscholar.org/CorpusID:59307773.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Compilation and evaluation of the spanish saticorpus 2021 for satire identification using linguistic features and transformers</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          <volume>8</volume>
          (
          <year>2022</year>
          )
          <fpage>1723</fpage>
          -
          <lpage>1736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Broniatowski</surname>
          </string-name>
          ,
          <article-title>A multi-modal method for satire detection using textual and visual cues</article-title>
          , in: G. Da San Martino, C. Brew,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Ciampaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leberknight</surname>
          </string-name>
          , P. Nakov (Eds.),
          <source>Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship</source>
          , Disinformation, and Propaganda,
          <source>International Committee on Computational Linguistics (ICCL)</source>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>38</lpage>
          . URL: https://aclanthology.org/2020.nlp4if-1.4/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Rodríguez-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Spanish MEACorpus 2023: A multimodal speech-text corpus for emotion analysis in Spanish from natural environments</article-title>
          ,
          <source>Computer Standards &amp; Interfaces</source>
          <volume>90</volume>
          (
          <year>2024</year>
          )
          <fpage>103856</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0920548924000254. doi:10.1016/j.csi.2024.103856.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Rodríguez-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Overview of EmoSpeech at IberLEF 2024: Multimodal speech-text emotion recognition in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>359</fpage>
          -
          <lpage>368</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bernal-Beltrán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          , Overview of SatiSPeech at IberLEF 2025:
          <article-title>Multimodal Audio-Text Satire Classification in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          , in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chaperon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Spanish pre-trained BERT model and evaluation data</article-title>
          , in:
          <source>PML4DC at ICLR 2020</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <article-title>MFCC and its applications in speaker recognition</article-title>
          ,
          <source>International Journal on Emerging Technologies</source>
          <volume>1</volume>
          (
          <year>2010</year>
          )
          <fpage>19</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salmerón-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Fine grain emotion analysis in Spanish using linguistic features and transformers</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>10</volume>
          (
          <year>2024</year>
          ). doi:10.7717/PEERJ-CS.1992.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Spanish MTLHateCorpus 2023: Multi-task learning for hate speech detection to identify speech type, target, target group and intensity</article-title>
          ,
          <source>Computer Standards &amp; Interfaces</source>
          <volume>94</volume>
          (
          <year>2025</year>
          )
          <fpage>103990</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0920548925000194. doi:10.1016/j.csi.2025.103990.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>