<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PRHLT-UPV at MiSonGyny 2025: Detection of Misogynist Expressions in Song Lyrics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elias Urios Alacreu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Aloy Mayo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>PRHLT Research Center, Universitat Politècnica de València</institution>
          ,
          <addr-line>Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ValgrAI Valencian Graduate School and Research Network of Artificial Intelligence</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Gender bias and discrimination against women are still manifested in popular culture through various types of discourse present in social media, TV shows, and music. In particular, song lyrics often express misogyny in explicit ways (e.g., objectification of women's bodies, violent threats), but also in covert forms masked as romantic expression. This paper reports our participation in the MiSonGyny 2025 shared task, which involves both the detection and fine-grained classification of misogynistic content in lyrics. Our experiments explore the effectiveness of BERT-like models and large language models (LLMs) for these tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Misogyny detection</kwd>
        <kwd>song lyrics</kwd>
        <kwd>gender bias</kwd>
        <kwd>natural language processing</kwd>
        <kwd>LLMs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The spread of misogynistic discourse remains a persistent issue contributing to gender inequality.
Harmful content targeting women on social media platforms and in other communication channels
reinforces this inequality. It also contributes “to an environment where women
feel unsafe, targeted or silenced” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        We can consider misogyny within the concept of hate speech (HS) understood as “any communication
denigrating a person or group, judging by specific features such as color, ethnicity, gender, sexual
orientation, nationality, and religion” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this sense, Moreno-Sandoval et al. consider misogynistic
speech “when hate speech is gender-oriented and specifically targets women” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover, social
studies link misogynistic discourses with the enforcement of male dominance, since these attitudes
promote discrimination against women and “its effects are widespread and systematic, bearing severe
both social and individual consequences, such as verbal and physical violence, rape and femicide” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
In summary, the presence of misogyny in social media and popular culture discourses denotes hateful
attitudes or actions against women that still prevail in society.
      </p>
      <p>
        This scenario presents a challenge for research studies trying to understand the semantic patterns
hidden beneath a wide range of written and visual features in discourses such as social media platforms,
news articles or TV shows. For instance, even song lyrics are contaminated with ideologically-biased
ideas based on misogynistic attitudes, and we believe that it is crucial to address this concern, given the
influential role music plays in shaping societal perspectives. The powerful influence of music is raised
in Davies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], who emphasizes that song lyrics “are more than mere mirrors of society; they are a potent
force in the shaping of it”. This idea reveals the power of the linguistic structures present in song lyrics,
whose analysis with Natural Language Processing (NLP) techniques presents unique challenges.
Factors such as the use of metaphors, slang, and poetic language can obscure explicit content and
make misogyny in music harder for language models to detect. Addressing these complexities
requires the development of specialized models capable of interpreting nuanced language and cultural
references inherent in musical texts.
      </p>
      <p>
        In this regard, an interesting work by Betti et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] analyses the ‘Two Million Song Database’ corpus,
focusing on the detection of sexism over five decades (1960-2010). The study employed NLP techniques
to identify sexist language and measured gender biases through word embeddings, highlighting the
prevalence of misogynistic themes in popular music. Their conclusions reveal an increase in sexist
content and specifically observe that “songs of male solo artists contain more and often stronger biases
than those of female solo artists”. It shows how lyrics carry strong cultural and contextual signals and
are the expression of popular culture that can influence how listeners see the world and reflect how
artists perceive it [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Their conclusions highlight the continuing need to explore gender bias in popular
music, focusing not only on linguistic structures but also on other sociocultural aspects such as the
gender of the artist or even the impact on the audience.
      </p>
      <p>This scenario confirms that the exploration of misogyny in song lyrics has opened new avenues for
multidisciplinary research, and it is a call to action highlighting the need for models attuned to the
intricacies of lyrical language. Continued efforts in dataset development, bias mitigation, and model
refinement are essential to further progress in this critical area of study.</p>
      <p>
        This work aims to address this gap by participating in the MiSonGyny shared task
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] at IberLEF 2025 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which proposes both the detection of misogyny in song lyrics as well as a
more fine-grained classification. Our system compares BERT-like models against LLMs
for detecting misogynistic speech in song lyrics. In addition, we investigate the effects of LLM-based data
augmentation for the given task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In recent years, the proliferation of HS on digital platforms has sparked significant interest within the
NLP community. Numerous resources, shared tasks, and benchmark datasets [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15 ref16 ref9">9, 10, 11, 12, 13, 14, 15, 16</xref>
        ]
have been proposed, contributing to the development of classifiers capable of effectively addressing
this challenge.
      </p>
      <p>
        Misogyny, as a specific form of HS, has received particular attention. Existing studies have tackled
this issue primarily through the textual analysis of comments extracted from social media platforms,
especially X [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref9">9, 10, 11, 12, 13, 14</xref>
        ]. More recently, research has expanded to multimodal approaches,
including the detection of misogyny in memes [
        <xref ref-type="bibr" rid="ref12 ref15">15, 12</xref>
        ].
      </p>
      <p>
        Despite these advances, most approaches still rely heavily on social media data, which limits the
generalization of models to other forms of communication. Creative and artistic texts, such as song
lyrics, present unique challenges for the automatic detection of HS. Although some recent work has
addressed this issue [
        <xref ref-type="bibr" rid="ref17 ref18 ref6">6, 17, 18</xref>
        ], the number of shared tasks dedicated to this topic remains small, with
only the third subtask of HOMO-MEX 2024 focusing on LGBTQ+phobia detection in song lyrics [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        From a methodological standpoint, recent advances in Deep Learning (DL) have significantly shaped
the development of HS detection systems, with Transformer-based architectures [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] setting the state of
the art over the last few years. On the one hand, Transformer encoder models [
        <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
        ] have become
standard due to their ability to produce contextual representations and adapt through fine-tuning.
On the other hand, Large Language Models (LLMs), such as GPT [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] or LLaMA [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], are generative
models based on Transformer decoder architectures and trained on massive corpora. These models
have demonstrated strong performance in classification tasks [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] by leveraging few-shot learning
[
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] capabilities and advanced reasoning skills through techniques such as Chain of Thought (CoT)
prompting [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Descriptions</title>
      <sec id="sec-3-1">
        <title>3.1. Task Descriptions</title>
        <p>
          The MiSonGyny 2025 shared task [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is divided into two sub-tasks:
        </p>
        <list list-type="order">
          <list-item>
            <p>Misogyny Speech Detection. A binary classification task aimed at identifying phrases from song lyrics that contain misogynistic speech. The classes are described as follows:</p>
            <list list-type="bullet">
              <list-item><p>Misogynist (M): Any lyrics that contain hate speech or disdain directed at women, or that perpetuate harmful gender stereotypes promoting the subordination or objectification of women.</p></list-item>
              <list-item><p>Not Misogynist (NM): Lyrics that do not include hate speech or disdain against women. Although they might address themes related to women, they do so without perpetuating stereotypes or negative attitudes.</p></list-item>
            </list>
          </list-item>
          <list-item>
            <p>Fine-grained Misogyny Speech Detection. A hierarchical multi-class classification task aimed at detecting the type of speech present in a song. The categories are described as follows:</p>
            <list list-type="bullet">
              <list-item><p>Sexualization (S): Phrases that describe or suggest sexual acts, sexual language, or insinuations.</p></list-item>
              <list-item><p>Violence (V): Lyrics containing physical or verbal aggression, threats, or violent actions.</p></list-item>
              <list-item><p>Hate (H): Lyrics that contain offensive or discriminatory language, expressions of contempt, or hostility towards a woman or a group of women.</p></list-item>
              <list-item><p>Not Related (NR): Phrases that do not fall into the above categories and lack sexual, violent, or hateful content.</p></list-item>
            </list>
          </list-item>
        </list>
        <p>
          For the evaluation of the proposed systems across both sub-tasks, standard classification metrics
such as Precision (Pr), Recall (Rc) and F1-score were proposed by the organizers, with F1-score being
the selected metric for determining the ranking on the official test set [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In addition to the proposed
metrics, our system’s performance is also evaluated with the Area Under the Receiver Operating
Characteristic Curve (AUROC) [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]<sup>1</sup> as well as the Matthews Correlation Coefficient (MCC) [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], which
provides us with a deeper understanding of the behaviour of our system.
        </p>
        <p>Table 1 summarizes the metrics used for the evaluation of each sub-task, indicating the official metrics
used for ranking.</p>
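        <p>As an illustrative sketch only (this is not the organizers' official scorer), the F1-score for the positive class and the MCC of the binary sub-task can be derived from confusion-matrix counts as follows. The function names and the toy labels are assumptions introduced for this example.</p>
        <preformat>
```python
import math

def confusion_counts(y_true, y_pred, positive="M"):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 when any marginal total is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels (hypothetical, not taken from the shared-task data).
y_true = ["M", "M", "NM", "NM", "NM", "M"]
y_pred = ["M", "NM", "NM", "NM", "M", "M"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```
        </preformat>
        <p>With these toy labels the counts are TP = 2, FP = 1, FN = 1 and TN = 2, giving an F1-score of about 0.667 and an MCC of about 0.333. Unlike F1, the MCC also takes the true negatives into account, which is useful under class imbalance.</p>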
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>
          This section describes the dataset used in the MiSonGyny 2025 shared task [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The dataset distributions
for each sub-task are illustrated in Figure 2.
(Footnote 1: AUROC is not reported for all of our systems, since it requires class probabilities.)
        </p>
        <p>On the one hand, Figure 2a presents the data distribution for the first subtask. Although we are
provided with approximately 2300 samples, it is important to note the class imbalance present in
this sub-task, since the non-misogynistic songs make up almost 70% of the dataset.</p>
        <p>On the other hand, Figure 2b presents the data distribution for the second subtask. Since this subtask
is hierarchical, all of its samples are taken directly from the previous one. Likewise,
the most common class is the one related to the absence of misogyny (i.e., NR for this subtask).
Among the categories related to the presence of hate speech, the sexualization
class is the most common with 435 samples, whereas the violence and hate classes
make up less than 20% of the overall dataset, resulting in a very skewed distribution.</p>
        <p>[Figure 2: (a) Misogyny Speech Detection sub-task; (b) Fine-grained Misogyny Speech Detection sub-task.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Proposal</title>
      <sec id="sec-4-1">
        <title>4.1. Preprocessing</title>
        <p>Song lyrics possess a unique structure, driven by their domain, that requires specific NLP considerations.
For instance, they frequently feature pervasive repetition, like recurring choruses and verses, which,
while integral to musical composition, offers little to no information for the given tasks.</p>
        <p>Beyond repetition, lyrics often contain elements irrelevant to their core message. This includes
ad-libs (often sounds rather than coherent sentences) and song delimiters such as [Intro] or [Chorus].
As shown in Table 2, these elements, coupled with the initial song length, frequently push songs beyond
common token limits such as BERT’s 512-token maximum.</p>
        <p>Since we assume that these structural and superfluous elements do not significantly influence the presence of hate
speech, we preprocessed our data by removing ad-libs, song section delimiters, blank lines, and
repeated verses and sections. As Table 2 reflects, this step allows approximately 75% of the songs to
fit within the BERT model’s token limit, enabling more efficient and relevant analysis focused on the
unique and semantically relevant lyrical content.</p>
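        <p>The cleaning steps described above could be sketched roughly as follows. This is a minimal illustration rather than our exact pipeline: it assumes that section delimiters are written in square brackets, approximates ad-libs as parenthesised interjections, and removes repetition at the level of individual lines; the function name <monospace>clean_lyrics</monospace> is hypothetical.</p>
        <preformat>
```python
import re

# Assumed patterns: section delimiters look like "[Chorus]"; ad-libs are
# approximated here as parenthesised interjections such as "(yeah)".
SECTION_TAG = re.compile(r"\[[^\]]*\]")
AD_LIB = re.compile(r"\([^)]*\)")

def clean_lyrics(text):
    """Drop section tags, ad-libs, blank lines and repeated lines."""
    seen = set()
    kept = []
    for raw_line in text.splitlines():
        line = SECTION_TAG.sub("", raw_line)
        line = AD_LIB.sub("", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if not line:
            continue  # blank line, or a line that held only a tag or ad-lib
        key = line.lower()
        if key in seen:
            continue  # repeated line (e.g. a recurring chorus)
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)

sample = (
    "[Intro]\n(yeah, yeah)\nLine one\n\n"
    "[Chorus]\nHook line\nLine two\n[Chorus]\nHook line\n"
)
cleaned = clean_lyrics(sample)
print(cleaned)
```
        </preformat>
        <p>On the toy input, only the three distinct content lines survive ("Line one", "Hook line", "Line two"). Line-level deduplication is the simplest choice for a sketch; removing whole repeated sections would require comparing blocks of lines instead.</p>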
      </sec>
      <sec id="sec-4-2">
        <title>4.2. RoBERTa Model</title>
        <p>
          The proposed system architecture is based on fine-tuning a pre-trained RoBERTa model [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
Specifically, we utilized the RoBERTa model from the MarIA project<sup>2</sup> [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. For the classification task, the
representation of the final layer’s [CLS] token is extracted and passed through a sequence of additional
layers: a dropout layer [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], a pooling layer, a Tanh activation, another dropout layer, and a final linear
classifier layer. Figure 3 illustrates this architecture.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. LLM for Tackling Misogyny Detection</title>
        <sec id="sec-4-3-1">
          <title>4.3.1. Data Augmentation</title>
          <p>
            As we pointed out in Section 3.2, one of the problems we can anticipate for both sub-tasks is the lack of
data for some categories. In order to address this problem, we have decided to employ data augmentation
techniques using LLMs to generate new data for the misogynistic lyrics. More specifically, we have
explored data augmentation by paraphrasing [
            <xref ref-type="bibr" rid="ref33 ref34">33, 34</xref>
            ] as well as synthetic data creation [
            <xref ref-type="bibr" rid="ref35">35</xref>
            ].
          </p>
          <p>Rather than direct paraphrasing, we explicitly prompt the LLM with a definition of the target
fine-grained misogyny category and instruct it to first identify misogynistic verses related to that category.
Subsequently, the model generates three distinct paraphrases, with each version adhering to specific
rules: the first paraphrases all content while maintaining the core misogynistic theme, the second
replaces non-misogynistic verses with new content while only paraphrasing the misogynistic elements,
and the third reorganizes and paraphrases the entire song while preserving narrative coherence and the
defined misogynistic essence. Prior to generation, the LLM is also tasked with explaining and justifying
its intended changes and how the new versions retain the original misogynistic nature, encouraging
the model to take intermediate steps before generating any data. Our paraphrase prompt can be found
in Appendix 8.1.</p>
          <p>On the other hand, for synthetic data creation, our intention was to generate entirely new song lyrics
from an existing one, critically ensuring the preservation of the original’s misogynistic elements in
the new composition. Prior to generating these new lyrics, we instruct the model to extract crucial
information: the song’s main themes, specific misogynistic verses, its genre, and its overall mood.
Subsequently, the model is tasked with creating a new song that retains the same mood and themes but
belongs to a different music genre. Our synthetic data generation prompt can be found in Appendix 8.2.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Classification</title>
          <p>A primary consideration when designing our classification prompts is that it is not necessary to create
a separate prompt for each task. As highlighted in Section 3.2, the second sub-task is dependent on
the first. Consequently, successfully addressing the second sub-task enables us to infer the categories
for the initial sub-task, which reduces both the time spent and the monetary cost of using a
provider API.</p>
          <p>
            Furthermore, our classification prompts adhere to the conventional zero-shot and one/three-shot
schemes for few-shot learning. To enhance the LLM performance, we incorporated CoT reasoning by
instructing the model to meticulously analyse song lyrics and formulate its responses through a
step-by-step process [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ]. Moreover, the model was directed to provide a justification for its classification,
which provides us with a plausible explanation for its decision.
          </p>
          <p>
            Instead of randomly selecting the samples for our in-context learning prompts, we used
a procedure that combines automatic extraction with manual revision. To do so, we first
calculated embeddings for all song lyrics with a multilingual embedding model<sup>3</sup> [
            <xref ref-type="bibr" rid="ref36">36</xref>
            ] which performs strongly on
the Massive Text Embedding Benchmark (MTEB) [
            <xref ref-type="bibr" rid="ref37">37</xref>
            ].
          </p>
          <p>Once embeddings were computed, our goal was to select both representative and ambiguous examples
for each class. We achieved this by calculating each song lyric’s distance to the centroid of every class.
Representative samples were chosen based on their proximity to their own class centroid, providing the
LLM with clear examples of what to look for. Conversely, ambiguous samples were picked if they were
closer to another class’s centroid, exposing the LLM to "difficult" examples.</p>
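          <p>The centroid-based selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: Euclidean distance is used (the distance measure is not fixed by the description), the two-dimensional vectors are toy values rather than real embeddings, and the names <monospace>centroid</monospace> and <monospace>select_examples</monospace> are hypothetical.</p>
          <preformat>
```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def select_examples(embeddings, labels, k=1):
    """Return (representative, ambiguous) sample indices per class.

    Representative: the k samples closest to their own class centroid.
    Ambiguous: up to k samples that lie closer to another class centroid,
    ranked by how much closer that other centroid is.
    """
    classes = sorted(set(labels))
    cents = {
        c: centroid([e for e, lab in zip(embeddings, labels) if lab == c])
        for c in classes
    }
    representative = {c: [] for c in classes}
    ambiguous = {c: [] for c in classes}
    for i, (emb, lab) in enumerate(zip(embeddings, labels)):
        own = math.dist(emb, cents[lab])
        others = [math.dist(emb, cents[c]) for c in classes if c != lab]
        nearest_other = min(others) if others else math.inf
        representative[lab].append((own, i))
        if own > nearest_other:
            # More negative margin means the sample sits deeper in another class.
            ambiguous[lab].append((nearest_other - own, i))
    rep = {c: [i for _, i in sorted(v)[:k]] for c, v in representative.items()}
    amb = {c: [i for _, i in sorted(v)[:k]] for c, v in ambiguous.items()}
    return rep, amb

# Toy 2-D "embeddings" (hypothetical values) for two classes.
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.9, 0.0],
              [1.0, 0.0], [1.2, 0.0], [0.1, 0.0]]
labels = ["NR", "NR", "NR", "S", "S", "S"]
rep, amb = select_examples(embeddings, labels, k=1)
print(rep, amb)
```
          </preformat>
          <p>In this toy example, samples 1 and 3 are the most representative of NR and S respectively, while samples 2 and 5 are flagged as ambiguous because each lies closer to the centroid of the other class.</p>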
          <p>Because the algorithm relies on semantic similarity, a linguist manually evaluated the top-K representative and
ambiguous samples it extracted. This review ensures that the selected samples represent
the phenomena of each of the categories from the second sub-task.</p>
          <p>Our 0-shot classification prompt can be found in Appendix ??.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimentation Settings</title>
      <sec id="sec-5-1">
        <title>5.1. Data Split</title>
        <p>For each task, we created our private training and validation datasets by splitting the original training
dataset provided by the organizers, using an 80/20 ratio while stratifying the split to ensure proportionate
representation of classes. Subsequently, when applying the data augmentation techniques described in
Section 4.3.1, we experimented with various training configurations; for instance, some setups involved
replacing all samples of the minority class with synthetic data, whereas others combined the
original misogynistic samples with the synthetic ones. Critically, data augmentation was exclusively
applied to samples already present in our private training dataset, ensuring no data leakage into the
private validation sets.</p>
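        <p>A minimal sketch of this protocol is given below, assuming a plain per-class shuffle for the stratified 80/20 split. The helper names, the random seed, and the <monospace>make_variants</monospace> callback (a stand-in for the LLM-based generation step) are hypothetical and only serve to show that augmentation draws exclusively on training indices.</p>
        <preformat>
```python
import random
from collections import defaultdict

def stratified_split(labels, val_ratio=0.2, seed=13):
    """Return (train_idx, val_idx) keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train_idx, val_idx = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_val = max(1, round(len(idxs) * val_ratio))
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)

def augment_training_only(train_idx, labels, minority, make_variants):
    """Build (source_index, text) pairs from minority-class training samples only."""
    extra = []
    for idx in train_idx:
        if labels[idx] == minority:
            extra.extend((idx, variant) for variant in make_variants(idx))
    return extra

# Toy label distribution (hypothetical): 70 NM and 30 M samples.
labels = ["NM"] * 70 + ["M"] * 30
train_idx, val_idx = stratified_split(labels)

# Placeholder generator standing in for the LLM call.
extra = augment_training_only(
    train_idx, labels, "M", lambda idx: ["variant-of-%d" % idx]
)
print(len(train_idx), len(val_idx), len(extra))
```
        </preformat>
        <p>Because the augmentation helper only iterates over <monospace>train_idx</monospace>, no generated sample can originate from a validation lyric, which is the leakage guarantee described above.</p>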
      </sec>
      <sec id="sec-5-2">
        <title>5.2. RoBERTa</title>
        <p>
          Our RoBERTa models were fine-tuned following several key recommendations presented in [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ].
Specifically, we utilized an AdamW optimizer [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] with a learning rate of 2 × 10<sup>−5</sup>; all other AdamW
settings were kept at their default values. Moreover, we applied weight decay only to the Transformer
encoder layers, but not to the pooling and classification layers. Consistent with these recommendations,
the weights of both the pooling and classification layers were initialized with values drawn from a
Normal distribution following the selected Transformer configuration (mean = 0, standard deviation = 0.02).
        </p>
        <p>
          Models were trained with a batch size of 16 for a maximum of 10 epochs. To optimize training
duration and prevent overfitting, early stopping was implemented based on the validation loss, and the
model checkpoint achieving the lowest validation loss was selected. Input sequences were dynamically
padded to the maximum length within each batch to maximize computational efficiency.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. LLM</title>
        <p>
All experimentation involving LLMs was conducted using the OpenAI model GPT-4.1-mini [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. Our
selection is based on the fact that this model not only surpasses the GPT-4o model in various benchmarks
(e.g., Massive Multitask Language Understanding [
          <xref ref-type="bibr" rid="ref41 ref42">41, 42</xref>
          ] or Instruction-Following Evaluation [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ]),
but is also more cost-effective.
        </p>
        <p>Regarding more specific settings, we only tweaked the temperature of the model depending on the
given task. On the one hand, classification tasks were run with a temperature of 0.5, promoting more
consistent and deterministic outputs, yet still allowing the model sufficient flexibility to provide nuanced
reasoning, especially for misogyny detection. On the other hand, data augmentation was performed
with a temperature of 0.7, thus enabling the model to explore more creative and diverse generations for
the resulting data.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results &amp; Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Quantitative Results</title>
        <sec id="sec-6-1-1">
          <title>6.1.1. Data Augmentation Quality</title>
          <p>Prior to presenting the performance of the system, we evaluated the quality of data generated through
augmentation techniques. The results, reflected in Table 3, demonstrate that all the paraphrased data,
across all variations, offer a high degree of semantic similarity to the original content, as shown by
consistently high BERT-scores. However, the similarity across the three paraphrasing methods is also
very close, suggesting that despite specifying rules for diversity, the model did not introduce as much
variability as expected.</p>
          <p>In contrast, the synthetically generated data stands out for its lower cosine similarity, indicating
greater lexical and structural diversity. Despite this, its BERT-score remains relatively high (0.76),
implying that the generated content still retains much of the original meaning.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>6.1.2. Misogyny Speech Detection Results</title>
          <p>Table 4 shows that our RoBERTa model trained on the original dataset consistently achieves the best
overall performance across most metrics. When the original data is replaced with paraphrased versions
(Par. {1,2,3}), a performance drop is observed in all cases, with the third paraphrasing variant showing the
most pronounced decline. A similar trend is evident when training solely on synthetic data (Synthetic),
which also underperforms compared to the original configuration. Furthermore, combining the original
dataset with either paraphrased (Original + Par. {1,2,3}) or synthetic data (Original + Synthetic) does
not yield improvements over the baseline. This suggests that the augmented samples may introduce
noise or redundancy rather than helpful variability, hindering model performance.</p>
          <p>Table 5 presents the results for the GPT-4.1-mini model across different prompt configurations. As
anticipated, increasing the number of examples in the prompt leads the LLM to yield results
that more closely resemble those obtained by our RoBERTa-based approach, indicating the benefit of
in-context learning. Note that the LLM results on classification do not incorporate the AUROC metric,
since it relies on class probabilities in order to calculate its value.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>6.1.3. Fine-grained Misogyny Speech Detection</title>
          <p>Table 7 presents the results of the different dataset configurations compared to the original dataset
for the RoBERTa model. As can be seen, the results are very similar to the ones we observed earlier.
In other words, replacing the original data with the data generated by the LLM (Par. {1,2,3} and Synthetic)
yields worse performance than the original baseline. However, a small performance
improvement can be observed once we combine the data generated by the LLM with the original
data.</p>
          <p>[Figure panel (d): 3-shot with GPT-4.1-mini.]</p>
          <p>Table 8 presents the results for the Fine-grained Misogyny Speech Detection using GPT-4.1-mini with
different prompting strategies. As shown, the LLM consistently outperforms our RoBERTa-based
system for all prompts, including 0-shot.</p>
          <p>To further analyse the results, we examined the system outputs using Figure 5. For the RoBERTa
model with the original dataset (Figure 5a), high recall is observed for NR and S, which suggests that
the model is overfitting to these categories, since they are the most predominant in this task. However,
the model significantly struggles with hate and violence, showing a very low recall of these crucial
categories, frequently misclassifying them as NR or S.</p>
          <p>The LLM’s confusion matrices (Figures 5b, 5c, and 5d) show a progressive improvement
with more examples. While initially biased towards predicting NR (especially in 0-shot), recall for S
consistently improves across all scenarios. H also sees a significant recall improvement from 0-shot to
3-shot. However, the V category remains challenging for the LLM, showing only marginal improvements
in recall across settings. We will analyse specific outputs for the V category in the subsequent section
to better understand these limitations.</p>
          <p>For the official test set, we submitted the predictions of the RoBERTa model trained on the original
training dataset configuration as well as the LLM predictions obtained with the different prompts. Table 9
reports our results on the official test set together with a comparison with the winner of the subtask.
Our test results indicate that the LLM proposal offers better results in the given task. Moreover,
our best submission is the 3-shot prompt, which achieved third position in the overall ranking.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Qualitative Results</title>
        <sec id="sec-6-2-1">
          <title>6.2.1. Data Augmentation Quality Analysis</title>
          <p>The results presented earlier (Section 5.1) provide the context for a qualitative analysis
that allows us to explore the data-augmentation behaviour of GPT-4.1-mini. In this subsection, we explore
the rationale of the LLM behind the generation of paraphrased samples. Table 10 shows an example of
a paraphrased song.</p>
          <p>In general, taking into account the conditions of the paraphrasing prompt, we observe three main
issues. Firstly, the LLM shows a lack of creativity in its linguistic paraphrasing strategies. The
three versions of the paraphrase show little innovation, with verses that are almost identical
across versions, for instance "No matter how far you are", "even if you are far", and "even if you’re far
away". The LLM only adds some synonyms or changes the verb tense, keeping some of the verses almost
identical to the original song. Secondly, the paraphrases are significantly
shorter than the original (especially the third paraphrase), as can be observed in Table 10. Lastly,
after exploring the generated samples, we noticed that the LLM does not apply the condition of the
third paraphrase in the prompt: "[...] paraphrase all misogynistic and non-misogynistic verses of the
song, but modify the structure of the song as well as the order of the verses [...]". As can be seen from
the third example in the table, the order and structure of the verses are kept the same as in the original.</p>
          <p>
            In conclusion, after our exploration of the data augmentation results, we can infer that creative
writing remains an open challenge for generative AI. This is a subject of ongoing discussion ([
            <xref ref-type="bibr" rid="ref44">44</xref>
            ], [
            <xref ref-type="bibr" rid="ref45">45</xref>
            ],
[
            <xref ref-type="bibr" rid="ref46">46</xref>
            ]) in which the limits of creative thinking in AI are still unclear.
          </p>
        </sec>
        <sec id="sec-6-2-2">
          <title>6.2.2. Misogyny Speech Classification Analysis</title>
          <p>In this subsection, we present examples of the responses produced by GPT-4.1-mini when
detecting the categories of the second sub-task (NR, S, H, V) under the different prompts: 0-shot, 1-shot
and 3-shot. With this aim, we compare the three prompting strategies and
explore the performance results in detail. The following three tables present three significant examples
of the model’s responses in different categories. Some of them include interesting information about
the detailed reasoning of GPT-4.1-mini when making a classification decision. For instance, we can observe
the presence of semantic arguments and linguistic expressions that justify the LLM’s choice, and how its
interpretation of the semantic context of the song affects its final decision.</p>
          <table-wrap>
            <label>Table 10</label>
            <table>
              <tbody>
                <tr>
                  <td>Original</td>
                  <td>&lt;Estoy tan lejos de ti Y a pesar de la enorme distancia Te siento juntito a mí; Corazón, corazón; alma con
alma Y siento en mi ser tus besos No importa que estés tan lejos Estoy pensando en tu amor Y a lo loco
platico contigo Te cuento de mi dolor Y aunque me hagas feliz No te lo digo Y vuelvo a sentir tus besos
No importa que estés tan lejos El cielo empieza a clarear Y mis ojos se llenan de sueño Contigo voy a
soñar Porque, quieran o no Yo soy tu dueño Y siempre tendré tus besos No importa que estés tan lejos
Estoy pensando en tu amor.&gt;</td>
                </tr>
                <tr>
                  <td>Paraphrase 1</td>
                  <td>&lt;Me siento lejos de tu piel Aunque estemos distantes, Te imagino bien cerca de mí; Alma con alma,
latidos juntos, Y en mi cuerpo guardo tus caricias, No importa que estés tan lejos. Mi mente se llena de
tu amor, Hablo contigo en mi delirio, Te comparto mi tristeza, Aunque no lo diga, tú me completas, Y
vuelvo a sentir tus labios, No importa que estés tan lejos La madrugada comienza a salir, Los párpados se
ciegan de cansancio, Contigo quiero soñar, Porque, guste o no, soy quien te posee, Y siempre tendré tus
besos, No importa que estés tan lejos.&gt;</td>
                </tr>
                <tr>
                  <td>Paraphrase 2</td>
                  <td>&lt;Estoy lejos de tu figura Pienso en ti sin medida, Siento tu calor en la distancia, Corazón junto al mío, Tus
caricias me persiguen, Aunque estés lejos de aquí. Me absorto en las llamas de tu querer, Hablo contigo
en mis sueños, Te revelo mi tristeza, Aunque no lo confiese, Y tus labios son míos, No importa que estés
lejos. Cuando el alba empieza a brillar, Mis ojos se cierran de fatiga, Quiero perderme en tus sueños,
Porque, quieras o no, Soy el dueño de tu cuerpo, Y siempre poseeré tus besos, Aunque estés lejos.&gt;</td>
                </tr>
                <tr>
                  <td>Paraphrase 3</td>
                  <td>&lt;Soy quien controla tu cuerpo Y siempre tendré tus besos, No importa donde estés. Aunque la distancia
crezca, Te siento muy cerca de mí, En mi alma estás presente, Y tus caricias me envuelven. Hablo contigo
en silencio, Revelo mi pena oculta, Y aunque no te lo diga, Mi deseo por ti no muere. Al amanecer me
rindo al sueño, Cierro los ojos imaginando, Que tus labios son míos, Aunque estés muy lejos.&gt;</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Table 11 presents the responses to a song categorized as "Sexualization". As we mentioned in Section
5.1, this is one of the predominant categories within the dataset. Even though the LLM performs well on
this category, there are some cases where it struggles to detect "Sexualization". As we observe, this
song is classified as "Non-Related" in the 0-shot and 1-shot responses. The LLM gives different arguments to
support its classification decision, but some of them are inconsistent: "using affectionate and flattering
language (for example: ’you’re perfect’, ’doll, princess’, ’I love all of you’), without dehumanizing or
objectifying" (0-shot) and "barbie, doll, princess physical descriptions are expressed [...] without any
degrading intent or objectification" (1-shot). In both responses, linguistic cues that explicitly
objectify women’s bodies are interpreted as mutual sexual desire. This is corrected in the
3-shot response, in which the model improves its performance by treating the association of women
with figures such as Barbie or a doll as clearly objectifying and subordinating women.</p>
          <table-wrap>
            <label>Table 11</label>
            <table>
              <tbody>
                <tr>
                  <td>0-shot</td>
                  <td>&lt;The lyrics of the song "Es Un Secreto" do not meet the general definition of misogyny [...] Nor does
it perpetuate gender stereotypes that promote subordination [...]. The content is primarily romantic
[...] using affectionate and flattering language (for example: "you’re perfect," "doll, princess," "I love all
of you"), without dehumanizing or objectifying women in a degrading way. [...] The portrayal
of sexuality is not used as a tool for domination or degradation, but rather as an expression of
desire and connection between two people. Consequently, the correct classification is ’Not Related’ (NR).&gt;</td>
                </tr>
                <tr>
                  <td>1-shot</td>
                  <td>&lt;The lyrics of "Es Un Secreto" do not meet the established definition of misogyny, [...] Nor do they
perpetuate stereotypes of female subordination or objectify the body or its parts in a degrading
manner [...]. The use of metaphors such as “Barbie, doll, princess” and physical descriptions (“your
face, your body, your eyes, also your skin”) is expressed in a tone of admiration and appreciation,
without degrading intent or objectification [...] the song conveys an ideal of connection between
two parties where desire is shared and consensual. Therefore, the song should be classified as ’Not
Related’ (NR)[...].&gt;</td>
                </tr>
                <tr>
                  <td>3-shot</td>
                  <td>&lt;The song "Es Un Secreto" is classified as ’Sexualization’ (S) because, [...] it does perpetuate the
objectification and subordination of women through sexualization. The lyrics repeatedly highlight
female physical attributes [...] and use terms like “Barbie, doll, princess,” which objectify women
by comparing them to idealized and superficial figures. Additionally, there is an underlying
intention of domination [...], and the invitation to “dance sensually” and “please,” which suggest
a dynamic where the woman is primarily seen as an object of sexual pleasure. [...] it fits the
definition of misogyny in terms of promoting the subordination and objectification of the female body
with an implicit intent of sexual domination, clearly placing it within the category of Sexualization.&gt;</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The second table (Table 12) compares the LLM responses to a song whose true label is
"Violence". This particular example demonstrates how the model struggles to detect
instances of violent actions against women even when they are explicitly expressed through linguistic
cues. As we observe, the model misclassifies this song as "Sexualization" and tries to justify its decision
through some linguistic expressions that carry significant semantic weight, such as "I could be your
rapist", "I’ll take you to the limit" or "I’ll make you move to extremes". These expressions are detected by
the LLM as sexual objectification or sexual subordination, but are understood as sexual and dominance
play within the context of "Sexualization". In this sense, the 0-shot and 1-shot responses tend to
misread the violence expressed by the song, which is masked under sexualization and domination play. In
addition, in 3-shot the model downplays the violent connotation of expressions such as "I could be your
rapist" and supports its decision by arguing that these linguistic instances carry an implicit meaning of
sexual subordination of women. In conclusion, this is one of the examples that illustrate how "Violence"
against women goes undetected by the model.</p>
          <p>Finally, the third table compares the model responses to a song in the category "Hate".
"Hate" is one of the categories from sub-task two that improves with 3-shot, while it shows some
performance limitations in 0-shot and 1-shot according to the confusion matrices presented in Figure 5 (5b,
5c, 5d). The following example illustrates this scenario: the song is classified as "Non-Related"
in 0-shot and 1-shot. According to the model, in both responses, the song "contains neither explicit
disdain nor violent intentions against women". In addition, in 0-shot, the model supports the idea
that “although the narrator expresses a certain degree of contempt or resentment due to having been
abandoned, the language used is neither offensive nor discriminatory toward women”. Similarly, in 1-shot,
it gives a comparable reason: “Pejorative language is used to describe the man with whom the woman was
involved, such as the expression (’that bastard’)[...]the underlying intention seems to align more with
a narrative centered on betrayal and personal empowerment.” There is a significant improvement in
3-shot, since the model recognizes linguistic cues that carry explicit expressions of misogyny and hate
speech against women, such as "Consider yourself dead; your time is up." Also, in 3-shot, the model
interprets the expression “that bastard” as one of the expressions conveying “disdain and resentment
toward the woman.”</p>
          <table-wrap>
            <label>Table 12</label>
            <table>
              <tbody>
                <tr>
                  <td>0-shot</td>
                  <td>&lt;The song contains language that suggests sexual acts and domination games, specifically in
verses like “I’ll be your butler and you’ll play the role of Lady Fine” and “Or I could be your
rapist[...], which imply a sexual relationship based on domination[...]. There are also explicit references
[...]which objectify the female body with a clearly sexual intent. Although the lyrics use figurative
language and a framework of a “seduction game,” the depiction of scenes in which the woman is
objectified and placed in a subordinate role during sexual acts fits the definition of misogyny
through sexualization. No direct expressions of physical or psychological aggression (Violence
category) or explicit hate speech (Hate category) are identified. Therefore, the appropriate classification
is Sexualization (S).&gt;</td>
                </tr>
                <tr>
                  <td>1-shot</td>
                  <td>&lt;The song "Juego de Seducción" is classified as Sexualization (S) [...] the woman is portrayed in a passive
or subordinate role within an implicitly dominant context. [...]especially the line “Or I could be
your rapist, imagination” suggest a power dynamic that implies an unequal relationship in which
the woman is positioned as an object within a sexual game of domination and submission.
[...] “rapist” [...] directly invokes a violent sexual act [...]. Furthermore, the chorus [...] reinforce the
dominant and controlling intent from a sexual angle that instrumentalizes female sexuality. There is
no direct threat of physical violence or clear expression of hate, so it should not be classified
under Violence (V) or Hate (H). [...]. For these reasons, it clearly fits under the category Sexualization
(S).&gt;</td>
                </tr>
                <tr>
                  <td>3-shot</td>
                  <td>&lt;The song "Juego de Seducción" meets the definition of misogyny [...] the lyrics suggest roleplay [...]
and explicitly mention the possibility of being “your rapist”, which implies violent intent and
submission in a sexual context. Although the violence is not directly expressed as a real threat
or harm, the reference to rape within a game scenario sexualizes the woman as a passive,
subordinate object, reinforcing subordination [...] “I’ll take you to the edge” and “your very skin”
are expressions that objectify the female body in a context of sexual domination. Therefore, the
misogyny present is mainly expressed through sexualization, with an implicit component of domination
and submission[...]&gt;</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
          <table-wrap>
            <label>Table 13</label>
            <table>
              <tbody>
                <tr>
                  <td>0-shot</td>
                  <td>&lt;[...] Colloquial language and expressions reflecting the narrator’s discontent and perceived
superiority over another person (presumably a woman, given the reference to a past romantic
relationship) are evident. However, the lyrics do not contain hate speech, explicit contempt, or any
violent intent directed toward the woman or toward women as a group. [...] Although the narrator
expresses a certain degree of disdain or rejection due to being abandoned, the language used [...] does
not employ hostile terms that would suggest hate speech. Therefore, the song does not meet the
strict definition of misogyny provided and should thus be classified in the "Not Related" (NR) category.&gt;</td>
                </tr>
                <tr>
                  <td>1-shot</td>
                  <td>&lt;The lyrics of the song do not meet the provided definition of misogyny. The text refers to a woman
who allegedly was with another man, expressing a sense of superiority and triumph on the part of the
speaker (“I’m the boss”), and using derogatory terms to refer to the man involved with the woman (“that
bastard”). However, the lyrics do not contain hate speech [...] there are no expressions of violence
or threats directed at her. Furthermore, the underlying intention appears to be more about a
narrative of betrayal and personal empowerment, rather than misogynistic hostility [...] should
be labeled as “Not Related” (NR).&gt;</td>
                </tr>
                <tr>
                  <td>3-shot</td>
                  <td>&lt;The song meets the general definition of misogyny by containing expressions of contempt
and hostility toward a specific woman. The language used, such as “bastard” and phrases that
convey scorn and resentment toward the woman [...], reflects a discourse of hatred and
disdain directed at her. Additionally, the phrase “Consider yourself dead, your time is up” can be
interpreted as an expression of absolute rejection [...] it does not include an explicit threat
of physical violence, it implies a desire to erase or eliminate her presence [...] the most fitting
category is Hate (H), as the lyrics reflect offensive language, disdain, and resentment toward a woman.&gt;</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
          <p>
            From a qualitative perspective, we highlight two key aspects of the examples analyzed in this
section. On the one hand, there is a clear improvement in the classification performance of the model.
As previously discussed, the category of “Sexualization” is the most accurately classified, for the reasons
outlined earlier. However, in some instances, the model overlooks certain linguistic cues that may be
crucial for detecting the appropriate category. The categories of “Hate” and “Violence” exhibit notable
classification limitations and reveal that the arguments developed by the model are significantly more
biased in these cases compared to the “Sexualization” category. Nevertheless, while the classification
of “Hate” improves considerably in responses using 3-shot prompting, the model still struggles to
accurately classify songs containing semantic content related to violence against women. On the other
hand, after a detailed examination of the generated responses, the misclassification of “Violence” as
“Sexualization” or “Non-Related” may be due to the presence of more linguistically complex structures
that convey implicit meanings, which are far more subtle than the explicit language found in songs
classified as "Sexualization". In addition, the model may exhibit greater bias
when interpreting this type of content as a consequence of the perpetuation of gender biases in AI.
Misogyny is one of the most complex phenomena to detect, since it requires a nuanced
understanding of language and context [
            <xref ref-type="bibr" rid="ref47">47</xref>
            ]. The reason behind the presence of gender biases in LLMs
is that the training data used to develop a language model consists of human-produced texts containing
different types of sociocultural biases held by humans [
            <xref ref-type="bibr" rid="ref48">48</xref>
            ]. In this sense, more comprehensive research is needed
to evaluate debiasing methods that mitigate
gender and other social biases in AI.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and Future Work</title>
      <p>This work has addressed the detection of misogyny—an already well-explored topic in NLP—within the
under-researched yet culturally significant domain of song lyrics. By participating in the MiSonGyny
2025 shared task, we contribute to expanding misogyny detection into new discursive contexts.</p>
      <p>Our experiments with a RoBERTa-based system confirm that fine-tuned transformer architectures
remain effective for this task, achieving fifth place in the general misogyny detection subtask. However,
their performance suffers under data imbalance, particularly in the fine-grained classification scenario,
where overfitting is more pronounced and the system reached only 31st position in the test ranking.</p>
      <p>In contrast, our exploration of LLMs investigated the impact of in-context learning and
chain-of-thought prompting. Despite limited data, LLMs demonstrated strong performance in the fine-grained
subtask, securing third place and outperforming BERT-like models. Beyond raw performance, we
analysed the LLM’s reasoning capabilities, noting that while it effectively identified explicit categories
like Sexualization (S), it struggled with more implicit ones like Violence (V) and Hate (H). These
difficulties may stem from the subtlety of these expressions and the model’s inability to justify its
predictions with a coherent rationale, issues that are clearly reflected in the confusion matrices.</p>
      <p>
        Furthermore, we examined the use of LLMs for data augmentation through paraphrasing and synthetic
sample generation. However, the results suggest limited variability in the augmented data, potentially
due to constraints imposed by safety filters or a lack of creativity in LLM-generated text [
        <xref ref-type="bibr" rid="ref17 ref49">49, 17</xref>
        ].
      </p>
      <p>
        Building on our LLM experimentation, we plan to extend our work beyond closed models, utilizing
open LLMs like the Gemma [
        <xref ref-type="bibr" rid="ref50">50</xref>
        ] or Llama [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] series. This will not only enhance reproducibility for the
scientific community but also offer valuable insights into their comparative performance.
      </p>
      <p>
        In addition, although the present work addresses the detection of misogyny in song lyrics,
we believe further research is needed on the detection of stereotypes [
        <xref ref-type="bibr" rid="ref51 ref52">51, 52</xref>
        ] in
song lyrics.
      </p>
      <p>In conclusion, our study presents a multifaceted evaluation of misogyny detection in lyrics, comparing
traditional fine-tuned models and LLMs, and highlighting their respective strengths and limitations. We
provide both quantitative metrics and qualitative insights, offering new perspectives on LLM reasoning
and their potential for future work in data augmentation and interpretability.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been developed under the framework of the following projects: MARTINI
(Grant PCI2022-135008-2) and FairTransNLP-Stereotypes (Grant PID2021-124361OB-C31) both funded
by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and DeepL for text
translation, paraphrasing and rewording, and improving writing style. After using these services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>8. Data augmentation</title>
      <sec id="sec-10-1">
        <title>8.1. Paraphrasing prompt</title>
        <p>OBJECTIVE: Generate three paraphrases of the lyrics of the following song, which contains
misogynistic elements and/or hate speech, specifically of the type {TYPE_MISOGYNY}. The
output must be structured in XML format.</p>
        <p>DEFINITION OF MISOGYNY: We speak of a song’s lyrics as containing misogyny when they
contain hate speech or disdain directed at women. Lyrics that promote the subordination or
objectification of women also fall into this category.</p>
        <p>DEFINITION OF {MISOGYNY_TYPE}:
{MISOGYNY_TYPE_DEFINITION}.</p>
        <p>ADDITIONAL RULES:</p>
        <list list-type="bullet">
          <list-item><p>Do not remove misogynistic and/or problematic elements from any of the new versions.</p></list-item>
          <list-item><p>You do not need to respect the rhyme of the verses, but you do need to keep the central
theme as well as the misogyny of {TYPE_MISOGYNY} in the song.</p></list-item>
          <list-item><p>For the first paraphrase, paraphrase all misogynistic and non-misogynistic content, but
keep the hate speech nature of {TYPE_MISOGYNY} as per the definition provided.</p></list-item>
          <list-item><p>For the second paraphrase, paraphrase all the misogynistic verses of the song, while
the non-misogynistic verses are replaced by entirely new ones in line with the original
subject matter of the song as well as its message. The paraphrases of the misogynistic
elements should be in accordance with the definition of {TYPE_MISOGYNY} provided.</p></list-item>
          <list-item><p>For the third paraphrase, paraphrase all misogynistic and non-misogynistic verses of
the song, but modify the structure of the song as well as the order of the verses, always
maintaining a narrative coherence that resembles the original lyrics but conforms to the
definition of {TYPE_MISOGYNY} provided.</p></list-item>
          <list-item><p>Before paraphrasing, think about the changes you are going to make and explain them
briefly. Briefly justify why your new version still retains the original essence as well as
why it is misogynistic according to the definition of {TYPE_MISOGYNY} we have provided.</p></list-item>
        </list>
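        <p>As an illustration of how a template of this kind can be instantiated, the following minimal sketch fills the curly-brace placeholders and fails loudly if one is left without a value. It is not the code used in this work: the function name fill_prompt, the shortened template text and the dummy definition string are assumptions made for the example. The template as printed uses two spellings of the category placeholder ({TYPE_MISOGYNY} and {MISOGYNY_TYPE}), so the sketch supplies a value for both.</p>
        <preformat><![CDATA[
```python
import re

# Shortened stand-in for the paraphrasing prompt; only the placeholders matter here.
TEMPLATE = (
    "OBJECTIVE: Generate three paraphrases of the lyrics of the following song, "
    "which contains misogynistic elements and/or hate speech, specifically of the "
    "type {TYPE_MISOGYNY}.\n"
    "DEFINITION OF {MISOGYNY_TYPE}: {MISOGYNY_TYPE_DEFINITION}."
)

# Matches placeholders such as {MISOGYNY_TYPE}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]+)\}")


def fill_prompt(template, values):
    """Replace every {NAME} placeholder; raise KeyError if a name has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Dummy values: the real category definitions are not reproduced here.
values = {
    "TYPE_MISOGYNY": "Sexualization",
    "MISOGYNY_TYPE": "Sexualization",
    "MISOGYNY_TYPE_DEFINITION": "(category definition goes here)",
}
prompt = fill_prompt(TEMPLATE, values)
print(prompt)
```
]]></preformat>
        <p>Checking for unfilled placeholders before sending the prompt avoids silently passing a literal placeholder name to the model when the two spellings are mixed.</p>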
      </sec>
      <sec id="sec-10-2">
        <title>8.2. Synthetic data generation prompt</title>
        <p>OBJECTIVE: Generate a new song lyric from the lyrics of the following song, which contains
misogynistic elements and/or hate speech, specifically of the type {MISOGYNY_TYPE}.</p>
        <p>DEFINITION OF MISOGYNY: We speak of a song’s lyrics as containing misogyny when they
contain hate speech or disdain directed at women. Lyrics that promote the subordination or
objectification of women also fall into this category.</p>
        <p>DEFINITION OF {MISOGYNY_TYPE}: {MISOGYNY_TYPE_DEFINITION}.</p>
        <p>RULES:</p>
        <list list-type="bullet">
          <list-item><p>Preserving misogyny: It is imperative that the song contains similar misogynistic elements
in its new versions. The specific type of misogyny {MISOGYNY_TYPE} and the essence of
the message must remain clearly recognisable.</p></list-item>
          <list-item><p>Maintain Original Themes: The newly generated song should respect much of the original
themes proposed in the song. For example, if the song is about heartbreak, the new song
should focus on this theme, while if it is about sexual acts, the new version should talk
about this as well.</p></list-item>
          <list-item><p>Rhymes: The new version does not need to respect the rhymes between verses, but it does
need to maintain misogynistic elements in its final version.</p></list-item>
          <list-item><p>Mood: Use the feeling/mood associated with the original song to generate the new song. If
the song emanates a feeling of sadness, the new song should maintain this essence, while
if it emanates a feeling of revenge or hatred, the new song should reflect these feelings.</p></list-item>
          <list-item><p>Rhetorical figures of speech: Introduce at least 2 or 3 metaphors or similes in the song.
However, these figures of speech should NOT affect the explicitness of the misogyny, if
any.</p></list-item>
          <list-item><p>Length: The resulting song should be approximately the length of the original song in
terms of the number of verses.</p></list-item>
          <list-item><p>DON’T JUST PARAPHRASE OR COPY THE ORIGINAL SONG: Generate a new song from
the one provided, BUT don’t mimic the original song in its entirety. Be creative.</p></list-item>
          <list-item><p>Identify all elements of the original song that relate to the definition of {MISOGYNY_TYPE}.
These new elements must appear, in one form or another, in the newly generated version.</p></list-item>
          <list-item><p>Extract and argue the main themes, the genre as well as the mood of the next song.</p></list-item>
          <list-item><p>Generate a new song: Consider all of the above elements to generate a new song of
acceptable quality that is sufficiently different from the original.</p></list-item>
        </list>
        <p>Once you have generated the new song, justify why it maintains the themes of the original song
as well as the misogynistic elements. Please respond using the following outline:</p>
        <preformat>&lt;response&gt;
&lt;elements_m&gt;Here you should write down all the misogynistic elements you have identified
in the song along with their relationship to {MISOGYNY_TYPE}. Argue your answer by citing
specific excerpts from the song that have allowed you to reach this conclusion. &lt;/elements_m&gt;
&lt;main_themes&gt;Include here all the main themes surrounding the song with an argument for the
presence of these&lt;/main_themes&gt;.
&lt;mood&gt;Mention here the main moods conveyed by this song&lt;/mood&gt;.
&lt;genre&gt; Mention here the genre to which the song belongs&lt;/genre&gt;.
&lt;new_lyrics&gt; The lyrics of the new song you have generated &lt;/new_lyrics&gt;.
&lt;justification&gt;Justification of why the new version you have generated maintains the misogyny.
&lt;/justification&gt;
&lt;/response&gt;</preformat>
        <p>Here are the lyrics to the song: {lyrics}</p>
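        <p>A response that follows the outline above can be split into its fields with a few lines of code. The sketch below is only an illustration under stated assumptions, not the post-processing used in this work: the function name parse_response and the sample text are hypothetical, and a regular expression is used instead of a strict XML parser because the outline places full stops after some closing tags and generated text may not be well-formed XML.</p>
        <preformat><![CDATA[
```python
import re

# Tag names taken from the response outline of the prompt.
FIELDS = ("elements_m", "main_themes", "mood", "genre", "new_lyrics", "justification")


def parse_response(text):
    """Extract the content of each outline tag; tags that are absent map to None."""
    result = {}
    for name in FIELDS:
        pattern = r"<{0}>(.*?)</{0}>".format(name)
        match = re.search(pattern, text, flags=re.DOTALL)
        result[name] = match.group(1).strip() if match else None
    return result


# Hypothetical model output with placeholder content, for illustration only.
sample = """<response>
<elements_m>placeholder analysis</elements_m>
<mood>placeholder mood</mood>.
<new_lyrics>
line one
line two
</new_lyrics>
</response>"""

parsed = parse_response(sample)
print(parsed["new_lyrics"])
```
]]></preformat>
        <p>Returning None for missing tags makes incomplete generations easy to filter out before they are added to the training data.</p>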
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Harsiwi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fatimah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alfarisy</surname>
          </string-name>
          ,
          <article-title>Gendered Hate: How Affective Language Shapes Hate Speech Against Women</article-title>
          ,
          <source>International Journal of Linguistics, Literature and Translation</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Nockleby</surname>
          </string-name>
          ,
          <article-title>Hate speech</article-title>
          ,
          <source>Encyclopedia of the American Constitution</source>
          <volume>3</volume>
          (
          <year>2000</year>
          )
          <fpage>1277</fpage>
          -
          <lpage>1279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Moreno-Sandoval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomares-Quimbaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Barbosa-Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Pantoja-Rojas</surname>
          </string-name>
          ,
          <article-title>Detection of Hate Speech, Racism and Misogyny in Digital Social Networks: Colombian Case Study</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <fpage>113</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cascione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cerulli</surname>
          </string-name>
          ,
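          <string-name>
            <given-names>M. M.</given-names>
            <surname>Manerba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,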
          <article-title>Women's Professions and Targeted Misogyny Online</article-title>
          ,
          <source>in: Proceedings of the 10th Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2024</year>
          ),
          <year>2024</year>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <article-title>Pop Lyrics: A Mirror and a Molder of Society</article-title>
          ,
          <source>Et cetera</source>
          <volume>42</volume>
          (
          <year>1985</year>
          )
          <fpage>167</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Betti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Abrate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaltenbrunner</surname>
          </string-name>
          ,
          <article-title>Large Scale Analysis of Gender Bias and Sexism in Song Lyrics</article-title>
          ,
          <source>EPJ Data Science</source>
          <volume>12</volume>
          (
          <year>2023</year>
          )
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Alcántara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Garcia-Vazquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Espinosa-Juarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Valdez-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Felipe-Riveron</surname>
          </string-name>
          ,
          <article-title>Overview of MiSonGyny at IberLEF 2025: Misogyny Speech Detection in Spanish Language Song Lyrics</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025)</source>
          , CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rodríguez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Comet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Donoso</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2021: Sexism Identification in Social Networks</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <year>2021</year>
          )
          <fpage>195</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rodríguez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendieta-Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marco-Remón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Makeienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2022: Sexism Identification in Social Networks</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          )
          <fpage>229</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2023 - Learning with Disagreement for Sexism Identification and Characterization (Extended Overview)</article-title>
          ,
          <source>CLEF (Working Notes)</source>
          (
          <year>2023</year>
          )
          <fpage>813</fpage>
          -
          <lpage>854</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maeso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2024 - Learning with Disagreement for Sexism Identification and Characterization in Tweets and Memes (Extended Overview)</article-title>
          , in:
          <source>Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Schmeisser-Nieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pastells Vilà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ariza-Casabona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé Delor</surname>
          </string-name>
          , et al.,
          <article-title>Overview of DETESTS-Dis at IberLEF 2024: DETEction and classification of racial STereotypes in Spanish - Learning with Disagreement</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>323</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anzovino</surname>
          </string-name>
          ,
          <article-title>Overview of the Task on Automatic Misogyny Identification at IberEval 2018</article-title>
          , in:
          <source>Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</source>
          , volume
          <volume>2150</volume>
          of CEUR Proc., Seville, Spain,
          <year>2018</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saibene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorensen</surname>
          </string-name>
          ,
          <article-title>SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification</article-title>
          , in:
          <source>Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>549</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Firooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ringshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Testuggine</surname>
          </string-name>
          ,
          <article-title>The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>2611</fpage>
          -
          <lpage>2624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Satish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khanbayov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Groh</surname>
          </string-name>
          ,
          <article-title>Tuning Into Bias: A Computational Study of Gender Bias in Song Lyrics</article-title>
          , in:
          <source>Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Casanovas-Buliart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Alvarez-Cueva</surname>
          </string-name>
          , C. C.,
          <article-title>Evolution over 62 years: An Analysis of Sexism in the lyrics of the most-listened-to songs in Spain</article-title>
          ,
          <source>Cogent Arts &amp; Humanities</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          2436723. URL: https://doi.org/10.1080/23311983.2024.2436723. doi:10.1080/23311983.2024.2436723.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alcántara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macias</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at IberLEF 2024: Hate Speech Detection Towards the Mexican Spanish Speaking LGBT+ Population</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>393</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
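          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention Is All You Need</article-title>
          ,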
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding</article-title>
          , in:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chafin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hallström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taghadouini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aarsen</surname>
          </string-name>
          , et al.,
          <article-title>Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference</article-title>
          ,
          <source>arXiv preprint arXiv:2412.13663</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <source>GPT-4 Technical Report</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2303.08774. arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Meta AI</surname>
          </string-name>
          ,
          <source>The Llama 3 Herd of Models</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Törnberg</surname>
          </string-name>
          ,
          <article-title>Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages</article-title>
          ,
          <source>Social Science Computer Review</source>
          <volume>0</volume>
          (
          <issue>0</issue>
          )
          <fpage>08944393241286471</fpage>
          . doi:10.1177/08944393241286471.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language Models are Few-Shot Learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Chain-of-Thought Prompting Elicits Reasoning in Large Language Models</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <article-title>The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>30</volume>
          (
          <year>1997</year>
          )
          <fpage>1145</fpage>
          -
          <lpage>1159</lpage>
          . doi:10.1016/S0031-3203(96)00142-2.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jurman</surname>
          </string-name>
          ,
          <article-title>The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation</article-title>
          ,
          <source>BMC Genomics</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          6. doi:10.1186/s12864-019-6413-7.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gutiérrez Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Armengol Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Llop Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Silveira Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pio Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Armentano Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodriguez Penagos</surname>
          </string-name>
          ,
          <string-name>
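            <given-names>A.</given-names>
            <surname>Gonzalez Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>MarIA: Spanish Language Models</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>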
          <volume>68</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>N.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <article-title>Dropout: A Simple Way to Prevent Neural Networks from Overfitting</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>15</volume>
          (
          <year>2014</year>
          )
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>E.</given-names>
            <surname>Okur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nachman</surname>
          </string-name>
          ,
          <article-title>Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Béchet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>4114</fpage>
          -
          <lpage>4125</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.437/.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>AugGPT: Leveraging ChatGPT for Text Data Augmentation</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>11</volume>
          (
          <year>2025</year>
          )
          <fpage>907</fpage>
          -
          <lpage>918</lpage>
          . doi:10.1109/TBDATA.2025.3536934.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations</article-title>
          ,
          in:
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>10443</fpage>
          -
          <lpage>10461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval</article-title>
          ,
          in:
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1393</fpage>
          -
          <lpage>1412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Magne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
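          ,
          <article-title>MTEB: Massive Text Embedding Benchmark</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          (Eds.),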
          <source>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>2014</fpage>
          -
          <lpage>2037</lpage>
          . doi:10.18653/v1/2023.eacl-main.148.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Katiyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>Revisiting Few-sample BERT Fine-tuning</article-title>
          ,
          in:
          <source>International Conference on Learning Representations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled Weight Decay Regularization</article-title>
          , in:
          <source>International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <article-title>Introducing GPT-4.1 in the API</article-title>
          , https://openai.com/index/gpt-4-1/,
          <year>2025</year>
          . Accessed on 19-05-2025
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazeika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          ,
          <article-title>Measuring Massive Multitask Language Understanding</article-title>
          ,
          <source>Proceedings of the International Conference on Learning Representations (ICLR)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Critch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          ,
          <article-title>Aligning AI With Shared Human Values</article-title>
          ,
          <source>Proceedings of the International Conference on Learning Representations (ICLR)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brahma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>Instruction-Following Evaluation for Large Language Models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2311.07911. arXiv:2311.07911.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
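          <string-name>
            <given-names>T.</given-names>
            <surname>Chakrabarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Art or Artifice? Large Language Models and the False Promise of Creativity</article-title>
          , in:
          <source>Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems</source>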
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellemare-Pepin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lespinasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thölke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Harel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathewson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Olson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jerbi</surname>
          </string-name>
          ,
          <article-title>Divergent Creativity in Humans and Large Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:2405.13012</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>D.</given-names>
            <surname>Battista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Molano</surname>
          </string-name>
          , et al.,
          <article-title>How AI bots have reinforced gender bias in hate speech</article-title>
          ,
          <source>Ex aequo</source>
          (
          <year>2023</year>
          )
          <fpage>53</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>A.</given-names>
            <surname>Caliskan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Ajay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Charlesworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolfe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Banaji</surname>
          </string-name>
          ,
          <article-title>Gender bias in word embeddings: A comprehensive analysis of frequency, syntax, and semantics</article-title>
          ,
          in:
          <source>Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ismayilzada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>van der Plas</surname>
          </string-name>
          ,
          <article-title>Evaluating Creative Short Story Generation in Humans and Large Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:2411.02316</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
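          <string-name>
            <surname>Gemma Team</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vieillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Merhej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matejovicova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rivière</surname>
          </string-name>
          , et al.,
          <article-title>Gemma 3 Technical Report</article-title>
          ,
          <source>arXiv preprint arXiv:2503.19786</source>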
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>E. U.</given-names>
            <surname>Alacreu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Identification of Racial and Sexist Stereotypes in Spanish: A Learning with Disagreements Approach</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>74</volume>
          (
          <year>2025</year>
          )
          <fpage>15</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giachanou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lefever</surname>
          </string-name>
          ,
          <article-title>Stereotype detection in natural language processing</article-title>
          ,
          <source>arXiv preprint arXiv:2505.17642</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>