<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Comparison of Fine-Tuning and Prompting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ainhoa Guerrero-San Martín</string-name>
          <email>ainhoa.guerrero@bbva.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenceslao González-Viñas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>César de Pablo-Sánchez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Banco Bilbao Vizcaya Argentaria</institution>
          ,
          <addr-line>S.A., Bilbao</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Physics and Applied Mathematics, University of Navarra</institution>
          ,
          <addr-line>Pamplona</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Data Science and Artificial Intelligence (DATAI), University of Navarra</institution>
          ,
          <addr-line>Pamplona</addr-line>
          ,
          <country>España</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>As large language models (LLMs) improve in their interactions with humans, it is essential to evaluate whether they truly understand human behavior and can detect emotions. To explore this, we reviewed key emotion profiles and emotionally annotated datasets used for emotion detection. We then compared a model fine-tuned on such data in English and Spanish with a larger zero-shot model using prompting. Our goal was to assess whether a model with extensive parameters becomes so effective that domain-specific retraining is no longer necessary. Results show that although fine-tuned models remain more accurate, the performance gap is narrowing. This raises the question of whether the additional computational cost and time required for domain adaptation are justified, given the increasingly marginal gains.</p>
      </abstract>
      <kwd-group>
        <kwd>generative artificial intelligence</kwd>
        <kwd>large language model</kwd>
        <kwd>natural language processing</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>fine-tuning</kwd>
        <kwd>zero-shot</kwd>
        <kwd>transfer learning</kwd>
        <kwd>emotion analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the era of large language models (LLMs) and their growing use in human interaction, it seems
pertinent to assess whether these LLMs can detect human emotion. As they become integrated into
customer service, education, and virtual assistance, the ability to interpret emotional and social
nuances is increasingly vital [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The correct detection of emotions not only enhances the precision
and naturalness of interactions but also contributes to the development of systems that appropriately
respect and respond to the complexities of human emotions and behaviors.
      </p>
      <p>Emotions are essential adaptive mechanisms that enable humans to respond to various
environmental challenges.</p>
      <p>Regarding LLMs, our objective is to determine whether the additional computational cost of
fine-tuning a model for a specific application or domain is justified, considering that these models have
already been pre-trained on extensive datasets with a large number of parameters.</p>
      <p>In the following sections, we review the types of emotional profiles that exist, as well as the
annotated datasets used for emotion detection. This framework enables us to compare smaller
language models (SLMs) fine-tuned on domain-specific datasets with large language models
(LLMs) operating in zero-shot settings, all evaluated using the same metrics and
dataset.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Emotional Profiles</title>
      <p>The word "emotion" comes from the Latin "emovere" which means an impulse that induces action.
Therefore, emotions can be defined as primary psychophysiological responses to internal or external
stimuli. Emotions are immediate, intense, and generally short-lived.</p>
      <p>To better understand emotions, it is important to distinguish and define what feelings are, as the
two terms are often used interchangeably despite having distinct meanings. Feelings are subjective,
conscious interpretations of emotions; they tend to be longer-lasting and less intense and are usually
influenced by past experiences, beliefs, and thoughts. In essence, feelings are the sum of emotions
and thoughts—that is, they are the result of emotions. An emotion transforms into a feeling as one
becomes aware of it.</p>
      <p>
        One of the characteristics of emotions is that they are evolutionary and adaptive and influence
thinking; consequently, they affect decision-making and social interaction. Additionally, they serve as
adaptation mechanisms that help us face different situations in our environment [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Throughout modern history, various interpretations of emotion have emerged, along with models or
emotional profiles. These profiles can be divided into two types:
      </p>
      <p>Discrete or Categorical: separate classes of emotions with no inherent relationship among
them.</p>
      <p>Dimensional: emotions are considered interconnected, with each emotion represented as a
point influenced by different characteristics in a multidimensional space.</p>
      <p>
        Within the first type is Ekman's emotional profile [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], based on facial expressions. He differentiates
six basic emotions: anger, disgust, fear, joy, sadness, and surprise.
      </p>
      <p>
        Ortony, Clore, and Collins developed another categorical emotional model (OCC) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] based on a
cognitive approach to observing emotions. They define 22 categories of emotions (see Figure 1) that
depend on how we interpret events, agents, or objects according to our goals, beliefs, and values.
      </p>
      <p>
        Among multidimensional models, one type of emotional profiling is that of Plutchik [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], who notes
that emotional states exhibit certain similarities, allowing them to be combined in varying degrees of
intensity to give rise to secondary or tertiary emotions. He defines emotions through a wheel of eight
primary emotions, which are as follows: anger, aversion (disgust in Ekman's model), fear, joy, sadness,
surprise, anticipation, and trust.
      </p>
      <p>
        Another multidimensional model is Russell's model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which measures human emotional states
using two variables: valence and activation. Valence indicates whether an emotional experience is
positive or negative on a continuous scale. Activation describes the intensity with which an emotion
is experienced, ranging from inactive to highly active. Russell places these variables within a
coordinate system, with valence as the horizontal axis and activation as the vertical. This results in a
circular distribution of emotions, with "prototypical" states located along the circumference of the
circle. This approach is known as the circumplex model of emotions (Figure 2).
      </p>
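      <p>As an illustration of this coordinate representation, the sketch below places a few emotions at hypothetical valence/activation coordinates (the values and the emotion set are illustrative, not taken from Russell's published model) and maps an arbitrary point to its nearest named emotion:</p>

```python
import math

# Hypothetical valence/activation placements, for illustration only.
CIRCUMPLEX = {
    "joy":     (0.8, 0.5),
    "anger":   (-0.7, 0.7),
    "sadness": (-0.7, -0.6),
    "calm":    (0.6, -0.6),
}

def nearest_emotion(valence, activation):
    """Return the named emotion closest to a point in valence/activation space."""
    return min(
        CIRCUMPLEX,
        key=lambda emotion: math.dist((valence, activation), CIRCUMPLEX[emotion]),
    )
```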
    </sec>
    <sec id="sec-3">
      <title>3. Emotional Datasets</title>
      <p>We reviewed public datasets with emotional annotations to identify those that serve our goal of
detecting emotions.</p>
      <p>
        The first dataset examined is SemEval-2018 by Mohammad et al. (2018) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], created with 22,000
tweets obtained from the X app, formerly known as Twitter, in three languages: English (49% of the
tweets), Arabic (20% of the tweets), and Spanish (31% of the tweets). The tweets were annotated
with multi-label tags covering 11 emotions plus a neutral label. Annotation was performed manually
on 5% of each emotion to obtain a gold-standard validation set. The rest of the
classification was carried out by collaborative crowdsourcing teams who used different methods of
automatic classification and natural language processing methodologies, such as embeddings trained
from Spanish tweets (for example, those provided by Rothe et al., 2016 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) which were used as a
basis to train deep learning models, including convolutional neural networks and recurrent neural
networks with gated units such as LSTM and GRU. Traditional machine learning models were also
used, such as Support Vector Machine (SVM), along with Spanish affective lexicons to construct the
feature space, like the Spanish emotion lexicon by Sidorov et al. (2012) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and ML-SentiCon by Cruz
et al. (2014) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Other datasets have been constructed as derivative works from the SemEval-2018 dataset:
TweetEval, by Barbieri et al. (2020) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which exclusively includes English tweets from SemEval-2018
labeled with a single class for four basic emotions: anger, sadness, optimism, and joy. Since the
number of tweets with single labels was limited, only those emotions with a minimum frequency of
300 examples in the training set were included.
      </p>
      <p>
        MELD was developed from multi-participant dialogues from the TV series Friends, containing 13K
utterances from 1.4K dialogues in English. It was classified manually using 6 of Ekman’s emotions plus
a neutral category considering all available modalities like text, audio, and video as described by Poria
et al. (2018) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        GoEmotions by Demszky et al. (2020) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] contains 58K messages downloaded from Reddit from
2005 to 2019 in English, with a taxonomy based on 27 emotions plus a neutral label, obtained through
an initial classification based on emotions identified by Cowen and Keltner (2017) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The data set
was classified manually with three annotators.
      </p>
      <p>
        ExTES is an example of a synthetically generated dataset, described by Zheng et al. (2023) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
For its development, comprehensive emotional support scenarios that integrate response strategies
were used to simulate chat-based dialogues. An expanded set of dialogues was then generated with a
GPT-3.5-turbo model, followed by a manual correction process. This iterative process yields a
large dataset while reducing the amount of human labor required.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental setting</title>
      <p>For our experiment, we selected the SemEval-2018 dataset, which is widely adopted by the research
community for emotion detection tasks. This dataset includes tweets in both Spanish and English and
employs a classification based on a simplified discrete profile, thereby avoiding interdependencies
among emotions. Additionally, only four basic single-class emotions were considered, providing an
adequate representation of the human emotional profile. For English messages, the TweetEval
subset—derived from SemEval-2018—was used, while the Spanish dataset was constructed by
filtering under the same premises.</p>
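      <p>The filtering step could be sketched as follows; the row layout and field names are hypothetical, as the paper does not describe its preprocessing code. Each row carries a tweet's text plus 0/1 annotations per emotion, and only rows annotated with exactly one of the four target emotions are kept:</p>

```python
# Target single-class emotions, following the TweetEval label set.
TARGET_EMOTIONS = {"anger", "joy", "optimism", "sadness"}

def filter_single_label(rows):
    """Keep tweets annotated with exactly one emotion from the target set.

    rows: list of dicts with a 'text' key plus one 0/1 entry per emotion.
    """
    kept = []
    for row in rows:
        active = [name for name, flag in row.items() if name != "text" and flag == 1]
        if len(active) == 1 and active[0] in TARGET_EMOTIONS:
            kept.append({"text": row["text"], "label": active[0]})
    return kept
```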
      <p>
        To address this task, we have adopted the approach described in Zhang et al. (2024) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which
explores whether generative large language models (LLMs) using zero-shot prompting deliver
classification results comparable to that of smaller language models (SLMs) fine-tuned on the same
datasets across various sentiment analysis tasks. Fine-tuning is a form of transfer learning in which a
pre-trained model is adjusted to better adapt to new data [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Their findings indicate that zero-shot prompted LLMs yield results comparable to fine-tuned SLMs
in simpler tasks, such as sentiment classification. However, for more complex tasks—such as opinion
classification, which involves extracting both polarity and topic—there remains significant room for
improvement compared to fine-tuned models on domain-specific data. Additionally, they conducted
a comparative study on emotion detection using the TweetEval dataset, described in Section 3
(Barbieri et al., 2020) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This dataset consists of 5,052 English tweets, divided into training, testing,
and validation sets (64%, 28%, and 8%, respectively).
      </p>
      <p>
        In this study, we first aim to replicate the emotion detection findings using the same TweetEval
dataset and evaluate their transferability to another language, specifically Spanish. We employed two
language models: a smaller model, T5-Base (hereafter T5), fine-tuned on the dataset using the Adam
optimizer with a learning rate of 1e-4, a fixed batch size of 4 for all tasks, and 3 epochs in the
full training setting; and a larger model, GPT-3.5-Turbo, version 0125 (hereafter 3.5), in a zero-shot
configuration [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The evaluation metric was the average macro F1 score over three runs with
different random seeds.
      </p>
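      <p>The paper does not specify the exact input format used when fine-tuning T5, so the sketch below assumes a common text-to-text casting of the task (the "emotion:" task prefix is an assumption), with the label mapping taken from the prompt in Appendix A:</p>

```python
# Label ids follow the prompt in Appendix A: 0 anger, 1 joy, 2 optimism, 3 sadness.
ID_TO_LABEL = {0: "anger", 1: "joy", 2: "optimism", 3: "sadness"}

def to_text2text(tweet, label_id):
    """Build a (source, target) pair for seq-to-seq fine-tuning.

    The 'emotion:' prefix is illustrative, not the authors' exact format.
    """
    return (f"emotion: {tweet}", ID_TO_LABEL[label_id])
```

<p>Training would then proceed with the hyperparameters stated above (Adam, learning rate 1e-4, batch size 4, 3 epochs).</p>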
      <p>
        Similarly, we explored more advanced versions of the GPT model: the 4o-mini version, released
on July 18, 2024 (hereafter 4o) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and a preliminary version of GPT-4.5 (hereafter 4.5) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
Additionally, we conducted experiments using two Spanish datasets: TweetEval, which was translated
into Spanish using the 4o model, and a subset of SemEval-2018 containing Spanish tweets. Aware of
the potential biases introduced by literal translation, we included the latter to provide a more robust
evaluation on native content. This subset consists of 7,000 tweets, divided into training, testing, and
validation sets (50%, 40%, and 10%, respectively). For both datasets, only the unique classes
corresponding to the four emotions represented in TweetEval were selected, and a comparative
analysis across the four models was conducted.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>The average results obtained from three runs with different random seeds in our experiment were consistent
with those reported in the reference article. Although some differences are observed, they are likely
due to the use of different random seeds. For T5, the difference is -0.58%, and for 3.5, it is -0.55%
(see Table 1).</p>
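      <p>The evaluation protocol, macro F1 averaged over runs with different random seeds, can be sketched as below; this is an illustrative re-implementation rather than the authors' evaluation code:</p>

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

def averaged_macro_f1(runs, labels):
    """runs: one (y_true, y_pred) pair per random seed."""
    return sum(macro_f1(t, p, labels) for t, p in runs) / len(runs)
```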
      <p>It can also be observed that in the Spanish dataset, 4o and 4.5 models outperform the adjusted
T5, suggesting that the effort dedicated to dataset-specific training may not be justified. In contrast,
for the English dataset, the fine-tuned model is even more efficient (+2.25%). In this case, it is
necessary to assess whether the efficiency gains sufficiently offset the additional training time
required.</p>
      <p>[Table 1: average macro F1 scores per model and dataset; T5 row: 80.35, 79.77, 63.96.]</p>
      <p>Table 2 presents a comparative analysis between some versions of the GPT models and the
reference T5 model for detecting the four emotions separately across two datasets—SemEval and
TweetEval (the latter in both English and Spanish). For the English dataset, the results indicate that
the smaller fine-tuned model more effectively detects the emotions of joy and optimism, while the
GPT models demonstrate greater efficiency in identifying sadness and anger—although the latest
version shows a decrease in efficiency for anger.</p>
      <p>Regarding the detection of emotion types in the Spanish datasets, the GPT models outperform the
T5 model in classifying all emotions. Furthermore, compared to previous versions, the latest GPT
model exhibits improved performance in detecting optimism, while its ability to detect joy has
diminished.</p>
      <p>We analyzed tweets with label mismatches (Table 3), where discrepancies often stem from
emotional ambiguity and the lack of contextual information.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>In this study, we reviewed emotional profiles and corpora for emotion detection and compared two
tweet datasets, TweetEval (in English and Spanish) and SemEval-2018, using a fine-tuned T5-Base
model versus advanced GPT versions with zero-shot prompting.</p>
      <p>The results indicate that using prompting with large generative models is increasingly accurate for
the problem of emotion classification.</p>
      <p>The difference between them and a domain-specific fine-tuned model is diminishing. For the
Spanish datasets, the large models demonstrate better accuracy than that offered by a model
fine-tuned on the Spanish dataset.</p>
      <p>Additionally, regarding the specific detection of emotions, the latest version of GPT shows
improved detection of all emotions with the Spanish dataset compared to the smaller fine-tuned
model. However, with the English dataset, the detection of joy still has room for further refinement.</p>
      <p>Based on these results, it is observed that for Spanish datasets, the time spent fine-tuning the
model appears unnecessary; however, for English datasets, further improvements in detecting joy
are needed to enhance overall efficiency.</p>
      <p>As future work, we plan to extend this analysis to the domain of turn-based conversation, with
the goal of evaluating model performance in more dynamic and context-rich interaction settings.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors would like to thank BBVA and Universidad de Navarra for their valuable support,
including financial support, throughout this research.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>(by using the activity taxonomy in ceur-ws.org/genai-tax.html):
During the preparation of this work, the authors used GPT-4 in order to: Grammar and spelling check.
After using these tools, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix. Prompts for Emotion Classification</title>
      <p>English Prompt Zero-Shot: "Please perform Emotion Classification task. Given the sentence, assign an
emotion label from ['0' to anger, '1' to joy, '2' to optimism, '3' to sadness]. Return label only without
any other text."</p>
      <p>Spanish Prompt Zero-Shot: "Por favor, realiza la tarea de Clasificación de Emociones. Dada la
oración, asigna una etiqueta de emoción de entre ['0' para enojo, '1' para alegría, '2' para optimismo, '3'
para tristeza]. Devuelve únicamente la etiqueta sin ningún otro texto."</p>
      <p>Prompt to translate TweetEval: "You are a translator. Translate the following text into Spanish.
Provide only the translation without any extra text."</p>
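      <p>A minimal sketch of how the English zero-shot prompt could be wired to an OpenAI-style chat API follows; the message layout, helper names, and the commented-out client call are assumptions rather than the authors' implementation:</p>

```python
EN_PROMPT = (
    "Please perform Emotion Classification task. Given the sentence, assign an "
    "emotion label from ['0' to anger, '1' to joy, '2' to optimism, '3' to "
    "sadness]. Return label only without any other text."
)

LABELS = {"0": "anger", "1": "joy", "2": "optimism", "3": "sadness"}

def build_messages(tweet):
    """Compose a chat request: instructions as system turn, tweet as user turn."""
    return [
        {"role": "system", "content": EN_PROMPT},
        {"role": "user", "content": tweet},
    ]

def parse_label(reply):
    """Map the model's bare-digit reply back to an emotion name."""
    return LABELS[reply.strip()]

# Hypothetical call against the chat completions endpoint (needs an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-3.5-turbo-0125", messages=build_messages(tweet))
#   label = parse_label(resp.choices[0].message.content)
```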
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Cowie</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douglas-Cowie</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsapatsoulis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Votsis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollias</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fellenz</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <article-title>Emotion recognition in human-computer interaction</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          , (
          <year>2001</year>
          ), pp.
          <fpage>32</fpage>
          -
          <lpage>80</lpage>
          . DOI: 10.1109/79.911197
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Keltner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Haidt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Social functions of emotions at four levels of analysis</article-title>
          ,
          <source>Cognition &amp; Emotion</source>
          , (
          <year>1999</year>
          ), pp.
          <fpage>505</fpage>
          -
          <lpage>521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ekman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>Facial expressions of emotion: New findings, new questions</article-title>
          , Psychological Science, (
          <year>1992</year>
          ), pp.
          <fpage>34</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ortony</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clore</surname>
            ,
            <given-names>G. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <source>The Cognitive Structure of Emotions [La estructura cognitiva de las emociones]</source>
          ,
          <publisher-name>Siglo XXI de España Editores</publisher-name>
          , Madrid,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Plutchik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Emotion: A Psychoevolutionary Synthesis</source>
          , Harper &amp; Row, New York,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Barrett</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ,
          <article-title>Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant</article-title>
          ,
          <source>Journal of Personality and Social Psychology</source>
          , (
          <year>1999</year>
          ), pp.
          <fpage>805</fpage>
          -
          <lpage>819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bravo-Marquez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salameh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kiritchenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>SemEval-2018 Task 1: Affect in Tweets</article-title>
          ,
          <source>in: Proceedings of the 12th International Workshop on Semantic Evaluation</source>
          , (
          <year>2018</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Rothe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ebert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Ultradense word embeddings by orthogonal transformation</article-title>
          ,
          <source>arXiv preprint arXiv:1602.07572</source>
          , (
          <year>2016</year>
          ). URL: https://arxiv.org/abs/1602.07572
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranda-Jiménez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viveros-Jiménez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.,
          <article-title>Empirical study of machine learning based approach for opinion mining in tweets</article-title>
          ,
          <source>in: Lecture Notes in Computer Science</source>
          , (
          <year>2013</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . DOI: 10.1007/978-3-642-37807-2_1
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>F. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troyano</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pontes</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ortega</surname>
            ,
            <given-names>F. J.,</given-names>
          </string-name>
          <article-title>ML-SentiCon: A Multilingual Lexicon of Semantic Polarities at the Lemma Level</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          , (
          <year>2014</year>
          ), pp.
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Espinosa-Anke</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification</article-title>
          , arXiv, abs/2010.12421, (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2010.12421
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazarika</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.,
          <article-title>MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations</article-title>
          , arXiv, abs/1810.02508, (
          <year>2018</year>
          ). URL: https://arxiv.org/abs/1810.02508
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Demszky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Movshovitz-Attias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ko</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.,
          <article-title>GoEmotions: A Dataset of Fine-Grained Emotions</article-title>
          ,
          <source>in: Annual Meeting of the Association for Computational Linguistics</source>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Cowen</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Keltner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Self-report captures 27 distinct categories of emotion bridged by continuous gradients</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          , (
          <year>2017</year>
          ), pp.
          <fpage>E7900</fpage>
          -
          <lpage>E7909</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Building emotional support chatbots in the era of LLMs</article-title>
          , arXiv preprint arXiv:2308.11584, (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2308.11584
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bing</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Sentiment analysis in the era of large language models: A reality check</article-title>
          ,
          <source>Findings of the Association for Computational Linguistics: NAACL 2024</source>
          , (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <article-title>A survey on transfer learning</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , (
          <year>2010</year>
          ), pp.
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          OpenAI,
          <article-title>ChatGPT: Optimizing language models for dialogue</article-title>
          , (
          <year>2022</year>
          ). URL: https://platform.openai.com/docs/models/gpt-3.5-turbo
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] OpenAI,
          <article-title>GPT-4o-mini: Experimental version</article-title>
          , 18 July (
          <year>2024</year>
          ). URL: https://platform.openai.com/docs/models/gpt-4o-mini
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] OpenAI,
          <article-title>GPT-4.5 preliminary release: Preliminary technical report</article-title>
          , (
          <year>2024</year>
          ). URL: https://platform.openai.com/docs/models/gpt-4.5-preview
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>