<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luana Bulla</string-name>
          <email>luana.bulla@phd.unict.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Misael Mongiovì</string-name>
          <email>misael.mongiovi@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research Council</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Emotions are an integral part of human communication, and accurately detecting and interpreting them from textual data holds significant potential for numerous applications. The Valence-Arousal-Dominance (VAD) dimensional model provides a rich framework for capturing the nuanced emotional states conveyed through text. This paper explores text-based emotion detection using VAD analysis and machine learning techniques. We propose a novel machine-learning model specifically designed to detect VAD dimensions from textual data. Through empirical evaluation and comparison with existing methods, we analyze how accurately the proposed model identifies emotions expressed in text. The paper describes the ISTC-CNR participation in the EmotivITA task at EVALITA 2023, presenting our team's efforts and findings in the context of the competition.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>NLI</kwd>
        <kwd>Emotion Regression</kwd>
      </kwd-group>
      <conference>
        <conf-date>Sep 7 - 8, 2023</conf-date>
        <conf-name>Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</conf-name>
        <conf-loc>Parma, IT</conf-loc>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[...]ence, influencing our perception, behavior, and overall well-being. Traditionally, emotions have been studied using categorical models [<xref ref-type="bibr" rid="ref1">1</xref>], which assign discrete labels to distinct emotional states. However, the limitations of such models in capturing the richness and variability of [...] models. Among these, the VAD dimensional model [<xref ref-type="bibr" rid="ref2">2</xref>] has gained considerable attention due to its ability to provide a multi-dimensional representation of emotions.</p>
      <p>According to the VAD paradigm, emotions may be classified along three main dimensions: valence, which describes the pleasantness or unpleasantness of an emotional experience; arousal, which reflects the level of physiological [...]; and dominance, which denotes the degree of control or influence exerted by an emotion. This dimensional framework offers a more fine-grained understanding of emotional experiences, enabling researchers to capture the subtle nuances and interplay of emotions.</p>
      <p>The primary objective of this paper is to explore text [...] We present our model at EmotivITA<fn><label>2</label><p><ext-link ext-link-type="uri" xlink:href="https://sites.google.com/view/emotivita/the-tasks">https://sites.google.com/view/emotivita/the-tasks</ext-link></p></fn> [...] predicting one of the Valence, Arousal, or Dominance [...]</p>
      <p>The paper is organized as follows: Section 2 introduces [...] performance of the VAD MNLI-XLM-RoBERTa-based system with the baseline BERT-based model provided by the organizers. Finally, Section 4 summarizes the key findings, discusses the practical implications, and outlines potential directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of the Systems</title>
      <p>We describe our NLI-based method for recognizing fine-grained emotions along the continuous VAD dimensions. Our approach covers both the dimensional (sub-task A) and multi-dimensional (sub-task B) VAD regression tasks within a unified methodological framework. To accomplish this, we introduce a supervised model developed in constrained mode, leveraging EmoITA. Section 2.1 gives an overview of our methodology, detailing the key components and techniques employed in our NLI-based approach. Section 2.2 gives an overview of the prompts used during the training phase.</p>
      <sec id="sec-2-1">
        <title>2.1. VAD MNLI-XLM-RoBERTa model</title>
        <p>We present the VAD MNLI-XLM-RoBERTa model, a fine-tuned version of the multilingual XLM-RoBERTa system, which has been developed using an automatically translated Italian version of the MNLI corpus<fn><label>3</label><p><ext-link ext-link-type="uri" xlink:href="https://huggingface.co/Jiva/xlm-roberta-large-it-mnli">https://huggingface.co/Jiva/xlm-roberta-large-it-mnli</ext-link></p></fn>. To adapt the model for the regression task, we replace the last linear layer of the MNLI-XLM-RoBERTa model. During training, the parameters of the last linear layer are initialized from scratch and learned, while the remaining parameters are fine-tuned. To tune the hyperparameters, we train the model on a portion of the training set provided by the organizers in a preliminary phase. Our model uses input sentences as premises, accompanied by predefined fixed prompts that serve as hypotheses. These prompts aim to guide the model’s training process by focusing on specific assignments. The VAD MNLI-XLM-RoBERTa model is trained with a learning rate of 1e-5, a batch size of 64, and AdamW as the optimizer. The Mean Squared Error (MSE) is adopted as the loss function, measuring the disparity between the model’s predictions and the actual values<fn><label>4</label><p><ext-link ext-link-type="uri" xlink:href="https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html">https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html</ext-link></p></fn>.</p>
      </sec>
      <sec id="sec-4">
        <title>2.2. Prompting</title>
        <p>We introduce a set of tailored prompts designed to predict the values of valence, arousal, and dominance in text. For the dimensional emotion analysis task, we develop three prompts, each targeting a specific emotion dimension. The prompt for valence is “quanta positività esprime la frase?” (how much positivity does the sentence convey?). For arousal, the prompt is “quanto è eccitante la frase?” (how exciting is the sentence?). Lastly, for dominance, the prompt is “quanto è controllata l’emozione?” (how controlled is the emotion?). These prompts are crafted to succinctly capture the essence of each emotional dimension in easily understood language. Their purpose is to guide the model’s training process toward discerning specific emotional dimensions accurately.</p>
        <p>In the multi-dimensional emotion analysis task, we use a single prompt to predict all emotion dimensions. The prompt used is “valence, arousal, dominance dell’emozione?” (valence, arousal, dominance of the emotion?). In this case, the prompt helps the model grasp the overall framework of the task during the training phase by providing contextual information. By leveraging these prompts, our objective is to enhance the model’s capacity to capture the intricate nuances of emotional dimensions in text.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Evaluation</title>
      <p>We conduct an experimental analysis to assess the effectiveness of our proposed approach and compare it with the baseline method provided by the competition’s organizers using EmoITA [<xref ref-type="bibr" rid="ref5">5</xref>]. In Section 3.1, we present the results of the dimensional emotion analysis, focusing on the individual evaluation of valence, arousal, and dominance. Here, we examine the performance of our approach and provide a comprehensive analysis of its efficacy. In Section 3.2, we shift our focus to multi-dimensional emotion analysis and evaluate the performance of our approach in capturing and predicting emotional dimensions collectively.</p>
      <sec id="sec-3-1">
        <title>3.1. Dimensional Emotion Analysis</title>
        <p>We train three separate VAD MNLI-XLM-RoBERTa models, one each for valence, arousal, and dominance. During training, each model focuses on a specific target dimension and is associated with a specific prompt (Section 2.2).</p>
        <p>Tables 3.1 and 3.1 showcase the results obtained by evaluating our MNLI-based model on 2,063 items from the Italian EmoBank test set provided by the organizers. For the dimensional emotion analysis task, we submitted two different runs corresponding to distinct configurations of the fine-tuned VAD MNLI-XLM-RoBERTa model. During the model training phase, the first run incorporates 99% of the data from the development set, while the second run uses the entire development set. Each emotional dimension is evaluated based on Pearson’s correlation coefficient (Table 3.1) and Mean Absolute Error (Table 3.1). To assess the effectiveness of our model, we compare its performance against a baseline BERT-based model provided by the organizers. Our NLI-based model demonstrates slightly higher Pearson’s correlation values for valence compared to the baseline, while the other metrics show slightly lower scores. Notably, valence exhibits the highest performance in terms of correlation, while arousal and dominance present more challenges, with the baseline model achieving a correlation of only 0.64%.</p>
      </sec>
      <sec id="sec-5">
        <title>3.2. Multi-dimensional Emotion Analysis</title>
        <p>We present the performance of models designed for detecting valence, arousal, and dominance in a multi-dimensional setting. In this setting, both the baseline and MNLI-based models predict all three values by using all three VAD dimensions during the training phase. For this task, we submitted a single run in which we employed all the available development set data for training the model. We report the performance of the VAD MNLI-XLM-RoBERTa system for each emotion dimension in terms of Pearson’s correlation coefficient (Table 3.2) and Mean Absolute Error (Table 3.2). We compare our results with the baseline performance provided by the organizers.</p>
        <p>Both the baseline and NLI-based models demonstrate superior performance in detecting valence, achieving a correlation of approximately 80%. As previously mentioned, the scores for the arousal and dominance dimensions are comparatively lower than those for valence. Specifically, the highest correlation values observed for arousal and dominance are 0.652 and 0.654, respectively.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Discussion</title>
      <p>As anticipated, performance varied significantly across different emotion values in both the dimensional (i.e. sub-task A) and multi-dimensional (i.e. sub-task B) emotion regression tasks. The concept of valence demonstrated consistent interpretation, resulting in higher correlation scores for all models in both analyses. Conversely, detecting concepts like “arousal” and “dominance” proved more challenging in the context of short text documents. Comparing the two models, the BERT-based model yielded the best results, while the NLI-based model exhibited slightly lower performance in both tasks. This discrepancy can be attributed to the multilingual nature of the MNLI-XLM-RoBERTa model, which grants it greater adaptability but limits its ability to identify nuanced and ambiguous terms like “dominance” and “excitement” within text. Additionally, the NLI-based models employed a generic prompt for every item in the dataset as a hypothesis. This approach potentially underutilized the model’s capabilities, as the prompt lacked the explanatory power needed to effectively tackle challenging regression tasks.</p>
      <p>In conclusion, while the models demonstrated strengths in capturing valence, improvements are needed to enhance their performance in discerning arousal and dominance. This includes refining the model’s adaptability to diverse linguistic contexts and developing more informative prompts for comprehensive analysis of emotional dimensions.</p>
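      <p>The two evaluation metrics reported in Section 3, Pearson’s correlation coefficient and Mean Absolute Error, can be illustrated with a minimal Python sketch. The function names and the toy scores below are assumptions made for illustration only; this is not the organizers’ evaluation script, and the numbers are not task data.</p>
      <preformat>
```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson's r between predicted and gold scores for one VAD dimension."""
    if len(predicted) != len(gold) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel in r.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_g)


def mean_absolute_error(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Average absolute difference between predicted and gold scores."""
    if len(predicted) != len(gold) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(predicted)


# Toy, made-up valence scores (hypothetical; not taken from EmoITA).
gold_valence = [3.0, 2.5, 4.0, 1.5]
predicted_valence = [2.8, 2.7, 3.6, 1.9]

print("Pearson r:", pearson_correlation(predicted_valence, gold_valence))
print("MAE:", mean_absolute_error(predicted_valence, gold_valence))
```
      </preformat>
      <p>In the dimensional setting each of the three models would be scored separately with these two functions; in the multi-dimensional setting the same functions would be applied per dimension to the three outputs of the single model.</p>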
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>In our study, we propose a novel approach for detecting valence, arousal, and dominance values in short natural-language texts using an MNLI-XLM-RoBERTa-based model. Our method encompasses both dimensional and multi-dimensional regression scenarios, leveraging a fine-tuned XLM-RoBERTa classifier trained and evaluated on EmoITA. We present the results of the models in both the dimensional and multi-dimensional configurations and compare them with the performance of the BERT-based model provided by the competition’s organizers. This evaluation provides insight into the effectiveness of our proposed approach in capturing and analyzing emotional dimensions in textual data.</p>
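      <p>The premise-hypothesis input format that underlies both configurations can be sketched in Python as follows. The prompt strings are quoted from Section 2.2; the helper names <monospace>build_pair</monospace> and <monospace>build_batch</monospace> and the example sentence are hypothetical and are not taken from the authors’ code. Tokenization and the model itself are deliberately left out.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

# Italian prompts quoted from Section 2.2; each one serves as the NLI hypothesis.
DIMENSION_PROMPTS: Dict[str, str] = {
    "valence": "quanta positività esprime la frase?",
    "arousal": "quanto è eccitante la frase?",
    "dominance": "quanto è controllata l’emozione?",
}
# Single prompt used in the multi-dimensional task.
MULTI_DIMENSIONAL_PROMPT = "valence, arousal, dominance dell’emozione?"


def build_pair(sentence: str, dimension: str = "all") -> Tuple[str, str]:
    """Return a (premise, hypothesis) pair for one input sentence.

    The sentence is the premise; the fixed prompt is the hypothesis.
    dimension="all" selects the multi-dimensional prompt (hypothetical convention).
    """
    if dimension == "all":
        return sentence, MULTI_DIMENSIONAL_PROMPT
    if dimension not in DIMENSION_PROMPTS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    return sentence, DIMENSION_PROMPTS[dimension]


def build_batch(sentences: List[str], dimension: str = "all") -> List[Tuple[str, str]]:
    """Pair every sentence in a batch with the same fixed prompt."""
    return [build_pair(sentence, dimension) for sentence in sentences]


# Made-up example sentence, not drawn from the task data.
example = "Che bella giornata!"
for dim in ("valence", "arousal", "dominance", "all"):
    print(dim, build_pair(example, dim))
```
      </preformat>
      <p>Because the hypothesis is identical for every item within a run, only the premise varies across the dataset, which is the property the discussion identifies as a possible limitation of generic prompts.</p>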
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Acheampong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wenyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nunoo-Mensah</surname>
          </string-name>
          ,
          <article-title>Text-based emotion detection: Advances, challenges, and opportunities</article-title>
          ,
          <source>Engineering Reports</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <elocation-id>e12189</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehrabian</surname>
          </string-name>
          ,
          <article-title>Evidence for a three-factor theory of emotions</article-title>
          ,
          <source>Journal of Research in Personality</source>
          <volume>11</volume>
          (
          <year>1977</year>
          )
          <fpage>273</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          , CoRR abs/1911.02116 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <article-title>A broad-coverage challenge corpus for sentence understanding through inference</article-title>
          , in:
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <publisher-loc>New Orleans, Louisiana</publisher-loc>
          ,
          <year>2018</year>
          , pp.
          <fpage>1112</fpage>
          -
          <lpage>1122</lpage>
          . URL: https://aclanthology.org/N18-1101. doi:
          <pub-id pub-id-type="doi">10.18653/v1/N18-1101</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gafà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cutugno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Venuti</surname>
          </string-name>
          ,
          <article-title>EmotivITA at EVALITA2023: Overview of the Dimensional and Multidimensional Emotion Analysis Task</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,
          <article-title>Evalita 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>