<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EmotivITA at EVALITA2023: Overview of the Dimensional and Multidimensional Emotion Analysis Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Gafà</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Cutugno</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Venuti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Catania</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Naples Federico II</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>EmotivITA is the first shared task for Italian Dimensional and Multidimensional Emotion Analysis, aiming to promote research on emotion detection for the Italian language. We developed an Italian dataset annotated following the dimensional model of emotions and invited participants to submit systems to predict the Valence, Arousal, and Dominance associated with sentences in the corpus. Five runs were submitted by two teams. We present the dataset, the evaluation methodology, and the approaches of the participating systems.</p>
      </abstract>
      <kwd-group>
        <kwd>emotion analysis</kwd>
        <kwd>emotion detection</kwd>
        <kwd>VAD model</kwd>
        <kwd>dataset</kwd>
        <kwd>EmoITA</kwd>
        <kwd>EmotivITA</kwd>
        <kwd>Evalita 2023</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>In the last two decades, the analysis of emotions that people express in texts has become an essential area in Natural Language Processing (NLP). Such an interest springs from the awareness of the crucial role feelings have in our cognition: being able to detect and eventually simulate them could be a fundamental step towards human-like forms of artificial intelligence [1]. For a review of possible applications of Emotion Analysis (EA), ranging from stock market predictions to the management of catastrophic events, see for example [2].</p>
      <p>Taking into account the somewhat uncertain terminology about human feelings occasionally found in the literature (see below), we start by defining some terms. Adopting a well-known typology of affective states by Scherer [3, pp. 140–141], we use the word ‘emotion’ to refer to a “relatively brief episode of synchronized responses by all or most organismic subsystems to the evaluation of an external or internal event as being of major significance”, whereas ‘sentiments’, like Scherer’s ‘attitudes’, are “relatively enduring, affectively colored beliefs, preferences, and predispositions toward objects or persons”.</p>
      <p>Sentiment analysis has long been a major interest for computational linguistics and, over the years, it has moved from the prediction of semantic polarity towards more fine-grained modeling, as is the case in Aspect-based Sentiment Analysis [4] and Stance Detection [5]; similar studies have been conducted on Italian texts as well [6, 7].</p>
      <p>Recently, EA has started receiving more and more attention as well. Several models of emotion proposed in psychology have been used in NLP, either categorical or dimensional. The former consider feelings as discrete, and usually identify a small set of basic emotions upon which other, more subtle and complex affective states are built; the widely adopted model conceived by Ekman [8], for instance, proposes six fundamental emotions. The latter, on the contrary, describe emotions by combining a limited number of independent dimensions in a real-valued vector space. The model proposed by Russell and Mehrabian [9], probably the best known, recognizes three dimensions: Valence (measuring pleasure or displeasure), Arousal (degree of excitement or calm), and Dominance (level of control over the situation), whose initials give the VAD model its name.</p>
      <p>Categorical models have some advantages over dimensional ones, as they allow the identification of several emotions in the same input and usually have simpler interpretations. Nevertheless, they have been criticized for their use of culture- and language-specific labels [10]; besides, different categorical models adopt different sets of emotions, making it difficult to compare studies. Concerning dimensional models, the independence of the three dimensions is yet to be ascertained [11, 12]; however, dimensional models allow easier comparisons between emotions and can describe feelings that are difficult to label.</p>
      <p>At SemEval, the most renowned evaluation campaign of NLP, the first shared task concerning emotion detection (for three languages: English, Arabic and Spanish) was proposed in 2018 [13]. Building on earlier works, a dataset of 22,000 tweets was annotated for many different affect states, following both the categorical and the dimensional models of emotions (limited to the Valence dimension<sup>1</sup>); the sub-tasks involved emotion classification and emotion regression.</p>
      <p>Another emotion classification task was proposed at SemEval 2019 [14], this time leveraging a dataset containing roughly 3,000 short conversations annotated for the presence of four emotions; the purpose was to study and exploit the role of context in facilitating emotion detection.</p>
      <p>However, in Italy EA has not yet received the same amount of interest it has gained at the international level. This is probably due to the lack of resources annotated for emotions. After some investigation, we could find only a few lexica [15, 16, 17]; some are not open to the public [18] or are quite specific in scope [19]; others are the result of automatic translations of existing English vocabularies and have not been re-annotated by Italian speakers [20, 21]. The situation is worse when it comes to datasets, where, to the best of our knowledge, only domain-specific resources are available [22, 23, 24]. Another dataset [25] has been proposed at Evalita 2023 [26], containing social media messages about TV shows, TV series, music videos, and advertisements, labeled following the Plutchik model of emotions [27].</p>
      <p>As we tried to outline, existing datasets for EA in Italian are scarce and quite specialized. Moreover, the emotion formats used for annotating these corpora are exclusively categorical, whereas dimensional models are receiving increasing attention in emotion detection tasks [28, 29]. By proposing the EmotivITA shared task at the Evalita 2023 evaluation campaign, we aim, on the one hand, to provide a new, general-purpose resource for EA in Italian, labeled by Italian speakers: EmoITA, a genre- and domain-balanced selection of more than 10,000 written sentences annotated following the dimensional model of emotions; on the other hand, we intend to promote dimensional and multidimensional EA in Italian.</p>
      <p>The rest of the paper is organized as follows: Section 2 provides a definition of the task; Section 3 describes the dataset made available to participants and the process of its creation; Section 4 details the official evaluation measures; Section 5 reports the results obtained by participating teams; Section 6 discusses the results; in Section 7 we draw some conclusions on the outcomes of the task.</p>
      <p>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Sep 7–8, Parma, IT. *Corresponding author. giovanni.gafa@phd.unict.it (G. Gafà); cutugno@unina.it (F. Cutugno); marco.venuti@unict.it (M. Venuti). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Definition of the task</title>
      <p>The EmotivITA shared task consists of automatically annotating, according to the VAD model, the emotions in a collection of written sentences from a genre-balanced dataset translated into Italian. More specifically, the task has been organized into two sub-tasks whose results are evaluated separately:</p>
      <p>• Sub-task A: Dimensional emotion regression. Prediction of Valence, Arousal, and Dominance values for a set of Italian sentences, using only the annotations of the target dimension for training: for instance, when predicting Valence, participant systems may only use the Valence values annotated in the dataset; the same holds for Arousal and Dominance.</p>
      <p>• Sub-task B: Multidimensional emotion regression. Prediction of Valence, Arousal, and Dominance values for a set of Italian sentences, where participant systems determine the three dimensions simultaneously, using the values of all three dimensions for training.</p>
      <p>Both sub-tasks are regression problems, so participating teams were asked to output, for each sentence, its id and three real numbers between 1 and 5, corresponding to the three predicted dimensions. Sub-task B is intended to study and exploit the potential correlations between Valence, Arousal, and Dominance that have been discussed in the literature (see § 1).</p>
      <p>Participants could carry out both sub-tasks or only one of them, although participation in sub-task A was strongly recommended, in order to have a common basis for comparison. Each participating team was allowed to submit a maximum of 2 runs for each sub-task. Runs could be produced according to the ‘constrained’ or ‘unconstrained’ modality (or both); however, teams were asked to specify the type of each run. In the constrained modality, only the annotated data distributed by the organizers could be used for training and tuning the systems, while other linguistic resources (e.g., word embeddings and lexicons) were allowed. In the unconstrained modality, external annotated data could also be employed, and had to be described in the system reports.</p>
      <sec id="sec-2-1">
        <p><sup>1</sup> As a case in point of inaccuracy when dealing with emotion-related terms, Valence was regarded as an equivalent of ‘sentiment’ throughout the study.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3. Dataset</title>
        <p>As mentioned above, the data released for the shared task derive from the Italian translation of an existing dataset,
EmoBank [30]. EmoBank is the largest genre-balanced
English dataset annotated employing the VAD model of
emotions. As shown in Table 1, it mainly consists of the
MASC: Manually Annotated Sub-Corpus of the American
National Corpus [31], with roughly 10% of the sentences
coming from the dataset of SemEval-2007 Task 14 [32].</p>
        <p>The 10,062 sentences were originally annotated by English native speakers according to two different perspectives: the emotion they felt the writer meant to express, and the emotion evoked in an average reader.</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <caption>
            <p>The SAM scales for VAD values. Dimensions (Valence, Arousal and Dominance) are reported in rows, values (from 1 to 5) in columns. Copyright of SAM by Peter J. Lang 1994.</p>
          </caption>
        </fig>
        <p>At first, the Italian version of the dataset was studied as part of a Master’s degree thesis discussed in 2022 at the Department of Humanities of the University of Catania.</p>
        <p>In this context, the sentences were initially translated automatically into Italian using the neural machine translation service offered by Microsoft Azure. As we were not satisfied with the results, a manual revision was performed, splitting the corpus evenly among nine Italian native speakers, researchers in linguistics affiliated with the Interdepartmental Research Center Urban/Eco at the University of Naples Federico II.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>IAA for the three dimensions in the pilot study.</p>
          </caption>
          <table>
            <thead>
              <tr><th></th><th>V</th><th>A</th><th>D</th><th>Average</th></tr>
            </thead>
            <tbody>
              <tr><td>r</td><td>0.794</td><td>0.552</td><td>0.676</td><td>0.593</td></tr>
              <tr><td>MAE</td><td>0.357</td><td>0.900</td><td>0.583</td><td>0.613</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We also conducted a pilot study, asking two of the participants to independently annotate VAD values from the reader’s perspective for a small sample of sentences (150). We chose the reader’s perspective because, according to Buechel and Hahn, it yields better inter-annotator agreement (IAA). For annotation, we used the Self-Assessment Manikin (SAM), a pictographic scale to assess emotional response [33, 34] already adopted for EmoBank. SAM consists of three sets of anthropomorphic cartoons displaying differences in Valence, Arousal, and Dominance values respectively, as shown in Figure 1.</p>
        <p>We asked participants to attribute a value between 1 (minimum Valence, Arousal, and Dominance) and 5 (maximum Valence, Arousal, and Dominance) in steps of 0.5 (1, 1.5, 2, ..., 4.5, 5), that is, the five integer points plus four intermediate half-points. This results in a 9-point scale like the one originally proposed by Bradley and Lang (Buechel and Hahn preferred a 5-point scale). Instructions were adapted from those used for EmoBank and are available for further analysis upon request.</p>
        <p>To measure IAA we used Pearson’s correlation coefficient (r) and Mean Absolute Error (MAE), as other metrics like Cohen’s κ are not designed for scale variables (see § 4). We obtained encouraging scores in both measures for all three dimensions, with an average of 0.593 for r, indicating a large effect (see Table 2). Therefore, we decided to ask all participants to annotate the remaining sentences individually (one annotator per sentence). We then used the new labeling to fine-tune several transformer models for dimensional EA, but the scores were significantly worse than those obtained with the original values from the EmoBank dataset (MAE was between 2 and 3 times higher, and r for Valence and Arousal was respectively 33% and 13% lower). This was probably due to the lack of consistency resulting from having a single annotation per sentence.</p>
        <p>Moreover, we reviewed the manual revisions of the translations and found that, in at least half of the cases, the quality was still poor, either because the translated sentence did not feel natural in Italian or because it contained some kind of error.</p>
        <p>To produce EmoITA, we resolved to start the entire process over, keeping only the approximately 5,000 translations we considered good enough. This time, we chose 16 students from the Master’s Degree in Foreign Languages at the University of Catania, all of them Italian native speakers specializing in English. The sentences were split among the participants: we asked them to revise the 5,000 translations we kept from previous work and to propose new translations for the rest of the corpus. The same group of subjects also labeled each Italian sentence, and we took care never to ask participants to annotate a sentence they had translated. Overall, we obtained 7 different annotations for each sentence, and we now judge the quality of the translation to be satisfactory, if not perfect.</p>
        <p>To evaluate the annotations, we proceeded similarly to the original EmoBank study: we calculated r and MAE between each individual series of annotations and the aggregated values in EmoITA, and then averaged those values for each dimension (see Table 3).</p>
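        <p>To make the agreement procedure above concrete, the following Python sketch computes Pearson’s r and MAE as defined in Equations (1) and (2) of Section 4 and applies them as described: each annotator’s ratings for one dimension are compared with the mean of all annotations of the same sentences, and the per-annotator scores are then averaged. It is a minimal illustration, not the organizers’ evaluation script; the data layout (a dictionary keyed by a hypothetical annotator id) and all function names are assumptions, and ratings are checked against the 9-point SAM scale (1 to 5 in steps of 0.5).</p>
        <preformat>
```python
from math import sqrt
from statistics import mean

# Nine admissible SAM ratings: 1.0, 1.5, ..., 5.0 (steps of 0.5).
SAM_SCALE = [1 + 0.5 * k for k in range(9)]


def pearson_r(x, y):
    """Pearson's correlation coefficient, Eq. (1)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    # Undefined (ZeroDivisionError) if either series is constant.
    return cov / norm


def mae(x, y):
    """Mean Absolute Error, Eq. (2)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def annotator_vs_aggregate(ratings):
    """Average r and MAE of each annotator against the per-sentence mean.

    `ratings` maps a (hypothetical) annotator id to a dict
    {sentence_id: rating} for ONE dimension (V, A or D).
    """
    # Gather all ratings given to each sentence, validating the scale.
    per_sentence = {}
    for annotations in ratings.values():
        for sid, value in annotations.items():
            if value not in SAM_SCALE:
                raise ValueError(f"{value} is not on the 9-point SAM scale")
            per_sentence.setdefault(sid, []).append(value)

    # Aggregated value of a sentence: mean of all its annotations.
    aggregate = {sid: mean(values) for sid, values in per_sentence.items()}

    r_scores, mae_scores = [], []
    for annotations in ratings.values():
        sids = sorted(annotations)
        own = [annotations[s] for s in sids]
        gold = [aggregate[s] for s in sids]
        r_scores.append(pearson_r(own, gold))
        mae_scores.append(mae(own, gold))
    return mean(r_scores), mean(mae_scores)


if __name__ == "__main__":
    # Toy ratings for illustration only, not taken from EmoITA.
    toy = {
        "ann1": {1: 1.0, 2: 3.0, 3: 5.0},
        "ann2": {1: 2.0, 2: 3.0, 3: 4.0},
    }
    print(annotator_vs_aggregate(toy))
```
        </preformat>
        <p>Because the aggregated value of a sentence includes the annotator’s own rating, this comparison is somewhat more lenient than a leave-one-out variant; the sketch simply follows the description given above.</p>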
        <p>The values of r indicate a large effect in every dimension, particularly for Valence. Correlation is a little higher for Dominance than for Arousal, as in our pilot study: this is somewhat unusual, as in most of the research we analyzed on the English language the opposite is true. MAE is not as good, but still acceptable (about 10% of the range of the 1–5 scale).</p>
        <p>Overall, scores are in line with those of EmoBank (r = 0.634 and MAE = 0.386, on average). They could probably be improved by analyzing outliers and excluding some of the annotations whose disagreement is particularly strong, a process we have not yet started at this time.</p>
        <p>For the shared task, the dataset was randomly split into a development set and a test set of 8,000 and 2,062 sentences respectively (79.5% and 20.5%), taking care to preserve the genre distribution of the corpus (with a 1% tolerance). The development set was provided as a UTF-8, comma-separated CSV file, reporting the following fields:</p>
        <p>id, text, V, A, D</p>
        <p>where:</p>
        <p>1. ‘id’ denotes the unique identifier of the sentence;</p>
        <p>2. ‘text’ denotes the text of the sentence;</p>
        <p>3. ‘V’ denotes the average Valence value annotated for the sentence;</p>
        <p>4. ‘A’ denotes the average Arousal value annotated for the sentence;</p>
        <p>5. ‘D’ denotes the average Dominance value annotated for the sentence.</p>
        <p>See Table 4 for a couple of examples. The test set followed the same format, but the Valence, Arousal and Dominance labels were not provided.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Measures</title>
      <p>The two sub-tasks are evaluated separately, comparing the results obtained by participant systems with the gold standard annotations of the test set. Both constrained and unconstrained runs for a sub-task are reported in the same ranking, but we specify the type of each run.</p>
      <p>Evaluation metrics for both sub-tasks are the standard metrics for emotion regression in the literature, which we have already used throughout this paper for IAA: Pearson’s r and MAE. The first metric estimates the linear dependence between two series of data points, x = x<sub>1</sub>, ..., x<sub>n</sub> and y = y<sub>1</sub>, ..., y<sub>n</sub>. In our case, x corresponds to the values annotated in our dataset for each dimension and y to those predicted by participant systems. The formula for r is as follows:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>r(x, y) := \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</tex-math>
      </disp-formula>
      <p>where x̄ and ȳ are respectively the mean values of x and y.</p>
      <p>MAE is a measure of the errors between paired observations describing the same phenomenon (in this case, the annotated values of a certain emotional dimension in the dataset and those predicted). The formula for MAE is as follows:</p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math>\mathrm{MAE}(x, y) := \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|</tex-math>
      </disp-formula>
      <p>The baselines for both sub-tasks were built by fine-tuning a BERT model available on HuggingFace<sup>2</sup> for regression, with a learning rate of 1e-05.</p>
      <p><sup>2</sup> https://huggingface.co/dbmdz/bert-base-italian-xxl-uncased, last access 06-20-2023.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <p>We received submissions from two teams. Both of them participated in sub-task B, and only one in sub-task A. In total, 5 runs were submitted, both constrained and unconstrained. In Table 5 we report the results for r and MAE in sub-task A, and in Table 6 those relative to sub-task B, along with our baselines. To each run we appended one suffix giving the ID of the submitted run and another identifying constrained (‘_C_’) and unconstrained (‘_U_’) runs.</p>
      <p>Regarding sub-task A, the ISTC-CNR team obtained the best r score in the Valence dimension with its second run. However, our baseline had better results in every other dimension and metric.</p>
      <p>Concerning sub-task B, the extremITA team achieved the best results in all metrics and dimensions with their second run, with the exception of r for Arousal and Dominance, where our baseline performed slightly better.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <p>The teams of the EmotivITA challenge were invited to describe their solutions in technical reports; in this section, we compare participant systems based on their architectures.</p>
      <p>The ISTC-CNR team proposed a method based on Natural Language Inference (NLI). More specifically, they used a multilingual MNLI-XLM-RoBERTa model based on XLM-RoBERTa [35], which was fine-tuned on a version of the MNLI dataset [36] automatically translated into Italian. The model was adapted to the regression task by replacing its last linear layer. During training, sentences from the EmoITA dataset were used as premises. For sub-task A, three different models were built, with three different prompts acting as hypotheses for the NLI process, each targeting one VAD dimension. The prompt for Valence was “quanta positività esprime la frase?” (how much positivity does the sentence convey?), the one for Arousal “quanto è eccitante la frase?” (how exciting is the sentence?) and the one for Dominance “quanto è controllata l’emozione” (how controlled is the emotion?). For sub-task B, a single model was used, adopting the prompt “valence, arousal, dominance dell’emozione?” (valence, arousal, and dominance of the emotion?). The two runs submitted for sub-task A differ in that the first one only used 99% of the training set made available, while the second one used all of it. As we can see in Table 5, the results were better with the latter configuration. The only run submitted by the team for sub-task B exploited the entire training set. All runs were produced according to the constrained modality.</p>
      <p>The extremITA team only participated in sub-task B, with two unconstrained runs. Both their models were trained on the union of all the datasets of the shared tasks at EVALITA 2023. The first one adopts an Encoder-Decoder architecture based on IT5 [37], a T5 model [38] pre-trained on Italian texts. The model was fine-tuned on inputs consisting of the name of the shared task as a prefix, followed by a sentence from the EmoITA dataset; in the case of the EmotivITA task, the output consisted of the predicted VAD values. A similar approach was used for every task in EVALITA 2023. The second architecture is an instruction-tuned Decoder based on a large language model, LLaMA [39]. The model was trained using Low-Rank Adaptation [40] on Italian translations of the instructions originally developed for Alpaca [41], which also builds on LLaMA, and was then fine-tuned using instructions specific to each EVALITA task. In the case of EmotivITA sub-task B, each sentence from the EmoITA dataset was paired with a prompt in the form of the instruction: “Scrivi quanta valenza è espressa in questo testo su una scala da 1 a 5, seguito da quanto stimolo è espresso in questo testo su una scala da 1 a 5, seguito da quanto controllo è espresso in questo testo in una scala da 1 a 5” (Rate how much valence is expressed in this text on a scale from 1 to 5, followed by how much arousal is expressed in this text on a scale from 1 to 5, followed by how much dominance is expressed in this text on a scale from 1 to 5). This second model generally performed better than the first one, as shown in Table 6, but it also required 144 hours of training (on the entire EVALITA dataset), whereas the one based on IT5 only required 12 hours.</p>
      <p>Quite interestingly, the model proposed by the ISTC-CNR team and the second one proposed by the extremITA team both leverage prompting in natural language and no task-specific architectural design (with the exception of the replacement of the last layer in the MNLI-XLM-RoBERTa model), which suggests the efficacy of this approach. On the other hand, one could argue that the main limitation of the ISTC-CNR method lies precisely in the chosen prompts, as concepts like Valence, Arousal and Dominance are not easy to describe. When evaluating the extremITA proposal, instead, one could question the sustainability of a 144-hour training process.</p>
      <p>Overall, we observe that the baselines obtained by fine-tuning the BERT model were not clearly outperformed by the proposed systems: perhaps the upper limit for the regression problem on a dataset as large as EmoITA has been reached, at least for the moment. It is also worth mentioning that the scores are in line with those of the study representing the state of the art [42] on the original English dataset, EmoBank, which obtained r values of 0.838, 0.573 and 0.536 in the three dimensions.</p>
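      <p>The input formats described for the two participating teams can be illustrated with a short Python sketch that builds, for a given sentence, the NLI premise-hypothesis pairs used by ISTC-CNR and the instruction-input pair used by the LLaMA-based extremITA model, reusing the prompts quoted in this section. It is a hypothetical reconstruction for illustration only: the function names, the dictionary layout, and the separator between the task-name prefix and the sentence in the IT5 input are assumptions, since the exact formats are defined in the teams’ system reports.</p>
      <preformat>
```python
# Hypothetical reconstruction of the model inputs described in the
# Discussion; prompt strings are quoted from the text, everything else
# (names, layout, the IT5 prefix separator) is an assumption.

# ISTC-CNR, sub-task A: one NLI hypothesis (and one model) per dimension.
NLI_HYPOTHESES_A = {
    "V": "quanta positività esprime la frase?",
    "A": "quanto è eccitante la frase?",
    "D": "quanto è controllata l’emozione",
}
# ISTC-CNR, sub-task B: a single hypothesis for all three dimensions.
NLI_HYPOTHESIS_B = "valence, arousal, dominance dell’emozione?"

# extremITA, LLaMA-based model, sub-task B instruction.
INSTRUCTION_B = (
    "Scrivi quanta valenza è espressa in questo testo su una scala da 1 a 5, "
    "seguito da quanto stimolo è espresso in questo testo su una scala da 1 a 5, "
    "seguito da quanto controllo è espresso in questo testo in una scala da 1 a 5"
)


def nli_pairs_subtask_a(sentence):
    """(premise, hypothesis) pair for each dimension's regressor."""
    return {dim: (sentence, hyp) for dim, hyp in NLI_HYPOTHESES_A.items()}


def nli_pair_subtask_b(sentence):
    """Single (premise, hypothesis) pair for the multidimensional model."""
    return (sentence, NLI_HYPOTHESIS_B)


def instruction_example(sentence):
    """Instruction/input record; the field names are an assumption."""
    return {"instruction": INSTRUCTION_B, "input": sentence}


def it5_input(sentence, task_name="EmotivITA"):
    """Task name as prefix, then the sentence; ': ' is an assumed separator."""
    return f"{task_name}: {sentence}"


if __name__ == "__main__":
    s = "Che bella giornata!"
    print(nli_pairs_subtask_a(s)["V"])
    print(it5_input(s))
```
      </preformat>
      <p>In this framing the sentence always plays the same role (premise or input) and only the natural-language prompt changes, which is what allows both approaches to avoid task-specific architectural components.</p>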
      <p>One last remark is due: neither team explored the possible relations between the three emotion dimensions, which was actually one of the purposes of sub-task B; this remains a subject for future studies.</p>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion</title>
      <sec id="sec-6-1">
        <p>We presented the first shared task on Dimensional and Multidimensional Emotion Analysis for Italian and discussed the development of the first dedicated Italian dataset based on the VAD model, EmoITA. EmoITA was obtained by manual translation and annotation of the EmoBank dataset, performed by Italian native speakers. The participating systems leveraged NLI, the Encoder-Decoder architecture and Large Language Models to address the regression problems, obtaining results similar to those of the state of the art for the English counterpart of the dataset.</p>
        <p>We hope that the proposal of our task and the availability of a new Italian dataset for EA will foster studies in this relevant field of NLP. In this spirit, the development and test sets, the complete dataset (licensed under CC-BY-SA 4.0), and the scripts used for the baselines and for evaluation will soon be made available to the public; more details on EmotivITA can be found on the task website<sup>3</sup>.</p>
      </sec>
      <sec id="sec-6-2">
        <p><sup>3</sup> Repository: https://github.com/GiovanniGafa/EmoITA. Website: https://sites.google.com/view/emotivita</p>
        <p>3758/s13428-012-0314-x.</p>
        <p>[13] S. Mohammad, F. Bravo-Marquez, M. Salameh, S. Kiritchenko, SemEval-2018 task 1: Affect in tweets, in: Proceedings of the 12th International Workshop on Semantic Evaluation, Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 1–17. URL: https://aclanthology.org/S18-1001. doi:10.18653/v1/S18-1001.</p>
        <p>[14] A. Chatterjee, K. N. Narahari, M. Joshi, P. Agrawal, SemEval-2019 task 3: EmoContext contextual emotion detection in text, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 39–48. URL: https://aclanthology.org/S19-2005. doi:10.18653/v1/S19-2005.</p>
        <p>[15] O. Araque, L. Gatti, J. Staiano, M. Guerini, DepecheMood++: A Bilingual Emotion Lexicon Built Through Simple Yet Powerful Techniques, IEEE Transactions on Affective Computing 13 (2022) 496–507. URL: https://ieeexplore.ieee.org/document/8798675/. doi:10.1109/TAFFC.2019.2934444.</p>
        <p>[16] M. Montefinese, E. Ambrosini, B. Fairfield, N. Mammarella, The adaptation of the Affective Norms for English Words (ANEW) for Italian, Behavior Research Methods 46 (2014) 887–903. URL: https://link.springer.com/10.3758/s13428-013-0405-3. doi:10.3758/s13428-013-0405-3.</p>
        <p>[17] L. Passaro, L. Pollacci, A. Lenci, ItEM: A Vector Space Model to Bootstrap an Italian Emotive Lexicon, Second Italian Conference on Computational Linguistics CLiC-it 2015 II (2015).</p>
        <p>[18] A. Bolioli, F. Salamino, V. Porzionato, Social Media Monitoring in Real Life with Blogmeter Platform, in: C. Battaglino, C. Bosco, E. Cambria, R. Damiano, V. Patti, P. Rosso (Eds.), Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI (ESSEM 2013) A workshop of the XIII International Conference of the Italian Association for Artificial Intelligence (AI*IA 2013), Turin, Italy, December 3, 2013, volume 1096 of CEUR Workshop Proceedings, CEUR-WS.org, 2013, pp. 156–163. URL: http://ceur-ws.org/Vol-1096/paper12.pdf.</p>
        <p>[19] E. Borelli, D. Crepaldi, C. A. Porro, C. Cacciari, The psycholinguistic and affective structure of words conveying pain, PLOS ONE 13 (2018) e0199658. URL: https://dx.plos.org/10.1371/journal.pone.0199658. doi:10.1371/journal.pone.0199658.</p>
        <p>[20] S. M. Mohammad, P. D. Turney, Crowdsourcing a word-emotion association lexicon, Computational Intelligence 29 (2013) 436–465.</p>
        <p>[21] S. M. Mohammad, Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 english words, in: Proceedings of The Annual Conference of the Association for Computational Linguistics (ACL), Melbourne, Australia, 2018.</p>
        <p>[22] F. Celli, G. Riccardi, A. Ghosh, CorEA: Italian news corpus with emotions and agreement, in: Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA 2014, 9-11 December 2014, Pisa, PISA UNIVERSITY PRESS, 2014. URL: http://clic2014.fileli.unipi.it/proceedings/Proceedings-CLICit-2014.pdf. doi:10.12871/CLICIT2014120.</p>
        <p>[23] Z. Shibingfeng, F. Francesco, G. Federico, B.-C. Alberto, B. Paolo, P. Angelo, AriEmozione2.0, 2022. URL: https://zenodo.org/record/7097913. doi:10.5281/ZENODO.7097913.</p>
        <p>[24] R. Sprugnoli, MultiEmotions-It: a New Dataset for Opinion Polarity and Emotion Analysis for Italian, in: J. Monti, F. dell’Orletta, F. Tamburini (Eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021, volume 2769 of CEUR Workshop Proceedings, CEUR-WS.org, Torino, 2020. URL: http://ceur-ws.org/Vol-2769/paper_08.pdf. doi:10.4000/books.aaccademia.8910.</p>
        <p>[25] O. Araque, S. Frenda, D. Nozza, V. Patti, R. Sprugnoli, Emit at evalita2023: Overview of the categorical emotion detection in italian social media task, in: M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi (Eds.), Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
        <p>[26] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, Evalita 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
        <p>[27] R. Plutchik, A General Psychoevolutionary Theory of Emotion, in: Theories of Emotion, Elsevier, 1980, pp. 3–33. URL: https://linkinghub.elsevier.com/retrieve/pii/B9780125587013500077. doi:10.1016/B978-0-12-558701-3.50007-7.</p>
        <p>[28] R. Mukherjee, A. Naik, S. Poddar, S. Dasgupta, N. Ganguly, Understanding the role of affect dimensions in detecting emotions from tweets: A multi-task approach, CoRR abs/2105.03983 (2021). URL: https://arxiv.org/abs/2105.03983. arXiv:2105.03983.</p>
        <p>[29] J. Wang, L.-C. Yu, K. R. Lai, X. Zhang, Dimensional sentiment analysis using a regional CNN-LSTM model, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp. 225–230. URL: https://aclanthology.org/P16-2037. doi:10.18653/v1/P16-2037.</p>
        <p>[30] S. Buechel, U. Hahn, EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis, in: M. Lapata, P. Blunsom, A. Koller (Eds.), Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers, Association for Computational Linguistics, 2017, pp. 578–585. URL: http://aclweb.org/anthology/E17-2092. doi:10.18653/v1/E17-2092.</p>
        <p>[31] N. Ide, C. Baker, C. Fellbaum, C. Fillmore, R. Passonneau, MASC: the manually annotated sub-corpus of American English, in: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, Morocco, 2008. URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/617_paper.pdf.</p>
        <p>[32] C. Strapparava, R. Mihalcea, SemEval-2007 task 14: Affective text, in: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), Association for Computational Linguistics, Prague, Czech Republic, 2007, pp. 70–74. URL: https://aclanthology.org/S07-1013.</p>
        <p>[33] M. M. Bradley, P. J. Lang, Measuring emotion: The self-assessment manikin and the semantic differential, Journal of Behavior Therapy and Experimental Psychiatry 25 (1994) 49–59. URL: https://linkinghub.elsevier.com/retrieve/pii/0005791694900639. doi:10.1016/0005-7916(94)90063-9.</p>
        <p>[34] P. J. Lang, Behavioral treatment and bio-behavioral assessment: Computer applications, in: J. B. Sidowski, J. H. Johnson, T. A. Williams (Eds.), Technology in mental health care delivery systems, Norwood, NJ: Ablex Publishing, 1980, pp. 119–137.</p>
        <p>[35] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.</p>
        <p>[36] A. Williams, N. Nangia, S. Bowman, A broad-coverage challenge corpus for sentence understanding through inference, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 1112–1122. URL: https://aclanthology.org/N18-1101. doi:10.18653/v1/N18-1101.</p>
        <p>[37] G. Sarti, M. Nissim, IT5: Large-scale text-to-text pretraining for italian language understanding and generation, ArXiv preprint 2203.03759 (2022). URL: https://arxiv.org/abs/2203.03759.</p>
        <p>[38] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, 2020. arXiv:1910.10683.</p>
        <p>[39] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971.</p>
        <p>[40] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank adaptation of large language models, 2021. arXiv:2106.09685.</p>
        <p>[41] R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, T. B. Hashimoto, Stanford alpaca: An instruction-following llama model, https://github.com/tatsu-lab/stanford_alpaca, 2023.</p>
        <p>[42] S. Park, J. Kim, S. Ye, J. Jeon, H. Y. Park, A. Oh, Dimensional emotion detection from categorical emotion, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 4367–4380. URL: https://aclanthology.org/2021.emnlp-main.358. doi:10.18653/v1/2021.emnlp-main.358.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>