=Paper=
{{Paper
|id=Vol-2769/58
|storemode=property
|title=AriEmozione: Identifying Emotions in Opera Verses
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_58.pdf
|volume=Vol-2769
|authors=Francesco Fernicola,Shibingfeng Zhang,Federico Garcea,Paolo Bonora,Alberto Barrón-Cedeño
|dblpUrl=https://dblp.org/rec/conf/clic-it/FernicolaZGBB20
}}
==AriEmozione: Identifying Emotions in Opera Verses==
Francesco Fernicola¹, Shibingfeng Zhang¹, Federico Garcea¹, Paolo Bonora², and Alberto Barrón-Cedeño¹
¹ Department of Interpreting and Translation, Università di Bologna, Forlì, Italy
² Department of Classical Philology and Italian Studies, Università di Bologna, Bologna, Italy
{francesco.fernicola, zhang.shibingfeng}@studio.unibo.it
{federico.garcea2, paolo.bonora, a.barron}@unibo.it
Abstract

We present a new task: the identification of the emotions transmitted in Italian opera arias at the verse level. This is a relevant problem for organizing the vast repertoire of Italian opera arias available and for enabling further analyses by both musicologists and the lay public. We shape the task as a multi-class supervised problem, considering six emotions: love, joy, admiration, anger, sadness, and fear. In order to address it, we manually annotated an opera corpus with 2.5k verses, which we release to the research community, and experimented with different classification models and representations. Our best-performing models reach macro-averaged F1 measures of ∼0.45, always considering character 3-gram representations. Such performance reflects the difficulty of the task at hand, partially caused by the size and nature of the corpus, which consists of relatively short verses written in 18th-century Italian.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Opera lyrics have the function of expressing the emotional state of the singing character. In 17th- and 18th-century operas, characters brought on stage passions induced in their souls by the succession of events in the drama. Musicological studies use these affects as one of the interpretative keys of the work as a whole (Zoppelli, 2001; McClary, 2012). Being able to automatically identify the emotions expressed by the different arias of each work would provide scholars with a useful tool for a systematic study of the repertoire. Such a technology would help musicologists and the lay public alike to study the vast repertoire of arias and characters of this period. As an aria may express more than one emotion, we go one granularity level lower, to the verse level. The task is defined as follows:

Identify the emotion expressed in a verse, in the context of an aria.

To this end, we created the AriEmozione 1.0 corpus: a collection of 678 operas with 2.5k verses, each of which has been manually annotated with respect to emotion. We experimented with different supervised models (e.g., SVMs, neural networks) and text representations (e.g., character n-grams and distributed representations).

Our experiments show that, regardless of the model, character 3-grams outperform all other representations, reaching weighted macro-averaged F1 measures of ∼0.45. Under-represented classes (e.g., fear) are the hardest to identify. Others, such as anger and sadness, both negative, are often confused with each other.

The rest of the contribution is organized as follows. Section 2 describes the AriEmozione 1.0 corpus. Section 3 describes the explored models and representations. Section 4 discusses the experiments and the obtained results. Section 5 overviews some related work. Section 6 closes with conclusions and proposals for future work.
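The Abstract reports macro-averaged F1, while the Introduction speaks of a weighted macro-averaged F1 (Section 4 motivates the weighting by the class imbalance). In scikit-learn these are two distinct options: `average="macro"` takes the plain mean of the per-class F1 scores, whereas `average="weighted"` weights each class by its number of gold instances. The sketch below uses hypothetical toy labels, not AriEmozione 1.0 data, to show how a rare class such as paura that is never predicted pulls the two averages apart; which option produced the reported figures is not resolved here.

```python
from sklearn.metrics import f1_score

# Hypothetical toy labels (NOT drawn from AriEmozione 1.0): two frequent
# classes and one rare class (paura) that the classifier never predicts.
y_true = ["rabbia"] * 6 + ["tristezza"] * 6 + ["paura"] * 2
y_pred = (
    ["rabbia"] * 5 + ["tristezza"]        # 5 of 6 rabbia verses correct
    + ["tristezza"] * 5 + ["rabbia"]      # 5 of 6 tristezza verses correct
    + ["rabbia", "tristezza"]             # both paura verses missed
)

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
# Weighted average: per-class F1 weighted by the number of gold instances.
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"macro F1    = {macro_f1:.3f}")     # 0.513
print(f"weighted F1 = {weighted_f1:.3f}")  # 0.659
```

Here rabbia and tristezza each score F1 = 10/13 and paura scores 0, so the macro average falls to about 0.51 while the weighted average stays near 0.66, because paura carries only 2 of the 14 gold labels.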
2 The AriEmozione 1.0 Corpus

The corpus AriEmozione 1.0 is a subset of the materials collected by project CORAGO.¹ AriEmozione 1.0 contains a selection of 678 operas composed between 1655 and 1765. We consider only the lyrical text in the arias. A. Zeno and P. Metastasio are among the most represented librettists in the corpus (∼30% of the operas); they are two of the most representative and prolific librettists of the 18th century. All texts are written in 18th-century Italian and articulated in verses and stanzas.

We labeled the emotion transmitted by every single verse, as we observed that this is the right granularity to obtain full text snippets expressing one single emotion. In 1649, René Descartes wrote “Les passions de l’âme”, a sort of compendium of all possible emotions and their possible causes (Garavaglia, 2018). For the sake of concreteness, we leveraged Parrott’s (2001) tree classification of emotions. The first level of this tree includes six primary emotions: love, joy, surprise, anger, sadness, and fear. Based on the nature of the material under review, we replace surprise with admiration, ending with the following six classes:

Amore (love) incl. affection, lust, longing.
Gioia (joy) incl. cheerfulness, zest, contentment, pride, optimism, enthrallment, relief.
Ammirazione (admiration) admiration or adoration of someone’s talent, skill, or other physical or mental qualities.
Rabbia (anger) incl. irritability, exasperation, rage, disgust, envy, torment.
Tristezza (sadness) incl. suffering, disappointment, shame, neglect, sympathy.
Paura (fear) incl. horror and nervousness.

An extra class, nessuna (none), applies mostly to verses containing only non-actionable words; it is neglected in the current experiments.

Two native speakers of Italian annotated all 2,473 instances independently, following the instructions displayed in Figure 1. They were asked to provide (i) the emotion transmitted by the verse, (ii) an optional secondary label (in case they perceived a second emotion), and (iii) their level of confidence: total confidence, partial confidence, or very doubtful.

At this stage, we measured the inter-annotator agreement on the primary emotion with Cohen’s kappa (Fleiss et al., 1969). The result was κ = 0.323, which is considered a fair agreement; the two annotators assigned the same primary emotion to 44% of the instances. When the secondary emotion is considered as well, the two annotators coincided in 68% of the instances. These numbers reflect the complexity of the task. The same annotators then met to discuss and consolidate all dubious instances. Table 1 shows the number of instances per class for each corpus partition: training, development, and test set. The average verse length is 72.5 ± 31.6 characters, and the corpus contains 34,608 tokens (4,458 types).² Table 2 shows examples of verses in the corpus, one for each of the six emotions.

¹ CORAGO is the Repertoire and archive of Italian opera librettos. It constitutes the first implementation of the RADAMES prototype (Repertoriazione e Archiviazione di Documenti Attinenti al Melodramma E allo Spettacolo) (Pompilio et al., 2005); http://corago.unibo.it.
² The corpus is available at https://zenodo.org/record/4022318.

First of all, thank you for helping with this work. We are a group of researchers from the D. of Classical Philology and Italian Studies and the D. of Interpreting and Translation, both at UniBO. Your work will help us to produce artificial intelligence models to analyse the lyrics in music. At this stage we are focused on opera. You will annotate arie in Italian from diverse periods, looking for the emotions that they express. Your work consists of identifying the emotion expressed in each of the verses composing an aria. You can choose among six emotions (or none of them), which are defined next: [. . . ]

Each row is divided in six columns:
id: A unique id, tied to the verse. Do not modify it.
verse: A verse, inside of an aria. This is the text that you are going to analyse.
emotion: Here you can select the expressed emotion (or none of them).
emotion sec.: This is available to choose a secondary emotion, in case it is really difficult to choose just one.
confidence: Not being 100% sure is ok. If that is the case, please let us know by choosing the right confidence level (default: “I am sure”).
comments: Feel free to tell us something about this instance, if you feel like.

Figure 1: Instructions given to the annotators of the emotions in the AriEmozione 1.0 corpus.

partition | amore | gioia | ammirazione | rabbia | tristezza | paura | nessuna | total
train | 289 | 274 | 289 | 414 | 503 | 166 | 38 | 1,973
dev | 36 | 31 | 23 | 84 | 61 | 12 | 3 | 250
test | 37 | 39 | 30 | 64 | 54 | 15 | 11 | 250
overall | 362 | 344 | 342 | 562 | 618 | 193 | 52 | 2,473
Table 1: AriEmozione 1.0 corpus statistics.

id | verse | class
ZAP1593570 03 | Non ho più lagrime; non ho più voce; non posso piangere; non so parlar (I have no more tears; I have no more voice; I cannot cry; I don’t know how to speak) | Tristezza
ZAP1596431 00 | Barbaro! Oh dio mi vedi divisa dal mio ben; barbaro, e non concedi ch’io ne dimandi almen (Barbarian! Oh Lord, you see me separated from my own good; barbarian, you don’t even allow me but one demand) | Rabbia
ZAP1593766 01 | Guardami e tutto obblio e a vendicarti io volo; di quello sguardo solo io mi ricorderò (Look at me, all else is forgotten and I haste to avenge you; only I shall remember that gaze) | Amore
ZAP1594229 00 | Su la pendice alpina dura la quercia antica e la stagion nemica per lei fatal non è; (Up on the slope of the mountain the ancient oak tree still lives on, and the adverse season poses no fatal threat) | Ammirazione
ZAP1596807 00 | In questa selva oscura entrai poc’anzi ardito; or nel cammin smarrito timido errando io vo (I entered this dark forest not too long ago, boldly; having now lost the path I wander around, shyly) | Paura
ZAP1599979 01 | Vede alfin l’amate sponde, vede il porto, e conforto prende allor di riposar (Finally, the beloved shores, the harbor, are all in sight and with them come solace and sleep) | Gioia
Table 2: Instances from the AriEmozione 1.0 corpus, including unique identifier, verse in Italian and English translation, and class. We include free (unofficial) translations for clarity.

3 Models and Representations

The nature of the corpus, a small number of short verses written in 18th-century Italian, led us to select a modest set of models and representation alternatives. The baseline is a k-Nearest Neighbors algorithm (kNN), chosen for its success in classification tasks (Zhang and Zhou, 2007). We also experiment with multi-class SVMs, logistic regression, and neural networks. Regarding the latter, we experiment with a number of architectures with two and three hidden layers. Finally, we experiment with a FastText classifier (Joulin et al., 2017). Table 3 summarizes the explored configurations.³

As for the text representations, we consider TF–IDF vectors of both character 3-grams and word 1-grams (higher values of n are not considered due to the corpus size). For preprocessing, we employ the spaCy Italian tokenizer⁴ and casefold the texts. We also explore dense representations derived from the TF–IDF vectors by means of both LDA (Hoffman et al., 2010) and LSA (Halko et al., 2011).

³ The code is available at https://github.com/TinfFoil/AriEmozione. We used Sklearn for the kNN, SVM, and logistic regression models; Keras for the neural networks; and the Facebook-provided library for FastText (cf. https://scikit-learn.org, https://keras.io/, and https://github.com/facebookresearch/fastText).
⁴ https://spacy.io/models/it

Model | Settings
k-NN | L2 norm; k ∈ [1, ..., 9]
SVM | RBF; both explored with C ∈ [1, 10, 100, 1000] and γ ∈ [1e-3, 1e-4]
Log Reg | Multinomial logistic regression with Newton-CG solver
NN | 2 (3) hidden layers with size ∈ [32, 64, 96, 128, 256] (∈ [8, 16, 32, 64, 96]); 20% dropout; ReLU for input/hidden layers; softmax for output layer; categorical cross-entropy loss function; Adam; epochs ∈ [1, ..., 15]
FastText | 300d embeddings with or without pre-training; learning rate ∈ [0.3, 0.6, 1]; epochs ∈ [1, 3, 5, 10, ..., 100]
Table 3: Experimental settings overview.
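The TF–IDF character 3-gram representation, the optional LSA reduction, and the SVM row of Table 3 can be sketched as a scikit-learn pipeline. This is a hedged reconstruction, not the authors' code (that lives in the repository cited in footnote 3). It assumes that the spaCy tokenization step can be skipped for character n-grams, with `TfidfVectorizer(lowercase=True)` handling the casefolding; that LSA is implemented with `TruncatedSVD`; and that model selection maximizes weighted F1 (`scoring="f1_weighted"`). The helper name `build_char3_svm_search` and its parameters are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_char3_svm_search(n_components=None, n_folds=10):
    """Hypothetical helper: grid search for an RBF SVM on char 3-gram TF-IDF.

    If n_components is given, an LSA step (truncated SVD) projects the sparse
    TF-IDF vectors to that many dense dimensions (e.g. 16, 32, or 64).
    """
    steps = [
        # Character 3-grams only; lowercase=True performs the casefolding.
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=True)),
    ]
    if n_components is not None:
        steps.append(("lsa", TruncatedSVD(n_components=n_components, random_state=0)))
    steps.append(("svm", SVC(kernel="rbf")))

    # Hyperparameter grid taken from the SVM row of Table 3.
    param_grid = {"svm__C": [1, 10, 100, 1000], "svm__gamma": [1e-3, 1e-4]}

    return GridSearchCV(
        Pipeline(steps),
        param_grid,
        scoring="f1_weighted",  # assumption: weighted-averaged F1 for model selection
        cv=StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0),
    )
```

A usage sketch, with `verses` and `emotions` as hypothetical lists holding the merged training and development partitions: `search = build_char3_svm_search(n_components=32)` followed by `search.fit(verses, emotions)`. With the default `n_folds=10` this mirrors the 10-fold cross-validation described in Section 4; replacing the `SVC` step with `LogisticRegression(solver="newton-cg")` would approximate the logistic regression row of Table 3.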
For both LDA and LSA, we target reductions to 16, 32, and 64 dimensions. As for embeddings, we adopted the pre-trained 300-dimensional Italian vectors of FastText (Joulin et al., 2017), and experimented with both character 3-grams and words.

4 Experiments

We conducted several experiments to find the best combination of parameters and representations. Given the number of available instances, we merged the training and development partitions and performed 10-fold cross-validation. As is standard, the test partition was left aside, and only one prediction was carried out on it, after identifying the best configurations. We evaluate our models on the basis of accuracy and weighted macro-averaged F1 measure, to account for the class imbalance. Table 4 shows the results obtained with some interesting configurations and representations, both in cross-validation and on the test set.⁵

Character and word n-gram TF–IDF, LSA, and LDA representations were tested with all models except FastText, for which we test with and without pre-trained embeddings. Notice that we are not interested in combining features, but in observing their performance in isolation. The most promising representation in cross-validation appears to be the simple character 3-grams, with which we obtained the best results across all models, although this representation also shows the highest variability across folds. Among all 3-gram-derived representations, LDA consistently obtained the worst results across all models. Still, it is more stable across folds than the sparse 3-gram representation. As for FastText, with the same number of epochs and learning rate, the character 3-gram vectors always achieved much higher accuracy than the word vectors.

Similar patterns are observed when evaluating on the unseen test set. The character 3-grams in general hold the best performance, while the 3-gram LDA tends to remain the worst regardless of the model used. This behavior does not hold in all cases. For instance, the logistic regression model achieves F1 = 0.44 in cross-validation, but drops to 0.42 on test. This might be the result of over-fitting.

It is worth noting that all models tend to confuse rabbia and tristezza. Table 5 shows the confusion matrix for the best model on test. These two emotions are confused with each other in 18% of the cases on average. The classifiers also tend to confuse ammirazione with gioia, which is understandable given their semantic closeness.

⁵ The full batch of results is available at https://docs.google.com/spreadsheets/d/1Ztjry2mJs6ufCZM1O5CQRyZ8pA5YDnToN0h0NGX1nW0/edit?usp=sharing

model | representation | 10-fold CV F1 | 10-fold CV Acc | test F1 | test Acc
kNN | char 3-grams | 0.38 | 38.51 | 0.35 | 35.15
kNN | words | 0.36 | 36.08 | 0.35 | 34.73
kNN | LDA char | 0.30 | 29.97 | 0.31 | 30.54
SVM–RBF | char 3-grams | 0.44 | 43.70 | 0.43 | 43.00
SVM–RBF | words | 0.42 | 42.00 | 0.44 | 44.00
SVM–RBF | LDA char | 0.28 | 28.00 | 0.30 | 30.00
Log reg | char 3-grams | 0.44 | 45.57 | 0.42 | 43.10
Log reg | words | 0.41 | 43.20 | 0.41 | 43.10
Log reg | LDA char | 0.28 | 30.63 | 0.29 | 30.96
2-layers NN | char 3-grams | 0.42 | 43.61 | 0.47 | 46.86
2-layers NN | words | 0.42 | 42.91 | 0.43 | 43.10
2-layers NN | LDA char | 0.27 | 29.56 | 0.27 | 31.80
3-layers NN | char 3-grams | 0.49 | 41.86 | 0.40 | 41.84
3-layers NN | words | 0.47 | 42.60 | 0.40 | 41.84
3-layers NN | LDA char | 0.26 | 31.41 | 0.30 | 31.80
FastText | char 3-grams | 0.43 | 45.00 | 0.41 | 42.37
FastText | pre-trained chars | 0.43 | 47.00 | 0.41 | 41.00
FastText | words | 0.42 | 42.56 | 0.39 | 44.07
FastText | pre-trained words | 0.38 | 41.00 | 0.40 | 42.00
Table 4: F1 and accuracy (%) in 10-fold cross-validation and on the held-out test set for some of the model/representation combinations.

class | ammirazione | amore | gioia | paura | rabbia | tristezza
ammirazione | 0.37 | 0.03 | 0.18 | 0.07 | 0.11 | 0.06
amore | 0.03 | 0.43 | 0.13 | 0.00 | 0.09 | 0.17
gioia | 0.27 | 0.16 | 0.31 | 0.20 | 0.09 | 0.07
paura | 0.10 | 0.03 | 0.00 | 0.40 | 0.02 | 0.07
rabbia | 0.20 | 0.14 | 0.03 | 0.13 | 0.64 | 0.17
tristezza | 0.17 | 0.14 | 0.13 | 0.07 | 0.19 | 0.48
Table 5: Confusion matrix for the 2-layer neural network with TF-IDF character 3-grams.

5 Related Work

Building on the numerous pre-existing studies focusing on sentiment analysis (Ain et al., 2017; Shi et al., 2019), some researchers have been seeking to dig deeper, towards multi-class emotion analysis. Most of the work thus far has focused on social media (e.g., Twitter). Bouazizi and Ohtsuki (2016) built a classifier for seven emotions: happiness, sadness, anger, love, hate, sarcasm, and neutral; i.e., an overlap of five classes with respect to the ones in AriEmozione. In contrast to our experiments, they focused on exploiting the polarity of the words in each instance, which was then fed to a random forest classifier.

Balabantaray et al. (2012) tried to distinguish among happy, sad, anger, disgust, fear, and surprise using WordNet-Affect (Valitutti et al., 2004). Given that no WordNet-Affect is currently available for Italian, such an approach is unfeasible.

Promising work has been carried out on news articles (Ye et al., 2012), news headlines (Strapparava and Mihalcea, 2007), and children’s narratives (Alm et al., 2005). While a lexicon-based approach is the most frequent for binary positive vs. negative classification, Strapparava and Mihalcea (2007) combined a high-dimensional word space produced from word TF-IDF vectors with a set of seed words to predict the valence of a text, exploiting the syntagmatic relations between words. A bottom-up semantic approach has also been proposed (Seal et al., 2020).

To the best of our knowledge, no work in the field of either emotion or sentiment analysis has been performed on operas.

6 Conclusions and Future Work

We addressed the novel problem of emotion classification of opera arias at the verse level. The task is interesting because of the lack of automated tools for the analysis of operas, and challenging due to both the language used in 17th- and 18th-century lyrics and the difficulty of producing the necessary amount of quality supervised data.

We explored various classification models and representations. A neural network with two hidden layers fed with a simple TF-IDF character 3-gram representation is among the most promising approaches to the problem. Among the six possible emotions, the most difficult to identify are rabbia and tristezza, which tend to be confused with each other, followed by ammirazione, which is often confused with gioia. In order to foster research on this topic, we release the AriEmozione 1.0 corpus to the community (cf. footnote 2).

As for future work, we intend to increase the size of the AriEmozione 1.0 corpus by means of active learning (Yang et al., 2009). Once a larger data volume is produced, we plan to explore models to identify the emotion at the aria rather than at the verse level. Following the theory of emotion proposed by Plutchik (1980), we could identify the emotion of a whole aria by combining the emotions at the verse level, and then conduct experiments to verify which granularity is more adequate as a single emotion unit. In order to address the issue of emotional polysemy and ambiguity of aria verses, we aim at producing explainable models by highlighting the specific fragments expressing the emotion. Another interesting alternative is the one highlighted by Zhao and Ma (2019), who adopted an efficient meta-learning approach to augment the learning ability of emotion distribution (i.e., the intensity values of a set of emotions within a single sentence) when the training dataset is small, as in the AriEmozione 1.0 corpus.

Acknowledgments

This research is carried out in the framework of CRICC: Centro di Ricerca per l’interazione con le Industrie Culturali e Creative dell’Università di Bologna; a POR-FESR 2014-2020 Regione Emilia-Romagna project (https://site.unibo.it/cricc).

We thank Ilaria Gozzi and Marco Schillaci, students at Università di Bologna, for their support in the manual annotation of the AriEmozione 1.0 corpus.

References

Qurat Tul Ain, Mubashir Ali, Amna Riaz, Amna Noureen, Muhammad Kamran, Babar Hayat, and A Rehman. 2017. Sentiment analysis using deep learning techniques: a review. Int J Adv Comput Sci Appl, 8(6):424.

Cecilia O. Alm, Dan Roth, and Richard Sproat. 2005. Emotions from text: machine learning for text-based emotion prediction. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT–EMNLP), pages 579–586.

Rakesh C. Balabantaray, Mudasir Mohammad, and Nibha Sharma. 2012. Multi-class twitter emotion classification: A new approach. International Journal of Applied Information Systems, 4(1):48–53.

Mondher Bouazizi and Tomoaki Ohtsuki. 2016. Sentiment analysis: From binary to multi-class classification: A pattern-based approach for multi-class sentiment analysis in twitter. In IEEE International Conference on Communications (ICC), pages 1–6. IEEE.

Joseph L. Fleiss, Jacob Cohen, and B.S. Everitt. 1969. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5):323–327.

Andrea Garavaglia. 2018. Funzioni espressive dell’aria a metà seicento secondo il “Giasone” di Cicognini e Cavalli. Il Saggiatore Musicale, Anno XXV(1):5–31.

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288.

Matthew Hoffman, Francis R. Bach, and David M. Blei. 2010. Online learning for latent dirichlet allocation. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 856–864. Curran Associates, Inc.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431. ACL, April.

Susan McClary. 2012. Desire and Pleasure in Seventeenth-Century Music. University of California Press, Berkeley, CA, 1 edition.

W. Gerrod Parrott. 2001. Emotions in social psychology: Essential readings. Psychology Press.

Robert Plutchik. 1980. A general psychoevolutionary theory of emotion. Theories of emotion, 1:3–31.

Angelo Pompilio, Lorenzo Bianconi, Fabio Regazzi, and Paolo Bonora. 2005. RADAMES: A new management approach to opera: Repertory, archives and related documents. In Proceedings - First International Conference on Automated Production of Cross Media Content for Multi-channel Distribution. IEEE.

Dibyendu Seal, Uttam K. Roy, and Rohini Basak. 2020. Sentence-level emotion detection from text based on semantic rules. In Milan Tuba, Shyam Akashe, and Amit Joshi, editors, Information and Communication Technology for Sustainable Development, pages 423–430, Singapore. Springer Singapore.

Yong Shi, Luyao Zhu, Wei Li, Kun Guo, and Yuanchun Zheng. 2019. Survey on classic and latest textual sentiment analysis articles and techniques. International Journal of Information Technology & Decision Making, 18(04):1243–1287.

Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74, Prague, Czech Republic, June. Association for Computational Linguistics.

Alessandro Valitutti, Carlo Strapparava, and Oliviero Stock. 2004. Developing affective lexical resources. PsychNology Journal, 2(1).

Bishan Yang, Jian-Tao Sun, Tengjiao Wang, and Zheng Chen. 2009. Effective multi-label active learning for text classification. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 917–925, New York, NY. Association for Computing Machinery.

Lu Ye, Rui-Feng Xu, and Jun Xu. 2012. Emotion prediction of news articles from reader’s perspective based on multi-label classification. In 2012 International Conference on Machine Learning and Cybernetics, volume 5, pages 2019–2024. IEEE.

Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048.

Zhenjie Zhao and Xiaojuan Ma. 2019. Text emotion distribution learning from small sample: A meta-learning approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3957–3967, Hong Kong, China, November. Association for Computational Linguistics.

Luca Zoppelli. 2001. Il teatro dell’umane passioni: note sull’antropologia dell’aria secentesca. In I luoghi dell’immaginario barocco. Liguori, Napoli, Italia.