AriEmozione: Identifying Emotions in Opera Verses

Francesco Fernicola[1], Shibingfeng Zhang[1], Federico Garcea[1], Paolo Bonora[2], and Alberto Barrón-Cedeño[1]

[1] Department of Interpreting and Translation, Università di Bologna, Forlì, Italy
[2] Department of Classical Philology and Italian Studies, Università di Bologna, Bologna, Italy

{francesco.fernicola, zhang.shibingfeng}@studio.unibo.it
{federico.garcea2, paolo.bonora, a.barron}@unibo.it

Abstract

We present a new task: the identification of the emotions transmitted in Italian opera arias at the verse level. This is a relevant problem both for the organization of the vast repertoire of Italian opera arias available and to enable further analyses by musicologists and the lay public alike. We shape the task as a multi-class supervised problem, considering six emotions: love, joy, admiration, anger, sadness, and fear. In order to address it, we manually annotated an opera corpus with 2.5k verses —which we release to the research community— and experimented with different classification models and representations. Our best-performing models reach macro-averaged F1 measures of ∼0.45, always considering character 3-gram representations. Such performance reflects the difficulty of the task at hand, partially caused by the size and nature of the corpus, which consists of relatively short verses written in 18th-century Italian.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Opera lyrics have the function of expressing the emotional state of the singing character. In 17th- and 18th-century operas, characters brought on stage passions induced in their souls by the succession of events in the drama. Musicological studies use these affects as one of the interpretative keys of the work as a whole (Zoppelli, 2001; McClary, 2012). Being able to automatically identify the emotions expressed by the different arias of each work would provide scholars with a useful tool for a systematic study of the repertoire. The technology to identify the emotion(s) expressed by an aria represents an effective tool to study the vast repertoire of arias and characters of this period for musicologists and the lay public alike. As an aria may express more than one emotion, we go one granularity level lower —to the verse level. The task is defined as follows:

    Identify the emotion expressed in a verse, in the context of an aria.

In order to do that, we created the AriEmozione 1.0 corpus: a collection of 678 operas with 2.5k verses, each of which has been manually annotated with respect to emotion. We experimented with different supervised models (e.g., SVMs, neural networks) and text representations (e.g., character n-grams and distributed representations). Our experiments show that, regardless of the model, character 3-grams outperform all other representations, reaching weighted macro-averaged F1 measures of ∼0.45. Under-represented classes (e.g., fear) are the hardest to identify. Others, such as anger and sadness, being both negative, are often confused with each other.

The rest of the contribution is distributed as follows. Section 2 describes the AriEmozione 1.0 corpus. Section 3 describes the explored models and representations. Section 4 discusses the experiments and the obtained results. Section 5 overviews some related work. Section 6 closes with conclusions and proposals for future work.

    First of all, thank you for helping with this work. We are a group of researchers from the D. of Classical Philology and Italian Studies and the D. of Interpreting and Translation, both at UniBO. Your work will help us to produce artificial intelligence models to analyse the lyrics in music. At this stage we are focused on opera. You will annotate arie in Italian from diverse periods, looking for the emotions that they express. Your work consists of identifying the emotion expressed in each of the verses composing an aria. You can choose among six emotions (or none of them), which are defined next: [. . . ]

    Each row is divided in six columns:
    id            A unique id, tied to the verse. Do not modify it.
    verse         A verse, inside of an aria. This is the text that you are going to analyse.
    emotion       Here you can select the expressed emotion (or none of them).
    emotion sec.  This is available to choose a secondary emotion, in case it is really difficult to choose just one.
    confidence    Not being 100% sure is ok. If that is the case, please let us know by choosing the right confidence level (default: "I am sure").
    comments      Feel free to tell us something about this instance, if you feel like.

Figure 1: Instructions given to the annotators of the emotions in the AriEmozione 1.0 corpus.

partition   ammirazione  amore  gioia  rabbia  tristezza  paura  nessuna   total
train               289    274    289     414        503    166       38   1,973
dev                  36     31     23      84         61     12        3     250
test                 37     39     30      64         54     15       11     250
overall             362    344    342     562        618    193       52   2,473

Table 1: AriEmozione 1.0 corpus statistics.

2 The AriEmozione 1.0 Corpus

The first level of Parrott's (2001) tree of emotions includes six primary emotions: love, joy, surprise, anger, sadness, and fear. Based on the nature of the material under review, we substitute surprise with admiration, ending with the following six classes:

Amore (love) incl. affection, lust, longing.
Gioia (joy) incl. cheerfulness, zest, contentment, pride, optimism, enthrallment, relief.
Ammirazione (admiration) admiration or adoration of someone's talent, skill, or other physical or mental qualities.
Rabbia (anger) incl. irritability, exasperation, rage, disgust, envy, torment.
Tristezza (sadness) incl. suffering, disappointment, shame, neglect, sympathy.
Paura (fear) incl. horror and nervousness.
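The class inventory and the six-column annotation sheet of Figure 1 map naturally onto a small record type. The following sketch models one annotated row; the type and field names are our own illustration, not part of the authors' release, and the example row reuses an instance shown later in the paper:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Emotion(Enum):
    """The six Parrott-derived classes, plus the residual 'nessuna' label."""
    AMORE = "amore"              # love
    GIOIA = "gioia"              # joy
    AMMIRAZIONE = "ammirazione"  # admiration
    RABBIA = "rabbia"            # anger
    TRISTEZZA = "tristezza"      # sadness
    PAURA = "paura"              # fear
    NESSUNA = "nessuna"          # none of the six

@dataclass
class AnnotatedVerse:
    """One row of the annotation sheet (six columns, cf. Figure 1)."""
    id: str                         # unique verse identifier; do not modify
    verse: str                      # the verse text to analyse
    emotion: Emotion                # primary emotion (or NESSUNA)
    emotion_sec: Optional[Emotion]  # optional secondary emotion
    confidence: str                 # e.g. "total", "partial", "very doubtful"
    comments: str = ""              # free-text remarks

# Example row, built from a corpus instance cited in the paper.
row = AnnotatedVerse("ZAP1593570 03",
                     "Non ho più lagrime; non ho più voce",
                     Emotion.TRISTEZZA, None, "total")
```

Each annotator fills one such record per verse; the secondary emotion and the confidence field are what later allow the disagreement analysis reported below.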
An extra class nessuna (none) applies mostly to verses with non-actionable words only; it is neglected in the current experiments.

The corpus AriEmozione 1.0 is a subset of the materials collected by project CORAGO.[1] AriEmozione 1.0 contains a selection of 678 operas composed between 1655 and 1765. We consider the lyrical text in the arias only. A. Zeno and P. Metastasio are among the most represented librettists in the corpus (∼30% of the operas); they are two of the most representative and prolific librettists of the 18th century. All texts are written in 18th-century Italian and articulated in verses and stanzas.

We labeled the emotions transmitted by every single verse, as we observed that this is the right granularity to obtain full text snippets expressing one single emotion. René Descartes wrote in 1649 "Les passions de l'âme", a sort of compendium of all possible emotions and their possible causes (Garavaglia, 2018). For the sake of concreteness, we leveraged Parrott's (2001) tree of emotions as our classification scheme.

Two native speakers of Italian annotated all 2,473 instances independently, considering the instructions displayed in Figure 1. They were asked to include (i) the emotion transmitted by the verse, (ii) an optional secondary label (in case they perceived a second emotion), and (iii) their level of confidence: total confidence, partial confidence, or very doubtful.

We measured the Cohen's kappa inter-annotator agreement (Fleiss et al., 1969) at this stage on the primary emotion. The result was 32.30, which is considered fair agreement. This value results from the perfect matching between the two annotators in 44% of the instances. When considering the secondary emotion as well, the two annotators coincided in 68% of the instances. These numbers reflect the complexity of the task. The same annotators then gathered to discuss and consolidate all dubious instances. Table 1 shows the number of instances per class for each corpus partition: training, development, and test set. The verse average length is 72.5 ± 31.6 characters and the corpus contains 34,608 (4,458) tokens (types).[2]

ZAP1593570 03 (Tristezza)
    Non ho più lagrime; non ho più voce; non posso piangere; non so parlar
    I have no more tears; I have no more voice; I cannot cry; I don't know how to speak
ZAP1596431 00 (Rabbia)
    Barbaro! Oh dio mi vedi divisa dal mio ben; barbaro, e non concedi ch'io ne dimandi almen
    Barbarian! Oh Lord, you see me separated from my own good; barbarian, you don't even allow me but one demand
ZAP1593766 01 (Amore)
    Guardami e tutto obblio e a vendicarti io volo; di quello sguardo solo io mi ricorderò
    Look at me, all else is forgotten and I haste to avenge you; only I shall remember that gaze
ZAP1594229 00 (Ammirazione)
    Su la pendice alpina dura la quercia antica e la stagion nemica per lei fatal non è
    Up on the slope of the mountain the ancient oak tree still lives on, and the adverse season poses no fatal threat
ZAP1596807 00 (Paura)
    In questa selva oscura entrai poc'anzi ardito; or nel cammin smarrito timido errando io vo
    I entered this dark forest not too long ago, boldly; having now lost the path I wander around, shyly
ZAP1599979 01 (Gioia)
    Vede alfin l'amate sponde, vede il porto, e conforto prende allor di riposar
    Finally, the beloved shores, the harbor, are all in sight and with them come solace and sleep

Table 2: Instances from the AriEmozione 1.0 corpus, including unique identifier, verse in Italian and English translation, and class. We include free (unofficial) translations for clarity.

3 Models and Representations

The nature of the corpus —a small amount of short verses written in 18th-century Italian— led us to select a humble set of models and representation alternatives. The baseline is a k-Nearest Neighbors algorithm (kNN), considered thanks to its success in classification tasks (Zhang and Zhou, 2007). We also experiment with multi-class SVMs, logistic regression, and neural networks. Regarding the latter, we experiment with a number of architectures with two and three hidden layers. Finally, we experiment with a FastText classifier (Joulin et al., 2017). Table 3 summarizes the explored configurations.[3]

Model     Settings
k-NN      L2 norm; exploring with k ∈ [1, ..., 9].
SVM       RBF kernel; c ∈ [1, 10, 100, 1000] and γ ∈ [1e-3, 1e-4] explored.
Log Reg   Multinomial logistic regression with Newton-CG solver.
NN        2 (3) hidden layers with size ∈ [32, 64, 96, 128, 256] (∈ [8, 16, 32, 64, 96]); 20% dropout; ReLU for input/hidden layers; softmax for the output layer; categorical cross-entropy loss; Adam; epochs ∈ [1, ..., 15].
FastText  300d embeddings with or without pre-training; learning rate ∈ [0.3, 0.6, 1]; epochs ∈ [1, 3, 5, 10, ..., 100].

Table 3: Experimental settings overview.

As for the text representations, we consider TF-IDF vectors of both character 3-grams and word 1-grams (no higher n values are considered due to the corpus dimensions). For preprocessing, we employ the spaCy Italian tokenizer[4] and casefold the texts. We also explore dense representations, derived from the TF-IDF vectors, by means of both LDA (Hoffman et al., 2010) and LSA (Halko et al., 2011).

[1] CORAGO is the Repertoire and archive of Italian opera librettos. It constitutes the first implementation of the RADAMES prototype (Repertoriazione e Archiviazione di Documenti Attinenti al Melodramma E allo Spettacolo) (Pompilio et al., 2005); http://corago.unibo.it.
[2] The corpus is available at https://zenodo.org/record/4022318.
[3] The code is available at https://github.com/TinfFoil/AriEmozione.
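To make the setup concrete, here is a minimal sketch of the strongest sparse representation —TF-IDF over casefolded character 3-grams— plugged into a multinomial logistic regression with scikit-learn. The data is a toy stand-in and the hyper-parameters are illustrative, not the authors' exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-ins for annotated verses; the real corpus has ~2.5k of them.
verses = [
    "Non ho più lagrime; non ho più voce",          # tristezza
    "Barbaro! Oh dio mi vedi divisa dal mio ben",   # rabbia
    "Vede alfin l'amate sponde, vede il porto",     # gioia
]
labels = ["tristezza", "rabbia", "gioia"]

clf = Pipeline([
    # Character 3-grams, casefolded, TF-IDF weighted: the best-performing
    # representation across all models in the experiments.
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=True)),
    # Multinomial logistic regression with the Newton-CG solver (cf. Table 3).
    ("logreg", LogisticRegression(solver="newton-cg", max_iter=1000)),
])
clf.fit(verses, labels)
prediction = clf.predict(["non posso piangere; non so parlar"])[0]
```

With only three training verses the prediction is of course meaningless; the point is the shape of the pipeline. The dense LDA/LSA variants would add a LatentDirichletAllocation or TruncatedSVD step between the vectorizer and the classifier.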
For both LDA and LSA, we target reductions to 16, 32, and 64 dimensions. As for embeddings, we adopted the pre-trained 300-dimensional Italian vectors of FastText (Joulin et al., 2017), and tried with character 3-grams and words.[*]

4 Experiments

We conducted several experiments to find the best combination of parameters and representations. Given the amount of instances available, we merged the training and development partitions and performed 10-fold cross-validation. As standard, the test partition was left aside and only one prediction was carried out on it, after identifying the best configurations. We evaluate our models on the basis of accuracy and weighted macro-averaged F1 measure, to account for the class imbalance. Table 4 shows the results obtained with some interesting configurations and representations, both for the cross-validation and on the test set.[5]

Character and word n-gram TF-IDF, LSA, and LDA were tested with all models except for FastText, on which we test with and without pre-trained embeddings. Notice that we are not interested in combining features, but in observing their performance in isolation.

The most promising representation on cross-validation appears to be the simple character 3-grams, with which we obtained the best results across all models, although it also features the highest variability across folds. Among all 3-gram-derived representations, LDA consistently obtained the worst results across all models. Still, it is more stable across folds than the sparse 3-gram representation. As for FastText, with the same epoch number and learning rate, the character 3-gram vectors always achieved much higher accuracy than the word vectors.

Similar patterns are observed when projecting to the unseen test set. The character 3-grams in general hold the best performance, while the 3-gram LDA tends to remain the worst regardless of the model used. This behavior does not hold in all cases. For instance, the logistic regression model achieves F1 = 0.44 on cross-validation, but drops to 0.42 on test. This might be the result of over-fitting.

It is worth noting that all models tend to confuse rabbia and tristezza. Table 5 shows the confusion matrix for the best model on test. These two emotions get confused with each other in 18% of the cases on average. The classifiers tend to confuse ammirazione with gioia as well, which is understandable given their semantic closeness.

model / representation    10-fold CV (F1, Acc)   test (F1, Acc)
kNN
  char 3-grams            0.38  38.51            0.35  35.15
  words                   0.36  36.08            0.35  34.73
  LDA char                0.30  29.97            0.31  30.54
SVM-RBF
  char 3-grams            0.44  43.70            0.43  43.00
  words                   0.42  42.00            0.44  44.00
  LDA char                0.28  28.00            0.30  30.00
Log reg
  char 3-grams            0.44  45.57            0.42  43.10
  words                   0.41  43.20            0.41  43.10
  LDA char                0.28  30.63            0.29  30.96
2-layers NN
  char 3-grams            0.42  43.61            0.47  46.86
  words                   0.42  42.91            0.43  43.10
  LDA char                0.27  29.56            0.27  31.80
3-layers NN
  char 3-grams            0.49  41.86            0.40  41.84
  words                   0.47  42.60            0.40  41.84
  LDA char                0.26  31.41            0.30  31.80
FastText
  char 3-grams            0.43  45.00            0.41  42.37
  pre-trained chars       0.43  47.00            0.41  41.00
  words                   0.42  42.56            0.39  44.07
  pre-trained words       0.38  41.00            0.40  42.00

Table 4: F1 and accuracy on cross-validation and on the held-out test set for some of the model/representation combinations.

              ammirazione  amore  gioia  paura  rabbia  tristezza
ammirazione          0.37   0.03   0.18   0.07    0.11       0.06
amore                0.03   0.43   0.13   0.00    0.09       0.17
gioia                0.27   0.16   0.31   0.20    0.09       0.07
paura                0.10   0.03   0.00   0.40    0.02       0.07
rabbia               0.20   0.14   0.03   0.13    0.64       0.17
tristezza            0.17   0.14   0.13   0.07    0.19       0.48

Table 5: Confusion matrix for the 2-layers neural network with TF-IDF character 3-grams.

[*] We used Sklearn for the kNN, SVM, and logistic regression models; Keras for the neural networks; and the Facebook-provided library for FastText (cf. https://scikit-learn.org, https://keras.io/, and https://github.com/facebookresearch/fastText).
[4] https://spacy.io/models/it
[5] The full batch of results is available at https://docs.google.com/spreadsheets/d/1Ztjry2mJs6ufCZM1O5CQRyZ8pA5YDnToN0h0NGX1nW0/edit?usp=sharing

5 Related Work

Building on the numerous pre-existing studies focusing on sentiment analysis (Ain et al., 2017; Shi et al., 2019), some researchers have been seeking to dig deeper, towards multi-class emotion analysis. Most of the work thus far has focused on social media (e.g., Twitter). Bouazizi and Ohtsuki (2016) built a classifier for seven emotions: happiness, sadness, anger, love, hate, sarcasm, and neutral; i.e., an overlap of five classes with respect to the ones in AriEmozione. In contrast to our experiments, they focused on exploiting the polarity of the words from each instance to be fed to a random forest classifier.

Balabantaray et al. (2012) tried to distinguish among happy, sad, anger, disgust, fear, and surprise using WordNet-Affect (Valitutti et al., 2004). Given that no WordNet-Affect is currently available for Italian, such an approach is unfeasible.

Promising work has been carried out on news articles (Ye et al., 2012), news headlines (Strapparava and Mihalcea, 2007), and children's narrative (Alm et al., 2005). While a lexicon-based approach is the most frequent to determine the binary positive vs. negative classification, Strapparava and Mihalcea (2007) combined a high-dimensional word space produced from word TF-IDF vectors with a set of seed words to predict the valence of a text, exploiting the syntagmatic relations between words. A bottom-up semantic approach has also been proposed (Seal et al., 2020).

To the best of our knowledge, no work in the field of either emotion or sentiment analysis has been performed on operas.

6 Conclusions and Future Work

We addressed the novel problem of emotion classification of opera arias at the verse level. The task is interesting because of the lack of automated tools for the analysis of operas, and challenging due to both the language used in 17th- and 18th-century lyrics and the complexity of producing the necessary amount of quality supervised data.

We explored various classification models and representations. A neural network with two hidden layers fed with a simple TF-IDF character 3-gram representation is among the most promising approaches to the problem. Among the six possible emotions, the most difficult to identify are rabbia and tristezza, which tend to be confused with each other, followed by ammirazione, which is often confused with gioia. In order to foster the research on this topic, we release the AriEmozione 1.0 corpus to the community (cf. footnote 2).

As for future work, we intend to increase the size of the AriEmozione 1.0 corpus by means of active learning (Yang et al., 2009). Once a larger data volume is produced, we plan to explore models to identify the emotion at the aria rather than at the verse level. Following the theory of emotion proposed by Plutchik (1980), we could identify the emotion of a whole aria by combining the emotions at the verse level, and then conduct experiments to verify which granularity is more adequate as a single emotion unit. In order to address the issue of emotional polysemy and ambiguity of aria verses, we aim at producing explainable models by highlighting the specific fragments expressing the emotion. Another interesting alternative is the one highlighted by Zhao and Ma (2019), who adopted an efficient meta-learning approach to augment the learning of emotion distributions (i.e., the intensity values of a set of emotions within a single sentence) when the training dataset is small, as in the AriEmozione 1.0 corpus.

Acknowledgments

This research is carried out in the framework of CRICC: Centro di Ricerca per l'Interazione con le Industrie Culturali e Creative dell'Università di Bologna; a POR-FESR 2014-2020 Regione Emilia-Romagna project (https://site.unibo.it/cricc). We thank Ilaria Gozzi and Marco Schillaci, students at Università di Bologna, for their support in the manual annotation of the AriEmozione 1.0 corpus.

References

Qurat Tul Ain, Mubashir Ali, Amna Riaz, Amna Noureen, Muhammad Kamran, Babar Hayat, and A. Rehman. 2017. Sentiment analysis using deep learning techniques: a review. International Journal of Advanced Computer Science and Applications, 8(6):424.

Cecilia O. Alm, Dan Roth, and Richard Sproat. 2005. Emotions from text: machine learning for text-based emotion prediction. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), pages 579–586.

Rakesh C. Balabantaray, Mudasir Mohammad, and Nibha Sharma. 2012. Multi-class twitter emotion classification: A new approach. International Journal of Applied Information Systems, 4(1):48–53.

Mondher Bouazizi and Tomoaki Ohtsuki. 2016. Sentiment analysis: From binary to multi-class classification: A pattern-based approach for multi-class sentiment analysis in twitter. In IEEE International Conference on Communications (ICC), pages 1–6. IEEE.

Joseph L. Fleiss, Jacob Cohen, and B.S. Everitt. 1969. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5):323–327.

Andrea Garavaglia. 2018. Funzioni espressive dell'aria a metà seicento secondo il "Giasone" di Cicognini e Cavalli. Il Saggiatore Musicale, Anno XXV(1):5–31.

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288.

Matthew Hoffman, Francis R. Bach, and David M. Blei. 2010. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 23, pages 856–864. Curran Associates, Inc.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431. ACL.

Susan McClary. 2012. Desire and Pleasure in Seventeenth-Century Music. University of California Press, Berkeley, CA, 1st edition.

W. Gerrod Parrott. 2001. Emotions in Social Psychology: Essential Readings. Psychology Press.

Robert Plutchik. 1980. A general psychoevolutionary theory of emotion. Theories of Emotion, 1:3–31.

Angelo Pompilio, Lorenzo Bianconi, Fabio Regazzi, and Paolo Bonora. 2005. RADAMES: A new management approach to opera: Repertory, archives and related documents. In Proceedings - First International Conference on Automated Production of Cross Media Content for Multi-channel Distribution. IEEE.

Dibyendu Seal, Uttam K. Roy, and Rohini Basak. 2020. Sentence-level emotion detection from text based on semantic rules. In Milan Tuba, Shyam Akashe, and Amit Joshi, editors, Information and Communication Technology for Sustainable Development, pages 423–430, Singapore. Springer Singapore.

Yong Shi, Luyao Zhu, Wei Li, Kun Guo, and Yuanchun Zheng. 2019. Survey on classic and latest textual sentiment analysis articles and techniques. International Journal of Information Technology & Decision Making, 18(04):1243–1287.

Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74, Prague, Czech Republic. Association for Computational Linguistics.

Alessandro Valitutti, Carlo Strapparava, and Oliviero Stock. 2004. Developing affective lexical resources. PsychNology Journal, 2(1).

Bishan Yang, Jian-Tao Sun, Tengjiao Wang, and Zheng Chen. 2009. Effective multi-label active learning for text classification. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 917–925, New York, NY. Association for Computing Machinery.

Lu Ye, Rui-Feng Xu, and Jun Xu. 2012. Emotion prediction of news articles from reader's perspective based on multi-label classification. In 2012 International Conference on Machine Learning and Cybernetics, volume 5, pages 2019–2024. IEEE.

Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048.

Zhenjie Zhao and Xiaojuan Ma. 2019. Text emotion distribution learning from small sample: A meta-learning approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3957–3967, Hong Kong, China. Association for Computational Linguistics.

Luca Zoppelli. 2001. Il teatro dell'umane passioni: note sull'antropologia dell'aria secentesca. In I luoghi dell'immaginario barocco. Liguori, Napoli, Italia.