                 Sentiment Analysis of Latin Poetry: First Experiments
                                on the Odes of Horace
        Rachele Sprugnoli, Francesco Mambrini, Marco Passarotti, Giovanni Moretti
                CIRCSE Research Centre, Università Cattolica del Sacro Cuore
                          Largo Agostino Gemelli 1, 20123 Milano
        {rachele.sprugnoli,francesco.mambrini,marco.passarotti,giovanni.moretti}@unicatt.it


                         Abstract

    In this paper we present a set of annotated data and the results of a number of unsupervised experiments for the analysis of sentiment in Latin poetry. More specifically, we describe a small gold standard made of eight poems by Horace, in which each sentence is labeled manually for sentiment using a four-value classification (positive, negative, neutral and mixed). Then, we report on how this gold standard has been used to evaluate two automatic approaches to sentiment classification: one is lexicon-based and the other adopts a zero-shot transfer approach.1

1   Introduction

The task of automatically classifying a (piece of) text according to the sentiment it conveys, known as Sentiment Analysis (SA), is usually performed for purposes such as monitoring the contents of social media or evaluating customer experience, by analysing texts like tweets, comments, and micro-blogs.
   A still under-investigated yet promising research area in which to develop and apply SA resources and techniques is the study of literary texts written in historical and, particularly, Classical languages (e.g. Ancient Greek and Latin). Investigating the lexical properties of Classical literary texts is, in fact, a century-long common practice. However, such investigation can nowadays (1) lead to replicable results, (2) benefit from techniques developed for analysing the sentiment conveyed by any type of text and (3) be performed with freely available lexical and textual resources. As for the latter, the research area dedicated to building and using linguistic resources for Classical languages has seen substantial growth during the last two decades (Sprugnoli and Passarotti, 2020). As far as SA is concerned, we recently built a polarity lexicon for Latin nouns and adjectives, called LatinAffectus. The current version of the lexicon includes 4,125 Latin lemmas with their corresponding prior polarity value (Sprugnoli et al., 2020b). LatinAffectus was developed in the context of the LiLa: Linking Latin project (2018-2023)2 (Passarotti et al., 2020), which aims at building a Knowledge Base of linguistic resources for Latin based on the Linked Data paradigm, i.e. a collection of several data sets described using the same vocabulary of knowledge description and linked together. LatinAffectus is connected to the Knowledge Base, thus making it interoperable with the other linguistic resources linked so far to LiLa (Sprugnoli et al., 2020a).
   In this paper we describe the use of LatinAffectus to perform SA of the Odes (Carmina) by Horace (65-8 BCE). Written between 35 and 13 BCE, the Odes are a collection of lyric poems in four books. Following the models of Greek lyric poets like Alcaeus, Sappho, and Pindar, the Odes cover a wide range of topics related to individual and social life in Rome during the age of Augustus, such as love, friendship, religion, morality, patriotism, the uncertainty of life, the cultivation of tranquility and the observance of moderation. In spite of a rather lukewarm initial reception, the Odes quickly became a capital source of influence, in particular as a model of authorial voice and

      Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    1
      This paper is the result of the collaboration between the four authors. For the specific concerns of the Italian academic attribution system, Rachele Sprugnoli is responsible for Sections 2, 3, 4.2, 5; Marco Passarotti is responsible for Section 1; Francesco Mambrini is responsible for Section 4.1. Giovanni Moretti developed the zero-shot classification script.
    2
      https://lila-erc.eu
identity.3 Considering not only the importance of the Odes in the history of Latin and European literature, but also the diversity of the contents and tones of the poems collected therein, we argue that performing SA on such a work can lead to interesting results and might represent a use case to open a discussion about the pros and cons of applying SA techniques and resources to literary texts written in ancient languages.
   All data presented in this paper are publicly released: https://github.com/CIRCSE/Latin_Sentiment_Analysis.

2   Related Work

The majority of linguistic resources and applications in the field of SA involve non-literary and non-poetic texts, such as news and user-generated content on the web (Medhat et al., 2014). However, affective information plays a crucial role in literature and, in particular, in poetry, where authors try to provoke an emotional response in the reader (Johnson-Laird and Oatley, 2016). Annotated corpora of poems and SA systems specifically designed for poetry are not as numerous as those in other areas of research, first of all that of social media, but work has been carried out for several languages,4 including Arabic (Alsharif et al., 2013), Spanish (Barros et al., 2013), Odia (Mohanty et al., 2018), German (Haider et al., 2020), Classical Chinese (Hou and Frank, 2015) and, of course, English (Sheng and Uthus, 2020; Sreeja and Mahalakshmi, 2019).
   Available annotated corpora of poems differ from each other in at least four respects: annotation procedure (either involving experts or using crowdsourcing techniques), unit of analysis (verse, stanza, whole poem), granularity of classification (from binary classes, such as positive and negative, to wide sets of emotions), and focus of the emotions (annotation of the emotions as depicted in the text by the author or as felt by the reader). With respect to previous work, in this paper we chose to involve experts, to perform annotation at the sentence level (as an intermediate degree of granularity between verse and stanza), to assign four generic classes without defining the specific emotion conveyed by the text, and to focus on the sentiment as depicted by the author.
   As for automatic classification systems, the literature reports both lexicon-based (Bonta and Janardhan, 2019) and machine learning approaches, with a constantly increasing use of deep learning techniques (Zhang et al., 2018). For example, Mohanty et al. (2018) experiment with Linear-SVM, Naive-Bayes and Logistic Regression classifiers on Odia poems, while Haider et al. (2020) perform multi-label classification on German stanzas with BERT. Given the lack of training data for Latin poetry, in this paper we will instead test unsupervised approaches.

3   Gold Standard Creation

3.1   Annotation

The Gold Standard (GS) consists of eight randomly selected odes,5 two from each of the four books that make up the work, for a total of 955 tokens (punctuation excluded) and 44 sentences (average sentence length: 21, standard deviation: 11). Texts were taken from the corpus prepared by the LASLA laboratory in Liège.6 We performed a single-label annotation of the original Latin text by Horace at the sentence level. We chose the sentence as the unit of annotation because it represents an intermediate degree of granularity between that of the verse and that of the stanza. In fact, the limited length of a verse can hinder the full understanding of the sentiment it conveys, while a stanza, being longer, risks containing very different content and thus, potentially, even opposite sentiments. Furthermore, not all poems can be divided into stanzas, as this depends on the metrical scheme of the poem. Sentences, instead, can be detected in every poem regardless of its metrical scheme, and represent a unit of meaning in their own right.
   In the annotation phase, we involved two experts in Latin language and literature (A1 and A2) and another annotator with basic knowledge of Latin but with previous experience in sentiment annotation (A3). Annotators were asked to identify the sentiment conveyed by each sentence in the GS, taking into consideration both the vocabulary used by the author and the images that are evoked in the ode. More specifically, annotators were asked to answer the following question: which of the following classes best describes how

    3
      For an orientation on the vast subject of the fortune and reception of the Odes see Baldo (2012).
    4
      For a recent survey on sentiment and emotion analysis applied to literature, see Kim and Klinger (2018).
    5
      Book I: odes 10 and 17; Book II: odes 7 and 13; Book III: odes 13 and 23; Book IV: odes 7 and 11.
    6
      http://web.philo.ulg.ac.be/lasla/opera-latina/.
the emotions are conveyed by the poet in the sentence under analysis?

   • positive: the only emotions that are conveyed at the lexical level and the only images that are evoked are positive, or positive emotions are clearly prevalent;
   • negative: the only emotions that are conveyed at the lexical level and the only images that are evoked are negative, or negative emotions are clearly prevalent;
   • neutral: there are no emotions conveyed by the text;
   • mixed: lexicon and evoked images produce opposite emotions; it is not possible to find a clearly prevailing emotion.

The annotation of the GS was organized in four phases. In the first phase, annotators worked together, collaboratively assigning the sentiment class to four of the eight odes (21 sentences): the task was discussed and a common procedure was defined. In the second phase, annotators worked independently on the other four odes (23 sentences): A1 and A2 annotated the original Latin text, while A3 annotated the same odes using an Italian translation (Horace and Nuzzo, 2009), in order to understand how the use of texts not in the original language can alter the annotation of the sentiment. In the third phase, we calculated the Inter-Annotator Agreement, whereas in the last phase disagreements were discussed and reconciled.

3.2   Inter-Annotator Agreement

Cohen's kappa between A1 and A2 was 0.5, while Fleiss's kappa among the three annotators (A1-A2-A3) was 0.48 (both results are considered moderate agreement). In particular, the negative class proved to be the easiest to annotate (with a Fleiss's kappa of 0.64), followed by neutral (0.57) and positive (0.45), whereas mixed was the most problematic class (0.23).
   We noticed that the Italian translation was sometimes misleading, resulting in cases of disagreement: e.g., the sentence inmortalia ne speres monet annus et almum quae rapit hora diem (ode IV, 7) is translated as 'speranze di eterno ti vietano gli anni e le ore che involano il giorno radioso' (literal translation of the Italian sentence into English: 'hopes of eternity forbid you the years and the hours that steal the radiant day'). A3 marked this sentence as mixed, considering that it is impossible to identify a prevailing emotion between the negativity expressed by the verb 'vietare' ('to forbid') and the positivity of 'giorno radioso' ('radiant day'). However, the translation of the Latin verb rapio is not appropriate: the Italian verb 'involare' ('to steal') does not convey the idea of the violent force inherent in rapio, which can be more correctly translated with the verb 'to plunder'.7

3.3   Reconciliation

Disagreements were discussed and reconciled by the three annotators: Table 1 presents the number of sentences and tokens per sentiment class. Our GS includes a majority of positive sentences (45.4%). Positive (average length: 21, standard deviation: 11), negative (average length: 24, standard deviation: 14), and mixed (average length: 25, standard deviation: 9) sentences are considerably longer than neutral ones (average length: 8, standard deviation: 3). Annotated examples are given in Table 2: English translations by Kaimowitz et al. (2008) are included for clarity.

                        Sentences       Tokens
        positive               20          411
        negative               12          292
        neutral                 3           23
        mixed                   9          229
        TOTAL                  44          955

         Table 1: Gold Standard statistics.

4   Experiments

4.1   Lexicon-Based Sentiment Analysis

The dataset for this experiment is obtained by means of a simple dictionary lookup of the lemmas in the LatinAffectus sentiment lexicon. Entries in the lexicon are assigned a score of -1.0 or -0.5 (negative polarity), 0 (neutral polarity), or +0.5 or +1.0 (positive polarity). The tokens in the Odes that are lemmatized under lemmas that also have an entry in LatinAffectus are assigned the score found in the lexicon. For instance, the adjective malus 'bad' is found with a polarity value of -1.0 in LatinAffectus. All tokens lemmatized as malus (adj.) are thus given a score of -1.0. Note

    7
      See for instance the English translation by Kaimowitz et al. (2008): "Do not hope for what's immortal, the year warns, and the hour which plunders the day".
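The agreement scores reported in Section 3.2 can be reproduced with a short routine. Below is a minimal sketch of Cohen's kappa for two annotators, shown on toy labels rather than on the actual annotations:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (po - pe) / (1 - pe)

# Toy example with the four classes used in the paper (not the real data).
a1 = ["positive", "negative", "positive", "neutral", "mixed", "positive"]
a2 = ["positive", "negative", "negative", "neutral", "mixed", "mixed"]
print(round(cohen_kappa(a1, a2), 2))  # → 0.57
```

Fleiss's kappa, used for the three-annotator figure, generalizes the same observed-versus-expected comparison to more than two raters.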
 Ode    Sent.   Text                                         Translation                         Class
 1.17   103     hic tibi copia manabit ad plenum             Here for you will flow abundance    positive
                benigno ruris honorum opulenta cornu         from the horn that spills the
                                                             country's splendors
 4.7    549     cuncta manus auidas fugient                  All that you bestow upon your       negative
                heredis amico quae dederis animo             heart escapes the greedy hands
                                                             of an heir
 2.13   265     frigora mitescunt Zephyris uer               With the Zephyrs cold grows mild,   mixed
                proterit aestas interitura simul             summer tramples springtime, soon
                pomifer autumnus fruges effuderit            to die, once productive autumn
                et mox bruma recurrit iners                  pours forth its fruits, and
                                                             shortly lifeless winter is back
 2.7    235     quem Venus arbitrum dicet bibendi            Who will Venus name as master       neutral
                                                             of the wine?

                        Table 2: Annotated examples taken from the Gold Standard.


that a score of 0.0 is assigned both to words expressly annotated as neutral in LatinAffectus and to those that do not have an entry in the lexicon.
   The dictionary lookup required some manual disambiguation in cases of ambiguity due to homography. For 18 lemmas (corresponding to 49 tokens in the Odes), the sentiment lexicon provides multiple values; in most cases, as with ales 'winged' (adj.) but also 'bird' (n.), the variation is due to a different polarity attributed to the syntactic uses of the word (in the example, to the adjective and the noun). In such cases, the PoS annotation in the LASLA corpus was used to disambiguate and assign the correct score. We also reviewed those words that, although not tagged as nouns or adjectives in LASLA, still yield a match in LatinAffectus. After revision, we decided to keep the scores for a series of lemmas annotated as numerals in the corpus (simplex 'simple, plain', primus and primum 'first', prius 'former, prior') and for the indefinite pronoun solus 'alone, only', which in LatinAffectus are marked as adjectives.
   A sentence score (S) was computed by summing the values of all words. We then attributed the label positive to all sentences with score S > 0 and negative where S < 0. For S = 0, we attributed neutral to sentences where all words had a score of 0 and mixed where positive and negative words balanced out. The overall accuracy of this method is 48% (macro-average F1 37, weighted macro-average F1 44), with unbalanced scores among the four classes: 70% for positive, 42% for negative, 67% for neutral, while no correct predictions were given for mixed.

4.2   Zero-Shot Classification

We trained a language model for SA on English and tested it on our GS, relying on two state-of-the-art multilingual models. More specifically, we fine-tuned Multilingual BERT (mBERT) (Pires et al., 2019) and XLM-RoBERTa (Conneau et al., 2020) on the GoEmotions corpus (Demszky et al., 2020) using Hugging Face's PyTorch implementation.8 GoEmotions is a dataset of comments posted on Reddit, manually annotated with 27 emotion categories or Neutral. In order to adapt this dataset to our needs, we mapped the emotions onto sentiment categories as suggested by the authors themselves. For example, joy and love were merged into a single positive class, whereas fear and grief were merged under the same negative class. The neutral category remained intact, and comments annotated with emotions belonging to opposite sentiments were marked as mixed. Comments labeled with ambiguous emotions (i.e. realization, surprise, curiosity, confusion) were instead left out.9 With this procedure, we built a training set made of 18,617 positive, 10,133 negative, 1,965 neutral and 1,581 mixed comments. For fine-tuning, we chose the

    8
      https://huggingface.co/transformers/index.html
    9
      For the full mapping, please see: https://github.com/google-research/google-research/blob/master/goemotions/data/sentiment_mapping.json.
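The relabeling of GoEmotions comments can be sketched as follows; the emotion-to-sentiment table here is an abridged, illustrative subset, not the full mapping linked in footnote 9:

```python
# Abridged emotion -> sentiment table (illustrative subset, not the full mapping).
SENTIMENT = {
    "joy": "positive", "love": "positive", "admiration": "positive",
    "fear": "negative", "grief": "negative", "anger": "negative",
    "neutral": "neutral",
    # Ambiguous emotions are excluded from training.
    "realization": None, "surprise": None, "curiosity": None, "confusion": None,
}

def relabel(emotions):
    """Map a comment's emotion labels to one of the four sentiment classes.

    Returns None when the comment should be dropped (ambiguous emotions)."""
    sentiments = {SENTIMENT.get(e) for e in emotions}
    if None in sentiments:
        return None                    # drop comments with ambiguous emotions
    if {"positive", "negative"} <= sentiments:
        return "mixed"                 # opposite sentiments co-occur
    for s in ("positive", "negative"):
        if s in sentiments:
            return s
    return "neutral"

print(relabel(["joy", "love"]))    # → positive
print(relabel(["joy", "grief"]))   # → mixed
print(relabel(["surprise"]))       # → None (left out)
```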
           Language   Test Set           Genre               mBERT   XLM-RoBERTa
           English    GoEmotions         social media        86%     73%
                      AIT-2018           social media        64%     59%
                      Poem Sentiment     literary - poetry   50%     70%
           Italian    MultiEmotions-It   social media        70%     75%
                      AriEmozione        literary - opera    50%     52%
           Latin      Horace GS          literary - poetry   32%     30%

        Table 3: Accuracy of the mono-lingual and cross-lingual (zero-shot) classification method.

                       Lexicon-Based SA       Zero-Shot mBERT       Zero-Shot XLM-RoBERTa
                       P     R     F1         P     R     F1        P     R     F1
       positive        0.56  0.70  0.62       0.83  0.25  0.38      1.00  0.10  0.18
       negative        0.62  0.42  0.50       0.75  0.50  0.60      0.53  0.67  0.59
       neutral         0.25  0.67  0.36       0.10  1.00  0.18      0.11  1.00  0.20
       mixed           0.00  0.00  0.00       0.00  0.00  0.00      0.00  0.00  0.00

Table 4: Precision (P), recall (R) and F1-score (F1) for the lexicon-based method and for the zero-shot
classification experiments.
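The lexicon-based scoring rule of Section 4.1 can be sketched in a few lines; the mini-lexicon and the pre-lemmatized input below are illustrative stand-ins for LatinAffectus and the LASLA lemmatization, which are assumed rather than reproduced here:

```python
# Hypothetical mini-lexicon standing in for LatinAffectus (lemma -> prior polarity).
LEXICON = {"malus": -1.0, "bonus": 1.0, "amicus": 0.5, "bellum": -0.5}

def classify_sentence(lemmas):
    """Sum per-lemma polarities; out-of-lexicon lemmas score 0.0."""
    scores = [LEXICON.get(lemma, 0.0) for lemma in lemmas]
    s = sum(scores)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    # S = 0: neutral if no polarized word, mixed if positives and negatives balance.
    return "mixed" if any(scores) else "neutral"

print(classify_sentence(["amicus", "bonus"]))   # → positive
print(classify_sentence(["malus", "bellum"]))   # → negative
print(classify_sentence(["malus", "bonus"]))    # → mixed
print(classify_sentence(["arbor"]))             # → neutral
```

Note that, as in the paper's method, this sketch treats every unmatched lemma as neutral, which is why lexicon coverage directly bounds performance.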


following hyperparameters: batch size 32, learning rate 2e-5, 6 epochs, AdamW optimizer.10
   We evaluated the trained model on different datasets, including our GS. For each of the following test sets, we randomly selected 44 texts, so as to have the same number of input data as in our GS:

   • GoEmotions: test set taken from the same corpus used for training the English model.
   • Poem Sentiment: a collection of English verses annotated with the same sentiment classes as in our GS (Sheng and Uthus, 2020).
   • AIT-2018: English data from the emotion classification task of SemEval-2018 Task 1: Affect in Tweets (Mohammad et al., 2018). Each tweet is annotated as neutral or as one, or more, of eleven emotions. The original annotation was mapped onto our four sentiment classes, leaving out ambiguous emotions.
   • AriEmozione: verses taken from 18th-century Italian opera texts, annotated with one or two emotions and the confidence level of the annotators (Fernicola et al., 2020). We randomly selected our test set from verses with high confidence scores, mapping emotions onto our four sentiment classes. Since the dataset does not contain verses annotated with opposite emotions, the class mixed is not present in the test set we built.
   • MultiEmotions-It: a multi-label emotion dataset made of Italian comments posted on YouTube and Facebook (Sprugnoli, 2020). The original emotion labels were converted into our four classes.

   Table 3 reports the results of mono-lingual and cross-lingual classification for the different datasets briefly described above and for the two pre-trained multilingual models. There is no clear prevalence of one model over the other: results vary greatly from one dataset to another. On the same language (thus without zero-shot transfer), we notice a drop in performance for both mBERT and XLM-RoBERTa when moving from Reddit comments, i.e. the same type of text as the training data, to tweets, and even more so when they are evaluated on poems. As for zero-shot classification, results on Italian YouTube and Facebook comments are better than those registered on English tweets, but accuracy drops when the models are applied to opera verses. However, the worst results are recorded for Latin, with an accuracy equal to, or slightly above, 30% (for mBERT: macro-average F1 29, weighted macro-average F1 35; for XLM-RoBERTa: macro-average F1 24, weighted macro-average F1 26). For both mBERT and XLM-RoBERTa, we register the same trend at the class level: perfect accuracy for neutral, good accuracy for negative (50% with mBERT and 67% with XLM-RoBERTa), low accuracy for positive (25% with mBERT and 10% with

    10
      We adapted the following implementation: https://gist.github.com/sayakmisra/b0cd67f406b4e4d5972f339eb20e64a5.
XLM-RoBERTa) and no correct predictions for mixed.

5   Conclusions and Future Work

In this paper we have presented a new GS, made of odes written by Horace, for the annotation of sentiment in Latin poetry. The extension of this manually annotated dataset is part of our future work: the goal is to have a sufficient amount of data to test supervised systems. We have also experimented with two different SA approaches that do not require training data: neither of them is able to correctly identify sentences with mixed sentiment, which, in any case, are the most problematic also for human annotators. Table 4 reports a comparison, in terms of precision, recall and F1-score, between the lexicon-based approach and the zero-shot classification experiments with the mBERT and XLM-RoBERTa models. The former performs better on the positive class, whereas the zero-shot method achieves a higher F1-score on the negative one, even though this class is not the most frequent in the training data. Both mBERT and XLM-RoBERTa obtain very high precision on the sentences marked as positive (0.83 and 1.00 respectively), but the recall is extremely low (0.25 and 0.10 respectively). On the contrary, for the neutral class, the recall is perfect (1.00 for both models) but the precision is very low (0.10 and 0.11 respectively).
   A manual inspection of the output of the lexicon-based method revealed two main problems of that approach: i) the limited coverage of LatinAffectus and ii) sentiment shifters are not properly taken into consideration. As for the first point, LatinAffectus covers 43% of the nominal and adjectival lemmas in the GS, leaving out lemmas with a clear sentiment orientation. To overcome this issue, we are currently working on the extension of the lexicon with an additional 10,000 lemmas. Regarding the sentiment shifters, their impact is exemplified by the following sentence: cum semel occideris et de te splendida Minos fecerit arbitria non Torquate genus non te facundia non te restituet pietas ('When you at last have died and Minos renders brilliant judgement on your life, no Torquatus, not birth, not eloquence, not your devotion will bring you back.' - ode IV, 7). Here, the sentiment score calculated by the script is very positive (3) because the script does not handle the frequent negations: the particle non should, however, reverse the positive polarity of facundia 'eloquence' and pietas 'devotion'. This problem could be mitigated by modifying the script with rules that take negations and their focus into account.
   Regarding the zero-shot classification approach, the very low performance on Latin deserves further investigation. It is possible that the problem lies in the data used to build the pre-trained models, i.e. Wikipedia for mBERT and Common Crawl for XLM-RoBERTa. Both resources were built by relying on automatic language detection engines and are highly noisy, due to the presence of languages other than Latin and of terms related to modern times. An additional improvement may also come from fine-tuning on an annotated in-domain corpus in a well-resourced language, that is, a corpus of annotated poems: unfortunately, the currently available corpora are not big enough for such a purpose.

Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme – Grant Agreement No. 769994.

References

Ouais Alsharif, Deema Alshamaa, and Nada Ghneim. 2013. Emotion classification in Arabic poetry using machine learning. International Journal of Computer Applications, 65(16).

Gianluigi Baldo. 2012. Horace (Quintus Horatius Flaccus), Carmina. In Christine Walde and Brigitte Egger, editors, Brill's New Pauly Supplements I - Volume 5: The Reception of Classical Literature. Brill, Amsterdam, October.

Linda Barros, Pilar Rodriguez, and Alvaro Ortigosa. 2013. Automatic Classification of Literature Pieces by Emotion Detection: A Study on Quevedo's Poetry. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pages 141–146. IEEE.

Venkateswarlu Bonta, Nandhini Kumaresh, and N. Janardhan. 2019. A comprehensive study on lexicon based approaches for sentiment analysis. Asian Journal of Computer Science and Technology, 8(S2):1–6.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised
Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. 2020. GoEmotions: A Dataset of Fine-Grained Emotions. In 58th Annual Meeting of the Association for Computational Linguistics (ACL).

Francesco Fernicola, Shibingfeng Zhang, Federico Garcea, Paolo Bonora, and Alberto Barrón-Cedeño. 2020. AriEmozione: Identifying Emotions in Opera Verses. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). Accademia University Press.

Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, and Winfried Menninghaus. 2020. PO-EMO: Conceptualization, annotation, and modeling of aesthetic emotions in German and English poetry. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1652–1663, Marseille, France, May. European Language Resources Association.

Horace and Gianfranco Nuzzo. 2009. I quattro libri delle Odi e l'Inno secolare di Quinto Orazio Flacco. Flaccovio.

Yufang Hou and Anette Frank. 2015. Analyzing sentiment in classical Chinese poetry. In Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 15–24.

Philip N. Johnson-Laird and Keith Oatley. 2016. Emotions in music, literature, and film. In Lisa Feldman Barrett, Michael Lewis, and Jeannette M. Haviland-Jones, editors, Handbook of Emotions, chapter 3, pages 82–97. The Guilford Press.

Jeffrey H. Kaimowitz, Ronnie Ancona, et al. 2008. The Odes of Horace. Johns Hopkins University Press.

Evgeny Kim and Roman Klinger. 2018. A survey on sentiment and emotion analysis for computational literary studies. arXiv preprint arXiv:1808.03137.

Walaa Medhat, Ahmed Hassan, and Hoda Korashy. 2014. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4):1093–1113.

Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in tweets. In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA.

Gaurav Mohanty, Pruthwik Mishra, and Radhika Mamidi. 2018. Kabithaa: An annotated corpus of Odia poems with sentiment polarity information. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, France, May. European Language Resources Association (ELRA).

Marco Passarotti, Francesco Mambrini, Greta Franzini, Flavio Massimiliano Cecchini, Eleonora Litta, Giovanni Moretti, Paolo Ruffolo, and Rachele Sprugnoli. 2020. Interlinking through lemmas. The lexical collection of the LiLa knowledge base of linguistic resources for Latin. Studi e Saggi Linguistici, 58(1):177–212.

Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How Multilingual is Multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4996–5001.

Emily Sheng and David C. Uthus. 2020. Investigating societal biases in a poetry composition system. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 93–106.

Rachele Sprugnoli and Marco Passarotti, editors. 2020. Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, Marseille, France, May. European Language Resources Association (ELRA).

Rachele Sprugnoli, Francesco Mambrini, Giovanni Moretti, and Marco Passarotti. 2020a. Towards the Modeling of Polarity in a Latin Knowledge Base. In Proceedings of the Third Workshop on Humanities in the Semantic Web (WHiSe 2020), pages 59–70.

Rachele Sprugnoli, Marco Passarotti, Daniela Corbetta, and Andrea Peverelli. 2020b. Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for Latin. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 3078–3086.

Rachele Sprugnoli. 2020. MultiEmotions-It: A new dataset for opinion polarity and emotion analysis for Italian. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020), pages 402–408. Accademia University Press.

P. S. Sreeja and G. S. Mahalakshmi. 2019. PERC - An emotion recognition corpus for cognitive poems. In 2019 International Conference on Communication and Signal Processing (ICCSP), pages 0200–0207. IEEE.

Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4):e1253.