Annotating Shakespeare’s Sonnets with Appraisal Theory to Detect Irony

Nicolò Busetto
Department of Linguistic Studies
Ca Foscari University
Ca Bembo - Venezia
830070@stud.unive.it

Rodolfo Delmonte
Department of Linguistic Studies
Ca Foscari University
Ca Bembo - Venezia
delmont@unive.it

Abstract

In this paper we propose an approach to irony detection in Shakespeare’s Sonnets based on Appraisal Theory (Martin and White, 2005). The Sonnets are a well-known data set that is statistically valuable. In order to produce meaningful experiments, we created a gold standard by collecting opinions from famous literary critics on Shakespeare’s Sonnets, focusing on irony. We started by manually annotating the data using Appraisal Theory as a reference theory. This choice is motivated by the fact that Appraisal annotation schemes allow smooth evaluation of highly elaborated texts like political commentaries. The annotation is then automatically compiled and checked against the gold standard in order to verify the persistence of certain schemes that can be identified as ironic, satiric or sarcastic. Upon observation, irony detection reaches a final match of 80%.¹

¹ Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Shakespeare’s Sonnets are a collection of 154 poems renowned for being full of ironic content (Weiser, 1983), (Weiser, 1987) and for their ambiguity, which sometimes reverses the overall interpretation of a sonnet. Lexical ambiguity, i.e. a word with several meanings, emanates from the way in which the author uses words that can be interpreted in more than one way, not only because they are inherently polysemous, but because the additional meaning they evoke can sometimes be derived on the basis of sound, i.e. homophony (see “eye” and “I” in sonnet 152). The sonnets are also full of metaphors, which often require contextualising the content within historical Elizabethan life and society. Furthermore, there is an abundance of words related to specific language domains in the sonnets. For instance, there are words related to the language of economy, war, nature and to the discoveries of the modern age, and each of these words may be used as a metaphor of love. Many of the sonnets are organized around a conceptual contrast, an opposition that runs parallel and then diverges, sometimes with the use of the rhetorical figure of the chiasmus. It is just this contrast that generates irony, sometimes satire, sarcasm, and even parody. Irony may be considered in turn as: expressing what one means using language that normally signifies the opposite, typically for humorous or emphatic effect; or a state of affairs or an event that seems contrary to what one expects and is amusing as a result. Sarcasm may be regarded as the use of irony to mock or convey contempt.
Parody is obtained by using the words or thoughts of a person but adapting them to a ridiculously inappropriate subject. There are several types of irony; we select verbal irony which, in the strict sense, is saying the opposite of what one means for effect, and which depends on the extralinguistic context (Attardo, 1994). As a result, Satire and Irony overlap slightly but constitute two separate techniques, while Sarcasm can be regarded as a specialization or a subset of Irony. It is important to remark that in many cases these linguistic structures may require the use of nonliteral or figurative language, i.e. the use of metaphors. This has been carefully taken into account when annotating the sonnets by means of the Appraisal Theory Framework (henceforth ATF).

In our approach we follow the so-called incongruity presumption, or incongruity-resolution presumption. Theories connected to the incongruity presumption are mostly cognitive-based and related to concepts highlighted, for instance, in (Attardo, 2000). The focus of theorization under this presumption is that in humorous texts, or broadly speaking in any humorous situation, there is an opposition between two alternative dimensions. As a result, we will look for contrast in our study of the sonnets, produced by the contents of manual classification. The purpose of this study is to show how ATF can be useful for detecting irony, considering its ambiguity and its elusive traits.

2 Producing the Gold Standard

In order to produce a gold standard that may encompass strong hints to classification in terms of humour as explained above, we collected literary critics’ reviews of the sonnets. We used criticism from a set of authors including (Frye, 1957), (Calimani, 2009), (Melchiori, 1971), (Eagle, 1916), (Marelli, 2015), (Schoenfeldt, 2010), (Weiser, 1987) and (Serpieri, 2002), all listed in the reference section. The gold standard classification has been produced by second author and checked by first author. It is organized into a number of separate fields in a sequence, to allow the reader to get a better picture of the sonnet in the collection. All classifications are reported in a supplementary file in the Appendix. Here below are the classifications for two sonnets:

• SONNET 8
  SEQUENCE: 1-17 Procreation
  MAIN THEME: One against many
  ACTION: Young man urged to reproduce
  METAPHOR: Through progeny the young man will not be alone
  NEG.EVAL: The young man seems to be disinterested
  POS.EVAL: Young man positive aesthetic evaluation
  CONTRAST: Between one and many

• SONNET 21
  SEQUENCE: 18-86 Time and Immortality
  MAIN THEME: Love
  ACTION: The Young man must understand the sincerity of poet’s love
  METAPHOR: True love is sincere
  NEG.EVAL: The young man listens the false praise made by others
  POS.EVAL: Young Man positive aesthetic evaluation
  CONTRAST: Between true and fictitious love

As can be seen, SEQUENCE indicates the thematic sequence in which the sonnet is included; this is followed by MAIN THEME, the theme the sonnet deals with; ACTION reports the possible action proposed by the poet to the protagonist of the poem; METAPHOR is the main metaphor introduced in the poem, sometimes using words from a specialized domain; NEG.EVAL and POS.EVAL stand for the Negative Evaluation and Positive Evaluation contained in the poem in relation to the theme and the protagonist(s); finally, CONTRAST is the key that signals the presence of opposing concrete or abstract concepts used by Shakespeare to reinforce the arguments purported in the poem.

Many sonnets have received more than one possible pragmatic category. This is due to the difficulty in choosing one category over another. In particular, it has been particularly hard to distinguish Irony from Satire, and Irony from Sarcasm. Overall, 54 sonnets received a double marking out of the 98 sonnets given some kind of pragmatic label by the literary critics; these 98 correspond to a ratio of 98/154, i.e. 63.64% of the collection. The resulting counts of annotated sonnets are reported in Table 1.

Table 1: Final distribution of sonnets in the 5 pragmatic categories

Type         Quantity
Blank        57
Irony        73
Satire       20
Parody       4
Sarcasm      47
Duplicated   54

Eventually, as commented in the section below, the introduction of annotations based on Appraisal Theory has helped in choosing the best pragmatic classification. In fact, literary critics were simply hinting at "irony" or "satire", but the annotation gave us a precise measure of the level of contrast present in each of the sonnets regarded generically as "ironic".

2.1 Appraisal Theory for Poetry and Literary Texts

The experiment we have been working on is an attempt to describe irony, parody and sarcasm in terms of a strict, scientifically viable linguistic theory, the Appraisal Framework Theory (Martin and White, 2005), as has already been done in the past by other authors (see (Taboada and Grieve, 2004) and (Read and Carrol, 2012), but also (Stingo and Delmonte, 2016) and (Delmonte and Marchesini, 2017)). The idea is as follows: produce a complete annotation of the sonnets using the tools made available by the theory, and then verify how well it fits the gold standard produced. The primary purpose of the Appraisal Framework Theory (henceforth AFT) is to delineate the interpersonal dimension of communication, supplying schemes by which it is possible to recognize evaluative sequences within texts and information about the positioning of the author in relation to evaluated targets.²

² Further information can be found on the website dedicated to the Appraisal Framework Theory: http://www.languageofevaluation.info/appraisal/

The annotation has been organized around only one category, Attitude, and its direct subcategories, in order to keep the annotation at a more workable level, and to optimize time and space in the XML annotation. Attitude includes different options for expressing positive or negative evaluation, and expresses the author’s feelings. The main category is divided into three primary fields with their relative positive or negative polarity, namely:

• Affect is every emotional evaluation of things, processes or states of affairs (e.g. like/dislike); it describes proper feelings and any emotional reaction within the text aimed towards human behaviour/process and phenomena.

• Judgement is any kind of ethical evaluation of human behaviour (e.g. good/bad), and considers the ethical evaluation of people and their behaviours.

• Appreciation is every aesthetic or functional evaluation of things, processes and states of affairs (e.g. beautiful/ugly; useful/useless), and represents any aesthetic evaluation of things, both man-made and natural phenomena.

Eventually, we end up with six different classes: Affect Positive, Affect Negative, Judgement Positive, Judgement Negative, Appreciation Positive, Appreciation Negative. Overall in the annotation there is a slight majority of positive polarities, with a ratio of 0.511, in comparison to negative annotations, with a ratio of 0.488. In short, the positive poles total 607 and the negative poles total 579, for a total number of 1186 annotations. Judgement is the more interesting category because it allows social moral sanction, in that it refers to two subfields, Social Esteem and Social Sanction, which however we decided not to mark. In particular, whereas the positive polarity annotation of Judgement extends to Admiration and Praise, the negative polarity annotation deals with Criticism and Condemnation, or Social Esteem and Social Sanction (see (Martin and White, 2005), p. 52). In particular, Judgement is found mainly in the final couplet of the sonnets.

The annotation work on the texts has been accomplished by first author and checked by second author. Given the level of objective difficulty in understanding the semantic content of the sonnets, we have decided not to resort to additional annotators - second author produced the annotation as part of his Master thesis work. So far, we have not been able to produce a measure for interannotator agreement; however, since 35% of all annotations had to be corrected during checking, that measure could be approximated as 65% agreement.

The tags we used for the annotation include a tag that contains the whole text of the sonnet, a tag to mark stanzas, and a tag to mark lines. Focusing on the annotation of the evaluative sequences instead, every time we found an evaluative word (or sequence of words), we delimited the item/phrase within a pair of tags. Subsequently, following the general indications provided by (Martin and White, 2005) mentioned above, we assigned one of the three subcategories (affect, judgement and appreciation) as an attribute of that tag, also providing the positive/negative sentiment orientation as the value of the attribute. Here below we show the annotation for Sonnet 40, which is highly contrasted:

Take all my loves, my love, yea take them all,
What hast thou then more than thou hadst before?
No love, my love, that thou mayst true love call,
All mine was thine, before thou hadst this more:

Then if for my love, thou my love receivest,
I cannot blame thee, for my love thou usest,
But yet be blamed, if thou thy self deceivest
By wilful taste of what thy self refusest

I do forgive thy robbery gentle thief
Although thou steal thee all my poverty:
And yet love knows it is a greater grief
To bear love’s wrong, than hate’s known injury.

Lascivious grace, in whom all ill well shows,
Kill me with spites yet we must not be foes.

In the choice of which and how many items to annotate, we adopted the following linguistic criteria to enhance the notational analysis.

• Semantic criteria:
Anytime one or more verb/noun modifiers are found, when they do not represent meaningful evaluation by themselves, they are annotated [...] they contribute to modify. Any instance of evaluation of a multiword expression is annotated as a single appraisal unit. Any instance of evaluation of rhetorical or figurative language is annotated as a single appraisal unit. When possible, the evaluations are embedded so as to include appraisal units into a [...]goges, rhetorical questions, interjections and [...]

• Syntactic criteria:
Without exceeding the length of the propo[...] single appraisal unit up until a clause-level, [...]tions. Additionally, for those cases where [...] limited ourselves to the annotation of the [...]quence, so as to avoid overproduction of long annotation. Again, when possible, [...] the evaluation on a clause-level in greater de[...]quences on a clause level even beyond the punctuation marks limits. However, these an[...] of items, whenever they share the same attribute and the same polarity orientation, they [...] case of more than three items in a row that [...] orientation, they were annotated separately.
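The tagging scheme described above (an evaluative span wrapped in a tag whose attribute names the subcategory and whose value gives the polarity) lends itself to automatic tallying of the six classes. What follows is only a minimal sketch in Python under stated assumptions: the element names `sonnet`, `stanza`, `line` and `appraisal` are hypothetical placeholders, and the sample lines and their labels are invented for illustration rather than taken from the annotated corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SUBCATEGORIES = ("affect", "judgement", "appreciation")

# Invented sample: element names and labels are hypothetical placeholders,
# not the tag set or the annotations of the actual corpus.
SAMPLE = """\
<sonnet>
  <stanza>
    <line>a <appraisal affect="positive">sweet</appraisal> word</line>
    <line>a <appraisal judgement="negative">cruel and
      <appraisal appreciation="negative">ugly</appraisal> deed</appraisal></line>
  </stanza>
</sonnet>
"""


def count_classes(xml_text):
    """Tally (subcategory, polarity) pairs over all appraisal units."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    # iter() also walks nested elements, so embedded units are counted.
    for unit in root.iter("appraisal"):
        for name, polarity in unit.attrib.items():
            if name in SUBCATEGORIES:
                counts[(name, polarity)] += 1
    return counts


def polarity_shares(counts):
    """Return (positive share, negative share), assuming two polarities."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no appraisal units found")
    positive = sum(n for (_, pol), n in counts.items() if pol == "positive")
    return positive / total, (total - positive) / total


if __name__ == "__main__":
    counts = count_classes(SAMPLE)
    for (name, polarity), n in sorted(counts.items()):
        print(name, polarity, n)
    print(polarity_shares(counts))
```

Keying the counter on the (subcategory, polarity) pair keeps the six classes separate while still allowing them to be summed by polarity or by subcategory afterwards.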

As to interpretation criteria, we assumed that [...] justified by the fact that a high level of Negative Judgements accompanied by Positive Appreciations or Affect is by itself interpretable as the intention to provoke a sarcastic mood. As a final result, there are 44 sonnets that present the highest contrast and are specifically classified according to the six classes above (see Figure 1 in the Appendix). There is also a group that contains ambiguous sonnets which have been classified with a double class, mainly Irony and Sarcasm. As a first remark, in all these sonnets negative polarity is higher than positive polarity, with the exception of sonnet 106. In other words, if we consider this annotation as the one containing the highest levels of Judgement, we come to the conclusion that a possible Sarcasm reading is mostly associated with the presence of Judgement Negative and in general with high Negative polarity annotations (see Table 2 below). As a first result, we may notice a very high convergence between the critics’ opinions, as classified by us with the label highest contrast, and the output of manual annotation by Appraisal classes.

Table 2: Quantitative data for six appraisal classes for sonnets with highest contrast

Classes      Sum    Mean    St.Dev.
Appr.Pos     56     2.534   8.199
Appr.Neg     25     1.134   3.691
Affct.Pos    53     2.4     7.733
Affct.Neg    77     3.467   11.202
Judgm.Pos    32     1.445   4.721
Judgm.Neg    122    5.467   17.611

In the group of 50 sonnets classified, mainly or exclusively, with Irony, the presence of Judgement Negative is much lower than in the previous table for Sarcasm (see Figure 2 in the Appendix). In fact only half of them, 25, have annotations for that class; the remaining half introduces two other negative classes: mainly Affect Negative, but also Appreciation Negative (see Table 3 below). As to the main Positive class, we can see that it is no longer Judgement Positive but Appreciation Positive, which is present in 33 sonnets. This is followed by Affect Positive, which is better distributed.

Table 3: Quantitative data for six appraisal classes for sonnets with lowest contrast

Classes      Sum    Mean    St.Dev.
Appr.Pos     139    5.346   18.821
Appr.Neg     65     2.5     8.844
Affct.Pos    64     2.462   8.708
Affct.Neg    81     3.115   11.009
Judgm.Pos    59     2.269   8.029
Judgm.Neg    37     1.423   5.047

In other words, we can now consider that Sarcasm is characterized by a majority of negative evaluations, 224 over 141, while Irony is characterized by a majority of Positive evaluations, 262 over 183, and that the values are sparse and unequally distributed.

The final table concerns the sonnets with blank evaluation by critics, which amount to 60. As a rule, this group of sonnets looks different from the two groups we already analysed. The prevailing trait is Affect Negative; Judgement Negative is only occasionally present; the second prominent trait is Affect Positive.

Table 4: Quantitative data for six appraisal classes for sonnets with no contrast

Classes      Sum    Mean    St.Dev.
Appr.Pos     88     3.034   1.269
Appr.Neg     59     2.034   7.638
Affct.Pos    89     3.069   11.483
Affct.Neg    109    3.759   14.052
Judgm.Pos    49     1.689   6.367
Judgm.Neg    8      0.276   1.079

To see how large the difference is, we can judge from the quantities shown in Table 4 above (but see also Figure 3 in the Appendix). In particular, in this case the Positive/Negative ratio is more balanced, 226 over 176, with a majority of Positive annotations as happened with Irony, but with a smaller gap. The appraisal category with the highest number of annotations is now Affect, whereas in the case of Irony it was Appreciation, and in Sarcasm it was Judgement. So eventually we have been able to differentiate the three main and more frequent pragmatic categories by means of Appraisal Framework features: they are characterized by a different distribution of positive vs. negative evaluations, and also by a prominent presence of one of the three main subcategories into which Appraisal has been subdivided, that is, Appreciation for Irony, Judgement for Sarcasm, and Affect where no evaluation has been expressed.

3 Conclusion

In this paper we have presented work carried out to annotate and experiment with the theme of irony in Shakespeare’s Sonnets. The gold standard for the experiment has been created by collecting comments produced by literary critics on the presence of some kind of thematic, semantic and syntactic opposition in the sonnets such as to produce some sort of irony. The sonnets were first annotated using the framework of Appraisal Theory, and then we checked the results: we obtained a very high level of matching with the critics’ opinions, at 80%. Eventually, the Appraisal framework has shown its ability to classify and diversify different levels of irony effectively.

References

Salvatore Attardo. 1994. Linguistic Theories of Humor. Mouton de Gruyter, Berlin – New York.

Salvatore Attardo. 2000. Irony as relevant inappropriateness. Journal of Pragmatics, 84(32).

Dario Calimani. 2009. William Shakespeare, I sonetti della menzogna. Carrocci, Roma.

Rodolfo Delmonte and Giulia Marchesini. 2017. A semantically-based approach to the annotation of narrative style. In Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13), pages 14–25, Stroudsburg, PA, USA. ACL.

R.L. Eagle. 1916. New light on the enigmas of Shakespeare’s Sonnets. John Long Limited, London.

Northrop Frye. 1957. Anatomy of Criticism: Four Essays. Princeton University Press.

Maria Antonietta Marelli. 2015. William Shakespeare, I Sonetti – con testo a fronte. Garzanti.

J. Martin and P.R. White. 2005. Language of Evaluation, Appraisal in English. Palgrave Macmillan, London and New York.

Giorgio Melchiori. 1971. Shakespeare’s Sonnets. Adriatica Editrice, Bari.

J. Read and J. Carrol. 2012. Annotating expressions of appraisal in English. Language Resources and Evaluation, 46:421–447.

Michael Schoenfeldt. 2010. Cambridge introduction to Shakespeare’s poetry. Cambridge University Press, Cambridge.

Alessandro Serpieri. 2002. Polifonia Shakespeariana. Bulzoni, Roma.

Michele Stingo and Rodolfo Delmonte. 2016. Annotating satire in Italian political commentaries with appraisal theory. In Natural Language Processing meets Journalism - Proceedings of the Workshop, NLPMJ-2016, pages 74–79, Stroudsburg, PA, USA. ACL.

M. Taboada and J. Grieve. 2004. Analyzing appraisal automatically. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, pages 158–161. AAAI Press.

David K. Weiser. 1983. Shakespearean irony: The ’sonnets’. Neuphilologische Mitteilungen, 84(4):456–469. http://www.jstor.org/stable/43343552

David K. Weiser. 1987. Mind in Character – Shakespeare’s Speaker in the Sonnets. The University of Missouri Press.

APPENDIX. Figures of the Six Pragmatic Categories for Appraisal-Based Classification

Figure 1: Subdivision into six appraisal classes for sonnets with highest contrast

Figure 2: Subdivision into six appraisal classes for sonnets with lowest contrast

Figure 3: Subdivision into six appraisal classes for sonnets with no contrast
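The positive/negative totals and the prevailing appraisal category discussed for Tables 2-4 can be recomputed from the Sum columns alone. The following is a minimal sketch in Python, assuming only the Sum values as printed in those tables; the function and variable names are illustrative and are not part of the original study.

```python
# Sum columns of Tables 2-4, keyed by the contrast group named in each caption.
SUMS = {
    "highest contrast": {"Appr.Pos": 56, "Appr.Neg": 25, "Affct.Pos": 53,
                         "Affct.Neg": 77, "Judgm.Pos": 32, "Judgm.Neg": 122},
    "lowest contrast": {"Appr.Pos": 139, "Appr.Neg": 65, "Affct.Pos": 64,
                        "Affct.Neg": 81, "Judgm.Pos": 59, "Judgm.Neg": 37},
    "no contrast": {"Appr.Pos": 88, "Appr.Neg": 59, "Affct.Pos": 89,
                    "Affct.Neg": 109, "Judgm.Pos": 49, "Judgm.Neg": 8},
}


def polarity_totals(row):
    """Return (positive total, negative total) for one table."""
    positive = sum(v for k, v in row.items() if k.endswith(".Pos"))
    negative = sum(v for k, v in row.items() if k.endswith(".Neg"))
    return positive, negative


def dominant_category(row):
    """Return the category prefix with the most annotations overall."""
    totals = {}
    for label, value in row.items():
        category = label.split(".")[0]  # "Appr", "Affct" or "Judgm"
        totals[category] = totals.get(category, 0) + value
    return max(totals, key=totals.get)


if __name__ == "__main__":
    for group, row in SUMS.items():
        print(group, polarity_totals(row), dominant_category(row))
```

Summing by polarity gives the pairs quoted in the text (141 vs. 224, 262 vs. 183, 226 vs. 176), and summing by category gives Judgement, Appreciation and Affect respectively as the largest class in each group.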