=Paper= {{Paper |id=Vol-2402/paper9 |storemode=property |title=Sentiment Annotation for Lessing’s Plays: Towards a Language Resource for Sentiment Analysis on German Literary Texts |pdfUrl=https://ceur-ws.org/Vol-2402/paper9.pdf |volume=Vol-2402 |authors=Thomas Schmidt,Manuel Burghardt,Katrin Dennerlein,Christian Wolff |dblpUrl=https://dblp.org/rec/conf/ldk/SchmidtBDW19 }} ==Sentiment Annotation for Lessing’s Plays: Towards a Language Resource for Sentiment Analysis on German Literary Texts== https://ceur-ws.org/Vol-2402/paper9.pdf
Sentiment Annotation for Lessing’s Plays:
Towards a Language Resource for Sentiment
Analysis on German Literary Texts
Schmidt, Thomas
Media Informatics Group, University of Regensburg, Germany
http://mi.ur.de/sekretariat-team/thomas-schmidt
thomas.schmidt@ur.de

Burghardt, Manuel
Computational Humanities, University of Leipzig, Germany
https://ch.uni-leipzig.de/
burghardt@informatik.uni-leipzig.de

Dennerlein, Katrin
University of Würzburg, Germany
https://www.germanistik.uni-wuerzburg.de/ndl1/mitarbeiter/dennerlein/
katrin.dennerlein@uni-wuerzburg.de

Wolff, Christian
Media Informatics Group, University of Regensburg, Germany
http://mi.ur.de
christian.wolff@ur.de

      Abstract
We present first results of an ongoing research project on sentiment annotation of historical plays
by German playwright G. E. Lessing (1729-1781). For a subset of speeches from six of his most
famous plays, we gathered sentiment annotations by two independent annotators for each play. The
annotators were nine students from a Master’s program of German Literature. Overall, we gathered
annotations for 1,183 speeches. We report sentiment distributions and agreement metrics and put
the results in the context of current research. A preliminary version of the annotated corpus of
speeches is publicly available online and can be used for further investigations, evaluations and
computational sentiment analysis approaches.

2012 ACM Subject Classification Document preparation → Annoation; Retrieval tasks and goals
→ Sentiment analysis

Keywords and phrases Sentiment Annotation, Sentiment Analysis, Corpus, Annotation, Annotation
Behavior, Computational Literary Studies, Lessing


 1      Introduction

Sentiment analysis typically makes use of computational methods to detect and analyze
sentiment (e.g. positive or negative) in written text [7, p.1]. Emotion analysis, a research area
very closely related to sentiment analysis, deals with the analysis of more complex emotion
categories like anger, sadness or joy in written text. More recently, sentiment and emotion
analysis have also gained attention in computational literary studies [15] and are used to
investigate novels [3, 4], fairy tales [1, 8] and plays [8, 9, 11, 12, 13]. However, there are only
few resources available for sentiment analysis on literary texts, since sentiment annotation of
literary texts has turned out to be a tedious, time-consuming, and generally challenging task
[5, 14, 15], especially for historical literary texts, with their archaic language and complex
plots [14]. Previous studies have shown that agreement levels among annotators of sentiment
annotation are rather low, since literary texts can be understood and interpreted in a number
            © Schmidt, T., Burghardt, M., Dennerlein, K. and Wolff, C.;
            licensed under Creative Commons License CC-BY
LDK 2019 - Posters Track.
Editors: Thierry Declerck and John P. McCrae
XX:2   Sentiment Annotation in Lessing’s plays


       of ways, i.e. annotations are very much dependent on the subjective understanding of the
       annotator [14, 15]. On the technical side, most computational approaches for sentiment
       analysis on literary texts currently employ heuristic and rule-based methods [4, 8, 9, 11, 12, 13].
       For more advanced machine learning tasks as well as for standardized evaluations, well-
       curated sentiment-annotated corpora for literary texts would be an important desideratum,
       but in practice they are largely missing.
           We want to close this gap for the genre of German historical plays, more precisely plays
       of the playwright G. E. Lessing (1729-1781), by creating a corpus annotated with sentiment
       information. Since this is a very tedious task, we first want to investigate the suitability of
       different user groups with different levels of knowledge for literary works. We want to find
       out how important domain knowledge (experts, semi-experts and non-experts) is for this kind
       of task, since the acquisition of domain experts (literary scholars) is obviously more difficult
       than the acquisition of annotators without any formal literary training, who could possibly
       be hired on a large scale via a crowdsourcing platform. In this article we present first results
       from an annotation study with a group of semi-experts.


        2     Design of the Annotation Study

       We conducted the annotation study for semi-experts as part of a course in the Master’s
       program of German Literature at the Würzburg University. The course "Sentiment Analysis
       vs. Affektlehre" was focused on the plays of Lessing and revolved around the comparison
       of more traditional sentiment and emotion analysis approaches with recent computational
       methods [11, 12]. Each student had to write an essay and a presentation for a specific play.
       In addition, students were asked to participate in the sentiment annotation study. For this
       study, we chose six of the most famous plays by Lessing. Every student had to annotate
       200 randomly selected speeches except for the students assigned with Damon, since this
       play only consists of 183 speeches. A speech is a single utterance of a character, typically
       separated by utterances of other characters beforehand and afterwards. A speech can consist
       of one or multiple sentences. For each of the six plays we asked two independent students to
       provide sentiment annotations. Overall, nine students participated in the annotation project.
       One student, who was an advanced tutor, conducted annotations for multiple plays. The
       entire corpus comprises 1,183 speeches and we acquired 2,366 annotations (2 annotations
       per speech). The corpus consists of 3,738 sentences and 44,192 tokens. A speech consists on
       average of 3.16 sentences and 37.36 tokens.
           The overall process was very similar to [14]: Students received an annotation instruction
       and guidelines of three pages length, explaining the entire annotation process with examples.
       The annotation material was provided via Microsoft Word, which was also used as a basic
       annotation tool during the annotation process. Conducting annotation studies with well
       known products like Word or Excel is not uncommon in Digital Humanities projects, as
       humanities scholars are typically familiar with standard software tools [2, 14]. Every student
       received a file with the play-specific speeches. A speech was presented with the following
       information: (1) speaker name, (2) text of the speech and (3) position of the speech in the
       entire play. Furthermore, the predecessor and successor speech were presented to give some
       context information for the interpretation of the actual speech. Sentiment annotations were
       documented in a predefined table structure. Annotators were asked to write down the overall
       sentiment of an entire speech rather than the sentiment for specific parts of that speech. In
       case there were multiple, possibly conflicting sentiment indicators in one speech, annotators
       documented the most dominant sentiment for that speech.
Schmidt, T., Burghardt, M., Dennerlein, K. and Wolff, C.                                              XX:3


   Students were asked to annotate one of the following classes in the first table: negative,
positive, neutral, mixed, uncertain, and other (we refer to this scheme as differentiated
polarity). If annotators did not choose negative or positive, they were asked to provide a
tendency for one of those two classes (binary polarity). Figure 1 illustrates the annotation
process via an example annotation. Here the annotator chose neutral for the differentiated
polarity and negative for the binary polarity.




                           Figure 1 Example of the annotation material


    We also gathered annotations about the reference of the sentiment and the certainty of the
annotations to get more sophisticated insights about the annotation behavior. However, we
will only focus on the polarity in the upcoming sections. Students had a month to complete
the annotations, but other than that could freely organize the time to complete the task by
themselves. In the next section, we report some of the major findings from our annotation
study.


 3     Results
First, we present the distributions concerning the differentiated polarity for the entire corpus
and all 2,366 annotations (table 1):

           Negative     Positive     Neutral      Mixed        Uncertain      Other
           734 (31%)    442 (19%)    540 (23%)    405 (17%)    235 (9%)       10 (0.4%)

                         Table 1 Sentiment distribution for all annotations


   Most of the annotations are negative, and a significant number of annotations are mixed
and uncertain. These results are in line with current research about sentiment annotation of



                                                                                                   L D K Po s t e r s
XX:4   Sentiment Annotation in Lessing’s plays


       literary texts [1, 14], which seem to have a general tendency toward more negative annotations.
       To analyze the agreement among annotators we used Cohen’s Kappa and the average observed
       agreement(AOA, number of agreements divided by the total number of annotated speeches)
       as basic metrics. According to [6], Kappa metrics can be interpreted like this (table 2):


                                       Kappa value   Interpretation
                                       <0.20         Poor agreement
                                       0.21-0.40     Fair agreement
                                       0.41-0.60     Moderate agreement
                                       0.61-0.80     Substantial agreement
                                       0.81-1.00     Very good agreement

                                       Table 2 Interpretation of Cohen’s Kappa



           As described above, we had the annotators document differentiated (6 values) as well as
       binary (2 values) polarity values for each of the speeches. In addition, we also inferred a third
       variant from those annotations, which we call threefold polarity. We basically extend the
       binary polarity metric by the value "neutral" in the following manner: if the value "neutral"
       is annotated at the differentiated polarity, this value is taken over in this variant. For any
       other case the value "positive" or "negative" is taken over from the binary polarity annotation.
       We present these metrics per play along with average values for the entire corpus (table 3):


                                Play                Annotation type        kappa   AOA
                                                 Differentiated polarity    0.03   15%
                              Damon               Threefold polarity        0.12   45&
                                                     Binary polarity        0.14   56%
                                                 Differentiated polarity    0.43   59%
                           Emilia Galotti         Threefold polarity        0.49   68%
                                                     Binary polarity        0.49   76%
                                                 Differentiated polarity    0.13   30%
                            Der Freigeist               Threefold           0.28   57%
                                                     Binary polarity        0.34   68%
                                                 Differentiated polarity    0.29   44%
                        Minna von Barnhelm        Threefold polarity        0.39   59%
                                                     Binary polarity        0.29   63%
                                                 Differentiated polarity    0.32   46%
                         Nathan der Weise         Threefold polarity        0.40   62%
                                                     Binary polarity        0.43   72%
                                                 Differentiated polarity    0.38   54%
                         Miß Sara Sampson         Threefold polarity        0.48   66%
                                                     Binary polarity        0.65   83%
                                                 Differentiated polarity    0.30   45%
                          Overall average         Threefold polarity        0.36   59%
                                                     Binary polarity        0.39   69%

                                              Table 3 Agreement levels
Schmidt, T., Burghardt, M., Dennerlein, K. and Wolff, C.                                                    XX:5


    4    Discussion
The agreement levels are poor to fair for many annotation types and plays (0.00-0.40).
For annotation types with fewer classes and for several plays we identified Kappa values
with moderate agreement (0.41-0.60). The highest agreement levels are achieved for Emilia
Galotti and Miss Sara Sampson, in the latter case with substantial agreement for the binary
annotation (0.61-0.80). Overall, these results are in line with current research [1, 5, 14],
proving that sentiment annotation on literary texts is very subjective, difficult and dependent
on the interpretation of the annotator, thus leading to rather low agreement levels compared
to sentiment annotations on other text sorts like movie reviews [17] and social media content
[10]. However, for several plays and annotation types higher agreements were achieved than
compared to similar annotations with non-experts [14]. Therefore, we propose to explore
sentiment annotations with advanced experts concerning the literary text to acquire more
stable annotations. Although the agreement levels are too low to use the full corpus for
evaluation or machine learning purposes, we still think that the corpus can be useful for the
computational literary studies community and therefore provide a first alpha version of the
corpus with all annotations1 .


    5    Future Directions
As a next step, we want to gather more annotations by other user groups and compare them
to each other. Although non-experts currently seem to produce lower levels of agreement, we
want to explore if we can produce valuable corpora with non-experts by using majority decision
with a high number of annotations per speech. To gain insights for possible improvements
concerning the annotation scheme and process, we also conducted a focus group with the
annotators and acquired feedback via a questionnaire. Note that we focused on the overall
sentiment of an entire speech. In the future we want to use more sophisticated models
including more precise annotations of the sentiment direction and object. Taking into account
existing studies on multi-modal sentiment analysis on dramatic texts [16], we currently also
explore the possibility to extend the speech annotation to modalities other than text, e.g.
audio and video material of the corresponding theater plays to improve agreement levels.


        References
    1   Cecilia Ovesdotter Alm and Richard Sproat. Emotional sequencing and development in fairy
        tales. In International Conference on Affective Computing and Intelligent Interaction, pages
        668–674. Springer, 2005.
    2   Lars Döhling and Manuel Burghardt. PaLaFra – Entwicklung einer Annotationsumgebung für
        ein diachrones Korpus spätlateinischer und altfranzösischer Texte. In Book of Abstracts, DHd
        2017, Bern, Switzerland, 2017.
    3   Fotis Jannidis, Isabella Reger, Albin Zehe, Martin Becker, Lena Hettinger, and Andreas
        Hotho. Analyzing features for the detection of happy endings in german novels. arXiv preprint
        arXiv:1611.09028, 2016.
    4   Tuomo Kakkonen and Gordana Galic Kakkonen. Sentiprofiler: Creating comparable visual pro-
        files of sentimental content in texts. In Proceedings of the Workshop on Language Technologies
        for Digital Humanities and Cultural Heritage, pages 62–69, 2011.


1
    A preliminary version of this resource is available online:
    https://www.dropbox.com/sh/8mu29ny8fhrpgg2/AABFXw7qYHLoJ-4yx8CBlXX9a?dl=0




                                                                                                         L D K Po s t e r s
XX:6   Sentiment Annotation in Lessing’s plays


        5   Evgeny Kim and Roman Klinger. Who feels what and why? annotation of a literature corpus
            with semantic roles of emotions. In Proceedings of the 27th International Conference on
            Computational Linguistics, pages 1345–1359, 2018.
        6   Rainer Leonhart. Lehrbuch Statistik: Einstieg und Vertiefung. Verlag Hans Huber, 2013.
        7   Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge
            University Press, 2016.
        8   Saif Mohammad. From once upon a time to happily ever after: Tracking emotions in novels and
            fairy tales. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural
            Heritage, Social Sciences, and Humanities, pages 105–114. Association for Computational
            Linguistics, 2011.
        9   Eric T Nalisnick and Henry S Baird. Character-to-character sentiment analysis in shakespeare’s
            plays. In Proceedings of the 51st Annual Meeting of the Association for Computational
            Linguistics (Volume 2: Short Papers), volume 2, pages 479–483, 2013.
       10   Rudy Prabowo and Mike Thelwall. Sentiment analysis: A combined approach. Journal of
            Informetrics, 3(2):143–157, 2009.
       11   Thomas Schmidt and Manuel Burghardt. An evaluation of lexicon-based sentiment analysis
            techniques for the plays of gotthold ephraim lessing. In Proceedings of the Second Joint
            SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences,
            Humanities and Literature, pages 139–149. Association for Computational Linguistics, 2018.
            URL: http://aclweb.org/anthology/W18-4516.
       12   Thomas Schmidt and Manuel Burghardt. Toward a tool for sentiment analysis for
            german historic plays. In Michael Piotrowski, editor, COMHUM 2018: Book of Ab-
            stracts for the Workshop on Computational Methods in the Humanities 2018, pages 46–48,
            Lausanne, Switzerland, June 2018. Laboratoire laussannois d’informatique et statistique
            textuelle. URL: https://www.researchgate.net/publication/331907713_Toward_a_Tool_
            for_Sentiment_Analysis_for_German_Historic_Plays.
       13   Thomas Schmidt, Manuel Burghardt, and Katrin Dennerlein. „Kann man denn auch
            nicht lachend sehr ernsthaft sein?“ – Zum Einsatz von Sentiment Analyse-Verfahren
            für die quantitative Untersuchung von Lessings Dramen.              In Georg Vogeler, ed-
            itor, Book of Abstracts, DHd 2018, pages 244–249, Cologne, Germany, 2018. URL:
            https://www.researchgate.net/publication/331908052_Kann_man_denn_auch_nicht_
            lachend_sehr_ernsthaft_sein-Zum_Einsatz_von_Sentiment_Analyse-Verfahren_fur_
            die_quantitative_Untersuchung_von_Lessings_Dramen.
       14   Thomas Schmidt, Manuel Burghardt, and Katrin Dennerlein. Sentiment annotation of historic
            german plays: An empirical study on annotation behavior. In Sandra Kübler and Heike
            Zinsmeister, editors, Proceedings of the Workshop for Annotation in Digital Humantities
            (annDH), pages 47–52, Sofia, Bulgaria, August 2018. URL: http://ceur-ws.org/Vol-2155/
            schmidt.pdf.
       15   Thomas Schmidt, Manuel Burghardt, and Christian Wolff. Herausforderungen für Sentiment
            Analysis-Verfahren bei literarischen Texten. In Manuel Burghardt and Claudia Müller-Birn,
            editors, INF-DH-2018, Berlin, Germany, September 2018. Gesellschaft für Informatik e.V.
            URL: https://dl.gi.de/handle/20.500.12116/16996.
       16   Thomas Schmidt, Manuel Burghardt, and Christian Wolff. Toward multimodal sentiment
            analysis of historic plays: A case study with text and audio for lessing’s emilia galotti. In
            4th Conference of the Association of Digital Humanities in the Nordic Countries (DHN
            2019), 2019.       URL: https://www.researchgate.net/publication/331907721_Toward_
            Multimodal_Sentiment_Analysis_of_Historic_Plays_A_Case_Study_with_Text_and_
            Audio_for_Lessing’s_Emilia_Galotti.
       17   Tun Thura Thet, Jin-Cheon Na, and Christopher SG Khoo. Aspect-based sentiment analysis
            of movie reviews on discussion boards. Journal of information science, 36(6):823–848, 2010.