Exploring Multimodal Sentiment Analysis in Plays: A Case Study for a Theater Recording of Emilia Galotti

Thomas Schmidt1, Christian Wolff2
1 Media Informatics Group, University of Regensburg, D-93040 Regensburg, Germany
2 Media Informatics Group, University of Regensburg, D-93040 Regensburg, Germany

Abstract
We present first results of an exploratory study on sentiment analysis via different media channels for a German historical play. We propose exploring media channels other than text for sentiment analysis on plays, since the auditory and visual channels might offer important cues. We perform a case study and investigate how textual, auditory (voice-based), and visual (face-based) sentiment analysis perform compared to human annotations, and how these approaches differ from each other. As our use case we chose Emilia Galotti by the famous German playwright Gotthold Ephraim Lessing. We acquired a video recording of a 2002 theater performance of the play at the "Wiener Burgtheater". We evaluate textual lexicon-based sentiment analysis and two state-of-the-art audio and video sentiment analysis tools. As gold standard we use speech-based annotations by three expert annotators. We found that audio and video sentiment analysis do not perform better than textual sentiment analysis and that presenting the video channel did not improve annotation statistics. We discuss the reasons for this negative result and the limitations of the approaches. We also outline how we plan to further investigate the possibilities of multimodal sentiment analysis.

Keywords
sentiment analysis, computational literary studies, video, annotation, multimodality

CHR 2021: Computational Humanities Research Conference, November 17-19, 2021, Amsterdam, The Netherlands
Email: thomas.schmidt@ur.de (T. Schmidt); christian.wolff@ur.de (C. Wolff)
Web: go.ur.de/Thomas-schmidt (T. Schmidt); go.ur.de/christian-wolff (C. Wolff)
ORCID: 0000-0001-7171-8106 (T. Schmidt); 0000-0001-7278-8595 (C. Wolff)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction
Sentiments and emotions are an important part of qualitative and hermeneutical analysis in literary studies and are important cues for the understanding and interpretation of narrative art (cf. [49, 22, 23, 53]). The computational method of predicting and analyzing sentiment, predominantly in written text, is referred to as sentiment analysis and has a long tradition in the computational analysis of social media and user-generated content on the web [20]. In general, sentiment analysis regards sentiment (also often referred to as opinion, polarity, or valence) as a class-based phenomenon describing the connotation of a text unit as either positive, negative, or neutral. The prediction and analysis of more complex categories like anger, sadness, or joy (e.g. in a multi-class setting) is called computational emotion analysis [20]. While more complex emotions are of great interest for literary studies, we focus solely on sentiment for our first explorations of sentiment analysis on multiple media channels.

There is a growing interest in sentiment and emotion analysis applications in Digital Humanities (DH), especially in Computational Literary Studies (CLS). Researchers explore these
methods on text genres like fairy tales [1, 24], novels [31, 14, 17, 12], fan fictions [29, 16], movie subtitles [27, 13, 42], and social media content [44, 46]. In the context of historical plays, scholars apply sentiment and emotion analysis to investigate visualization possibilities [24, 34, 37] and relationships among characters [34, 26], or to evaluate the performance compared to human expert annotations [35]. For an in-depth analysis of the current state of sentiment analysis in DH and CLS, see [15].

While these projects are very promising, there are several problems concerning the current state of this field: The annotation process has been shown to be rather tedious and challenging, often requiring (expensive) experts who understand the context and language of the material. Furthermore, agreement among annotators is low due to the inherent subjectivity of narrative and poetic texts, but also due to interpretation problems caused by historical or vague language [1, 27, 50, 36, 38, 48, 41]. These annotation problems pose challenges to the creation of the valid corpora that are necessary for modern machine learning. While some research addresses this problem by developing user-centered annotation tools [45], the problems still persist. Thus, sentiment and emotion analysis is predominantly performed with rather primitive rule-based methods (lexicon-based methods, cf. [15]) and achieves prediction accuracies on annotated literary corpora between 40% and 80% [35, 2], which is low compared with other text genres like social media or product reviews [20, 52]. As main problems, researchers report the historical and poetic language as well as the usage of irony and metaphors, which pose challenges for rule-based methods that depend on a fixed set of contemporary vocabulary annotated with sentiment information (see chapter 4.1).

We argue that the fact that the majority of sentiment analysis in CLS is performed on the textual level might be one reason for these limitations. We propose the exploration of other media channels to advance this area of research. Indeed, multimodal sentiment analysis has proven successful in several application areas that offer a video channel in addition to text [30] and is used in various contexts in human-computer interaction [28, 11, 10]. While many literary texts are at their core written for a reading audience, we argue that plays are similar to movie scripts and are mostly not intended to be read, but to be performed on stage. In addition, one might argue that many narrative forms have their origin in a communicative situation based on spoken language [7], as text is a fairly late invention. Indeed, the important role of the performance and the problems of a sole focus on the text of plays have been a major point of discussion throughout the history of theater studies (cf. [51]). Especially for attributes like sentiment and emotion, one can argue that they are communicated more by the actors in a live performance, using voice, facial expressions, gait, and gesture, than via the written text. Furthermore, theater recordings of canonical plays are nowadays easy to access, for example online or via library services.
Thus, we perform a case study on how the inclusion of the auditory (voice-based) and the visual (face-based) channel influences (1) the sentiment annotation process and (2) the computational sentiment prediction. The presentation of the video channel might facilitate the annotation process and improve agreement, since annotators do not have to rely solely on the difficult language for interpretation. As our case study we select the play Emilia Galotti (1772) by Gotthold Ephraim Lessing (1729-1781), one of the most famous German playwrights. We describe the material in more detail in chapter 2. Afterwards, we discuss the creation of an annotated gold standard (chapter 3) and the applied sentiment analysis methods for the three media channels: text, audio, and video (chapter 4). In chapter 5 we compare the three methods to each other and investigate their performance on the gold standard, before discussing the results in chapter 6.

2. Material
As material for this case study we use the play Emilia Galotti by G. E. Lessing. Lessing is one of the most famous German playwrights and Emilia Galotti one of his most important plays. The play has already been explored in the context of audio sentiment analysis [39]. For our analysis, we use a recorded theater performance. The performance dates from 2002 and was staged at the "Wiener Burgtheater" in German (more information about the recording: https://www.amazon.de/Emilia-Galotti-Lessing-Wiener-Burgtheater/dp/B0038QGXOK).

The recording has decent audio and picture quality and meets the quality requirements demanded by our sentiment analysis tools. The video format is MP4 with a resolution of 640 x 360 pixels at 25 frames per second. The audio was extracted as a WAV file with a sampling rate of 44.1 kHz in stereo.

Current research on quantitative drama analysis focuses on speeches as the basic structural unit. A speech is usually a single utterance of a character, separated by utterances of other characters before and after it. Emilia Galotti consists of 835 speeches. However, it is common that the actual stage production of a play does not completely adhere to the published text and order of the original material. Thus, we deviate from the textual speeches offered by the written original and acquire the text as actually spoken by the actors during staging, to enable correct comparisons among all sentiment analysis methods.

To acquire the text of this performance we used the Google Cloud Speech-to-Text API (GCST) for German (https://cloud.google.com/speech-to-text/). GCST is considered state-of-the-art for speech-to-text tasks and produces text structured into units that are separated whenever longer breaks occur during the utterances. In the following, we will refer to these units as "speeches". The API produces 672 of these textual units, which therefore differ considerably from the original speeches of the textual source material. Note that these text units sometimes consist of utterances by multiple speakers or are split within a speech or a sentence (depending on when a break appears). Furthermore, GCST is not intended for video recordings of theater performances, and while we did not perform an exact evaluation, we identified various transcription mistakes that are quite substantial for some passages. Thus, in a subsequent step, we corrected the mistakes in the output by listening to the play and transcribing certain passages from scratch. GCST also delivers precise time stamps for the 672 units, and we split the audio and video files according to these timestamps to obtain 672 comparable units for every modality.
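To illustrate this segmentation step, the following minimal Python sketch cuts the extracted WAV file into per-speech segments. It is a sketch under assumptions: the timestamps and file names are invented example values, and it uses the pydub library for slicing, which is not necessarily the tooling used in the study; the video channel could be split analogously (e.g., with ffmpeg).

    # Minimal sketch: splitting the extracted audio into per-speech units,
    # assuming (start, end) timestamps in seconds have already been read
    # from the GCST response. File names and timestamps are hypothetical.
    from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

    speech_times = [(0.0, 4.2), (4.9, 11.3), (12.0, 15.8)]  # example values

    audio = AudioSegment.from_wav("emilia_galotti.wav")
    for i, (start, end) in enumerate(speech_times):
        segment = audio[int(start * 1000):int(end * 1000)]  # pydub slices in ms
        segment.export(f"speech_{i:03d}.wav", format="wav")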
3. Annotation Process and Results
The speeches were annotated by two annotators who are familiar with Lessing's work and the specific play. The annotators assigned one of three polarity classes to every speech: negative, neutral, or positive. The instruction was to assign the class the speech is most connoted with, depending on the overall sentiment the characters express. Annotators were shown the entire video of a speech as well as its text (i.e., all modalities) via a table and a video player. The annotators were given one week to finish the annotation and were compensated monetarily. We conducted short interviews about the annotation process afterwards, which we discuss briefly in chapter 6.

The annotation results are as follows: The annotators agree on 348 of the 672 speeches (52%), with a Cohen's κ of 0.233 (fair agreement according to [18]). These are rather low levels of agreement, which are however in line with previous research on the annotation of literary or historical texts [1, 27, 42, 50, 36, 38, 48, 41]. We define the gold standard used for the evaluation of sentiment prediction via the following approach: If the annotators agree on a speech, the speech is assigned the chosen class. For speeches the first two annotators did not agree on, a third expert annotator decided the final annotation via the same annotation process as described above. Figure 1 illustrates the distribution of these gold standard annotations.

[Figure 1: Sentiment distribution of gold standard annotations]

The majority of annotations are negative, which is in line with previous research on the annotation of literary texts [1, 42, 36, 38, 48, 41]. The high number of neutral annotations is, according to our analysis, due to the fact that many speeches are very short (e.g. consisting of one word), making the assignment of positive or negative sentiment rather difficult.

4. Sentiment Analysis Methods
In this chapter we describe the different sentiment analysis approaches. All approaches were implemented in Python with the support of various Software Development Kits (SDKs), which we describe in more detail in the upcoming chapters. Statistical analysis was performed in Python or with the IBM SPSS Statistics software.

4.1. Textual Sentiment Analysis
For the textual sentiment analysis we employ a lexicon-based approach. A sentiment lexicon is a list or table of words annotated with sentiment information, e.g. whether a word is rather negatively or positively connoted. Simple word-based calculations can then infer the sentiment of a text: By summing the number of positive words and subtracting the number of negative words, one receives an overall value for the sentiment of the text unit, which can be regarded as negative if the value is below 0, neutral for 0, and positive if the value is above 0. Oftentimes, sentiment lexicons offer continuous values instead of nominal assignments, which can be used similarly. In research, lexicon-based sentiment analysis is often chosen when machine learning is not possible due to a lack of well-annotated corpora, and it is a common method in sentiment analysis on literary and historical texts [1, 24, 14, 29, 34, 26, 35, 50, 2, 40] or for special social media corpora [44, 46, 25].
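To make this calculation concrete, here is a minimal Python sketch of the scoring heuristic with continuous polarity values. The toy lexicon entries are invented for illustration; the actual study uses SentiWS (described next) together with preprocessing steps such as lemmatization, which are omitted here.

    # Minimal sketch of lexicon-based polarity scoring. The toy lexicon is
    # invented; the study uses SentiWS entries plus lemmatization.
    toy_lexicon = {"gut": 0.4, "liebe": 0.6, "schlecht": -0.5, "tod": -0.6}

    def lexicon_sentiment(speech: str) -> str:
        tokens = speech.lower().split()
        score = sum(toy_lexicon.get(token, 0.0) for token in tokens)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"  # score of 0, e.g. no lexicon words present

    print(lexicon_sentiment("die Liebe ist gut"))     # -> positive
    print(lexicon_sentiment("der Tod ist schlecht"))  # -> negative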
We utilize the sentiment lexicon SentimentWortschatz (SentiWS) [32], which is one of the most well-known and validated lexicons for German [8], and perform the calculations for all speeches as described above. The words in SentiWS are annotated with floating point numbers describing polarity on a continuous scale from +3 (very positive) to -3 (very negative). The lexicon consists of 3,469 entries along with their inflections. To address the problem of historical language to some extent, we apply the optimizations recommended by Schmidt and Burghardt [35]: lemmatization via TreeTagger [33] and the extension of the lexicon with historical variants. Compared to more basic lexicon-based approaches, this approach has been shown to be successful in the setting of German historical plays.

4.2. Audio Sentiment Analysis
For the audio sentiment analysis, we use the free developer version of the tool Vokaturi (https://vokaturi.com/). Vokaturi is an emotion recognition tool for spoken language employing machine learning. It is considered language-independent, is recommended as the best free software for sentiment analysis of spoken language [9], and is used in similar comparative research [47]. Internally, Vokaturi applies machine learning on two larger databases with voice- and audio-based features. We use each of the 672 speeches as input for Vokaturi and receive numerical values on a range from 0 (none) to 1 (a lot) for the five categories neutrality, fear, sadness, anger, and happiness. The value specifies to which degree the corresponding concept is present in the audio file. However, the tool does not report a sentiment/valence score directly. Thus, to map this output to the sentiment classes (positive/negative/neutral), we apply the following heuristic: we sum up the values for the negative emotions fear, sadness, and anger to get an overall value for the negative polarity. We regard the value of happiness as the positive polarity. We then compare these two values and the value for neutrality and assign the maximum of these three values as the overall sentiment of the speech. We refer to this method as audio sentiment analysis.
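The mapping heuristic can be expressed in a few lines. The following sketch assumes the five category scores for one speech have already been obtained from Vokaturi; the Vokaturi API call itself is omitted, and the example values are invented.

    # Sketch of the mapping heuristic from chapter 4.2: Vokaturi's five
    # emotion scores (0-1 each) are collapsed into one sentiment class.
    def vokaturi_to_sentiment(neutrality, happiness, fear, sadness, anger):
        scores = {
            "neutral": neutrality,
            "positive": happiness,               # happiness as positive polarity
            "negative": fear + sadness + anger,  # pooled negative polarity
        }
        return max(scores, key=scores.get)  # class with the highest value

    # Invented example: strong anger dominates -> "negative"
    print(vokaturi_to_sentiment(0.2, 0.1, 0.1, 0.2, 0.6))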
4.3. Visual Sentiment Analysis
To conduct the video sentiment analysis we utilize the free version of the Emotion SDK by Affectiva (https://www.affectiva.com/). The Affectiva Emotion SDK is a cross-platform face-expression recognition toolkit focused on face detection and facial emotion recognition [21], also used in various research fields [19]. According to Affectiva, the analysis is based on a large training database with over 6.5 million faces. To perform the video sentiment analysis we segment the 672 video parts into frames, one frame per second. Then, we apply the facial emotion recognition of the Affectiva Emotion SDK to all of the frames. The SDK produces multiple values relevant for emotion recognition. However, we rely solely on the valence value, which describes the overall expression of the face as rather positive or negative. The valence value is positive if predominantly positive emotions are recognized and negative for predominantly negative emotions. The value can also be zero if no emotion is apparent or no face can be detected. We sum up all valence values of all frames corresponding to the time frame of a speech and then assign the sentiment accordingly: positive if the overall valence is positive, negative if the overall valence is negative, and neutral for a value of 0. Note that we configured the SDK to use the valence of the largest face detected in a frame. We will refer to this method as video sentiment analysis in the following.
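The aggregation over frames reduces to a sign test on the summed valence. The sketch below assumes the per-frame valence values for a speech have already been obtained from the Affectiva SDK (one frame per second, 0.0 whenever no face is detected); the values are invented and the SDK call is omitted.

    # Sketch of the valence aggregation from chapter 4.3: the per-frame
    # valence values of a speech are summed; the sign decides the class.
    def aggregate_valence(frame_valences):
        total = sum(frame_valences)
        if total > 0:
            return "positive"
        if total < 0:
            return "negative"
        return "neutral"  # e.g. no face detected in any frame

    # Invented example: faces often undetected (zeros), one negative frame
    print(aggregate_valence([0.0, 0.0, -12.5, 0.0, 3.1]))  # -> negative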
5. Results

5.1. Comparison of Textual, Audio and Visual Sentiment Analysis
First, we report the general frequency distributions of the predicted sentiment for all three modalities, text, audio, and video, over all 672 speeches (see Table 1).

Table 1: Distribution of sentiment classes output per modality approach. # marks the absolute number and % the proportion of the sentiment classes among all speeches.

                  negative (#)  negative (%)  neutral (#)  neutral (%)  positive (#)  positive (%)
Textual           313           46.58         187          27.83        172           25.60
Audio (Voice)     420           62.50          62           9.23        190           28.27
Video (Facial)    137           20.39         490          72.92         45            6.70

All approaches produce very different results: The textual sentiment analysis predicts the majority of speeches as negative (47%). Neutral predictions are mostly due to short speeches consisting of only a few words with no representation in the sentiment lexicon; they are, however, slightly more frequent (28%) than positive predictions (26%). In contrast, the audio sentiment analysis rarely assigns the neutral class (10%), while negative predictions are dominant (63%). The video sentiment analysis predicts the vast majority of speeches as neutral (73%) and only a small fraction as positive (7%). We identified that the reason for this behavior is that faces are often not detected due to difficult angles and camera movements. In these cases, no emotion recognition is performed and the frames are regarded as neutral.

5.2. Performance Evaluation
In the following section, we report the sentiment prediction accuracies of the computational methods using the annotations as gold standard (Table 2). The overall accuracy is the proportion of correctly predicted speeches among all speeches. The random baseline is 33%, the majority baseline around 42%.

Table 2: Accuracy results (proportion of correctly predicted speeches) per modality approach. The absolute number of correctly predicted speeches is in brackets.

            Textual     Audio (Voice)  Video (Facial)
Accuracy    46% (311)   40% (264)      44% (295)

All approaches are above the random baseline and some slightly above the majority baseline. Overall, the accuracies are rather close to each other, and no significant differences were identified. The highest accuracy is achieved by the textual approach (46%), followed by the video (44%) and the audio (40%) sentiment analysis. The results are, however, far below reported accuracies in similar and different fields: Lexicon-based sentiment analysis on literary texts achieves around 40-80% [35, 2]. Modern deep learning-based approaches in other research areas can achieve up to 95% [20]. In a similar study comparing text to audio of a theater recording, however, the results are equivalent [39]. All data (corpus, annotations, and results) is publicly available via a GitHub repository (https://github.com/lauchblatt/Video-Emotion-Analysis-for-EmiliaGalotti).
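For transparency, the evaluation measure reduces to a simple proportion. The following sketch, using invented toy labels, shows how the accuracy and the majority baseline reported above are computed.

    # Sketch of the evaluation in chapter 5.2: accuracy is the proportion
    # of speeches where the prediction matches the gold standard; the
    # majority baseline always predicts the most frequent gold class.
    from collections import Counter

    # Invented toy labels for illustration only.
    gold = ["negative", "neutral", "negative", "positive", "negative"]
    pred = ["negative", "positive", "negative", "neutral", "negative"]

    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    majority = Counter(gold).most_common(1)[0][1] / len(gold)
    print(f"accuracy: {accuracy:.2f}, majority baseline: {majority:.2f}")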
6. Discussion
While we identified that all approaches behave rather differently, their accuracy levels are below the results of sentiment analysis on other text and media genres that apply state-of-the-art machine learning with large training corpora of the fitting domain [20, 52, 30]. In the context of literary texts, the results are in line with the overall mediocre accuracies of other studies applying lexicon-based [1, 35] or audio-based methods [39], again demonstrating the general difficulty of the task. We could not show that the audio or video sentiment analysis approach outperforms textual sentiment analysis. While problems like historical language might be solved, novel problems occur that decrease the performance of audio and video sentiment analysis, although the applied approaches showed state-of-the-art results on other media material like social media videos [30, 9]. The audio and video approaches perform rather well for extreme emotional expressions (see Figure 2 for a correct example). However, the video approach depends on the picture quality and has problems with bad lighting, disadvantageous camera positions, and actors expressing complex grimaces (Figure 3). Indeed, facial sentiment analysis is mostly trained on images of people looking directly towards the camera (cf. [5]), which is rarely the case in a live theater recording. Thus, no faces are detected and no emotion recognition can be performed, which leads to many false neutral predictions. The audio sentiment analysis detects emotional nuances even in short speeches; however, the annotators tend to rate those speeches as rather neutral. Many problems well known from textual sentiment analysis also remain: how to deal with irony and sarcasm, long speeches, or switching between sentiments during a speech.

[Figure 2: Frame of a speech correctly predicted as negative by the video/facial sentiment analysis]

[Figure 3: Frame of a speech falsely predicted as positive by the video/facial sentiment analysis although negative]

Despite the mediocre results of the case study, plays are meant to be performed and thus experienced through multiple modalities. The application of multimodal sentiment analysis is therefore closer to the artistic experience of theatergoers as it is intended to be. We want to pursue this idea further by changing the material from theater performances of historical plays to rather simple contemporary movies, which might be less challenging with regard to camera angles, performance, and audio quality. Thus, we want to investigate to what extent the quality and complexity of the material influences the approaches.

Furthermore, we have focused on ready-to-use sentiment analysis approaches without optimization or domain adaptation for the specific context, which is not uncommon in DH. However, the usage of general-purpose approaches did not prove to be beneficial or even acceptable. Lexicon-based methods for textual sentiment analysis might simply not be able to deal with the historical language and the nuanced emotional expressions of plays. The audio- and video-based models are, of course, trained on contemporary online videos and not on theater recordings. Therefore, we want to explore more sophisticated approaches based on machine learning on the specific material used. On a conceptual level, all three modalities might suffer from the fact that we did not integrate a "mixed" class in the prediction. Especially for longer passages, changes of sentiment can occur and might lead to false interpretations. More sophisticated tools will, however, enable us to explore the integration of this class in more detail.

Another reason for problems concerning the computational predictions, but also the corresponding annotations, might be that annotators included other cues besides language, face, and voice in their interpretation. Indeed, research shows that body cues and body movement might be even more important cues for emotional expression [3], which the applied tools mostly neglect but which could be investigated via pose detection [6]. Furthermore, individual differences in the expression of emotions, among humans in general and actors specifically, might be large, a question that is intensively discussed in psychology [4]. Lastly, our main goal is to fuse the different approaches into a multimodal classification approach that encompasses all modalities, as has been successfully applied in other research areas [30].

Considering the annotation, annotators reported that being offered multiple media channels facilitated the annotation and that the image and audio channels helped a lot when the language and the context of the play were unclear. However, this did not show any positive effects on agreement among the annotators. The agreement level remains similarly mediocre as for annotations of literary or historical texts with solely the textual representation [1, 27, 50, 36, 38]. Our assumption that the presentation of the video would make the annotation clearer was proven wrong for this specific use case: the subjectivity in the interpretation of literary material is not affected by the presentation of the video channel. We want to pursue improvements for the annotation process by developing specific video annotation tools that enable annotation while watching the movie [42, 43].

While the presented case study can be regarded as a negative result, we did learn that the application of general-purpose sentiment analysis is not sufficient for our material. Thus, we are currently conducting larger annotation studies to gather training material for optimized machine learning approaches, but also to explore the influence of multimodality on the annotation procedure.

References
[1] C. O. Alm and R. Sproat. "Emotional Sequencing and Development in Fairy Tales". In: Affective Computing and Intelligent Interaction. Ed. by J. Tao, T. Tan, and R. W. Picard. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2005, pp. 668–674. doi: 10.1007/11573548_86.
[2] C. O. Alm and R. Sproat. "Emotional sequencing and development in fairy tales". In: International Conference on Affective Computing and Intelligent Interaction. Springer, 2005, pp. 668–674.
[3] H. Aviezer, Y. Trope, and A. Todorov. "Body cues, not facial expressions, discriminate between intense positive and negative emotions". In: Science 338.6111 (2012), pp. 1225–1229.
[4] L. F. Barrett. How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt, 2017.
[5] E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang. "Training deep networks for facial expression recognition with crowd-sourced label distribution". In: Proceedings of the 18th ACM International Conference on Multimodal Interaction. 2016, pp. 279–283.
[6] Q. Dang, J. Yin, B. Wang, and W. Zheng. "Deep learning based 2D human pose estimation: A survey". In: Tsinghua Science and Technology 24.6 (2019), pp. 663–676. doi: 10.26599/tst.2018.9010100.
[7] K. Dautenhahn. "The origins of narrative: In search of the transactional format of narratives in humans and other social animals". In: International Journal of Cognition and Technology 1.1 (2002), pp. 97–123. doi: 10.1075/ijct.1.1.07dau. url: https://www.jbe-platform.com/content/journals/10.1075/ijct.1.1.07dau.
[8] J. Fehle, T. Schmidt, and C. Wolff. "Lexicon-based Sentiment Analysis in German: Systematic Evaluation of Resources and Preprocessing Techniques". In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021). Düsseldorf, Germany, 2021.
[9] J. M. Garcia-Garcia, V. M. Penichet, and M. D. Lozano. "Emotion detection: a technology review". In: Proceedings of the XVIII International Conference on Human Computer Interaction. 2017, pp. 1–8.
[10] D. Halbhuber, J. Fehle, A. Kalus, K. Seitz, M. Kocur, T. Schmidt, and C. Wolff. "The Mood Game - How to Use the Player's Affective State in a Shoot'em up Avoiding Frustration and Boredom". In: Proceedings of Mensch und Computer 2019. MuC'19. Hamburg, Germany: Association for Computing Machinery, 2019, pp. 867–870. doi: 10.1145/3340764.3345369.
[11] P. Hartl, T. Fischer, A. Hilzenthaler, M. Kocur, and T. Schmidt. "AudienceAR - Utilising Augmented Reality and Emotion Tracking to Address Fear of Speech". In: Proceedings of Mensch und Computer 2019. MuC'19. Hamburg, Germany: Association for Computing Machinery, 2019, pp. 913–916. doi: 10.1145/3340764.3345380.
[12] F. Jannidis, I. Reger, A. Zehe, M. Becker, L. Hettinger, and A. Hotho. "Analyzing features for the detection of happy endings in German novels". In: arXiv preprint arXiv:1611.09028 (2016).
[13] K. Kajava, E. Öhman, P. Hui, and J. Tiedemann. "Emotion Preservation in Translation: Evaluating Datasets for Annotation Projection". In: Proceedings of Digital Humanities in Nordic Countries (DHN 2020). CEUR, 2020, pp. 38–50.
[14] T. Kakkonen and G. Galić Kakkonen. "SentiProfiler: Creating Comparable Visual Profiles of Sentimental Content in Texts". In: Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage. Hissar, Bulgaria: Association for Computational Linguistics, 2011, pp. 62–69.
[15] E. Kim and R. Klinger. "A Survey on Sentiment and Emotion Analysis for Computational Literary Studies". In: Zeitschrift für digitale Geisteswissenschaften (2019). doi: 10.17175/2019_008. url: http://arxiv.org/abs/1808.03137.
[16] E. Kim and R. Klinger. "An Analysis of Emotion Communication Channels in Fan Fiction: Towards Emotional Storytelling". In: Proceedings of the Second Workshop on Storytelling. Florence, Italy: Association for Computational Linguistics, 2019, pp. 56–64.
[17] E. Kim, S. Padó, and R. Klinger. "Prototypical Emotion Developments in Literary Genres". In: Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. 2017, pp. 17–26.
[18] J. R. Landis and G. G. Koch. "The Measurement of Observer Agreement for Categorical Data". In: Biometrics 33.1 (1977), pp. 159–174.
[19] M. Magdin and F. Prikler. "Real time facial expression recognition using webcam and SDK Affectiva". In: IJIMAI 5.1 (2018), pp. 7–15.
[20] M. V. Mäntylä, D. Graziotin, and M. Kuutila. "The evolution of sentiment analysis – A review of research topics, venues, and top cited papers". In: Computer Science Review 27 (2018), pp. 16–32. doi: 10.1016/j.cosrev.2017.10.002.
[21] D. McDuff, A. Mahmoud, M. Mavadati, M. Amr, J. Turcot, and R. el Kaliouby. "AFFDEX SDK: a cross-platform real-time multi-face expression recognition toolkit". In: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. 2016, pp. 3723–3726.
[22] K. Mellmann. "Literaturwissenschaftliche Emotionsforschung". In: Handbuch Literarische Rhetorik. De Gruyter, 2015, pp. 173–192.
[23] B. Meyer-Sickendiek. Affektpoetik: eine Kulturgeschichte literarischer Emotionen. Königshausen & Neumann, 2005.
[24] S. Mohammad. "From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales". In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Portland, OR, USA: Association for Computational Linguistics, 2011, pp. 105–114.
[25] L. Moßburger, F. Wende, K. Brinkmann, and T. Schmidt. "Exploring Online Depression Forums via Text Mining: A Comparison of Reddit and a Curated Online Forum". In: Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task. Barcelona, Spain (Online): Association for Computational Linguistics, 2020, pp. 70–81. url: https://www.aclweb.org/anthology/2020.smm4h-1.11.
[26] E. T. Nalisnick and H. S. Baird. "Character-to-Character Sentiment Analysis in Shakespeare's Plays". In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Sofia, Bulgaria: Association for Computational Linguistics, 2013, pp. 479–483. url: https://www.aclweb.org/anthology/P13-2085.
[27] E. Öhman. "Challenges in Annotation: Annotator Experiences from a Crowdsourced Emotion Annotation Task". In: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference. CEUR Workshop Proceedings, 2020, pp. 293–301.
[28] A.-M. Ortloff, L. Güntner, M. Windl, T. Schmidt, M. Kocur, and C. Wolff. "SentiBooks: Enhancing Audiobooks via Affective Computing and Smart Light Bulbs". In: Proceedings of Mensch und Computer 2019. MuC'19. Hamburg, Germany: Association for Computing Machinery, 2019, pp. 863–866. doi: 10.1145/3340764.3345368.
[29] F. Pianzola, S. Rebora, and G. Lauer. "Wattpad as a resource for literary studies. Quantitative and qualitative examples of the importance of digital social reading and readers' comments in the margins". In: PLOS ONE 15.1 (2020), e0226708. doi: 10.1371/journal.pone.0226708.
[30] S. Poria, E. Cambria, R. Bajpai, and A. Hussain. "A review of affective computing: From unimodal analysis to multimodal fusion". In: Information Fusion 37 (2017), pp. 98–125.
[31] A. J. Reagan, L. Mitchell, D. Kiley, C. M. Danforth, and P. S. Dodds. "The emotional arcs of stories are dominated by six basic shapes". In: EPJ Data Science 5.1 (2016), p. 31. doi: 10.1140/epjds/s13688-016-0093-1.
[32] R. Remus, U. Quasthoff, and G. Heyer. "SentiWS – A Publicly Available German-language Resource for Sentiment Analysis". In: Proceedings of LREC 2010. 2010.
[33] H. Schmid. "Probabilistic part-of-speech tagging using decision trees". In: New Methods in Language Processing. 2013, p. 154.
[34] T. Schmidt. "Distant Reading Sentiments and Emotions in Historic German Plays". In: Abstract Booklet, DH_Budapest_2019. Budapest, Hungary, 2019, pp. 57–60. doi: 10.5283/epub.43592.
[35] T. Schmidt and M. Burghardt. "An Evaluation of Lexicon-based Sentiment Analysis Techniques for the Plays of Gotthold Ephraim Lessing". In: Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Santa Fe, New Mexico: Association for Computational Linguistics, 2018, pp. 139–149. url: https://www.aclweb.org/anthology/W18-4516.
[36] T. Schmidt, M. Burghardt, and K. Dennerlein. "Sentiment Annotation of Historic German Plays: An Empirical Study on Annotation Behavior". In: Proceedings of the Workshop on Annotation in Digital Humanities 2018 (annDH 2018), Sofia, Bulgaria, August 6–10, 2018. Ed. by S. Kübler and H. Zinsmeister. 2018, pp. 47–52. url: https://epub.uni-regensburg.de/43701/.
[37] T. Schmidt, M. Burghardt, K. Dennerlein, and C. Wolff. "Katharsis – A Tool for Computational Drametrics". In: Book of Abstracts, Digital Humanities Conference 2019 (DH 2019). Utrecht, Netherlands, 2019. url: https://epub.uni-regensburg.de/43579/.
[38] T. Schmidt, M. Burghardt, K. Dennerlein, and C. Wolff. "Sentiment Annotation for Lessing's Plays: Towards a Language Resource for Sentiment Analysis on German Literary Texts". In: 2nd Conference on Language, Data and Knowledge (LDK 2019). Ed. by T. Declerck and J. P. McCrae. 2019, pp. 45–50. url: https://epub.uni-regensburg.de/43569/.
[39] T. Schmidt, M. Burghardt, and C. Wolff. "Toward Multimodal Sentiment Analysis of Historic Plays: A Case Study with Text and Audio for Lessing's Emilia Galotti". In: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference. Ed. by C. Navarretta, M. Agirrezabal, and B. Maegaard. Vol. 2364. CEUR Workshop Proceedings. Copenhagen, Denmark: CEUR-WS.org, 2019, pp. 405–414. url: http://ceur-ws.org/Vol-2364/37_paper.pdf.
[40] T. Schmidt, J. Dangel, and C. Wolff. "SentText: A Tool for Lexicon-based Sentiment Analysis in Digital Humanities". In: Information Science and its Neighbors from Data Science to Digital Humanities. Proceedings of the 16th International Symposium of Information Science (ISI 2021). Ed. by T. Schmidt and C. Wolff. Vol. 74. Glückstadt: Werner Hülsbusch, 2021, pp. 156–172. doi: 10.5283/epub.44943. url: https://epub.uni-regensburg.de/44943/.
[41] T. Schmidt, K. Dennerlein, and C. Wolff. "Towards a Corpus of Historical German Plays with Emotion Annotations". In: 3rd Conference on Language, Data and Knowledge (LDK 2021). Ed. by D. Gromann, G. Sérasset, T. Declerck, J. P. McCrae, J. Gracia, J. Bosque-Gil, F. Bobillo, and B. Heinisch. Vol. 93. Open Access Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021, 9:1–9:11. doi: 10.4230/OASIcs.LDK.2021.9.
[42] T. Schmidt, I. Engl, D. Halbhuber, and C. Wolff. "Comparing Live Sentiment Annotation of Movies via Arduino and a Slider with Textual Annotation of Subtitles". In: DHN Post-Proceedings. 2020, pp. 212–223.
[43] T. Schmidt and D. Halbhuber. "Live Sentiment Annotation of Movies via Arduino and a Slider". In: Digital Humanities in the Nordic Countries 5th Conference 2020 (DHN 2020). Late Breaking Poster. 2020.
[44] T. Schmidt, P. Hartl, D. Ramsauer, T. Fischer, A. Hilzenthaler, and C. Wolff. "Acquisition and Analysis of a Meme Corpus to Investigate Web Culture". In: Digital Humanities Conference 2020 (DH 2020). Ottawa, Canada, 2020. doi: 10.17613/mw0s-0805.
[45] T. Schmidt, M. Jakob, and C. Wolff. "Annotator-Centered Design: Towards a Tool for Sentiment and Emotion Annotation". In: INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft (Workshop-Beiträge). Ed. by C. Draude, M. Lange, and B. Sick. Bonn: Gesellschaft für Informatik e.V., 2019, pp. 77–85. doi: 10.18420/inf2019_ws08.
[46] T. Schmidt, F. Kaindl, and C. Wolff. "Distant Reading of Religious Online Communities: A Case Study for Three Religious Forums on Reddit". In: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference (DHN 2020). Riga, Latvia, 2020, pp. 157–172.
[47] T. Schmidt, M. Schlindwein, K. Lichtner, and C. Wolff. "Investigating the Relationship Between Emotion Recognition Software and Usability Metrics". In: i-com 19.2 (2020), pp. 139–151. doi: 10.1515/icom-2020-0009.
[48] T. Schmidt, B. Winterl, M. Maul, A. Schark, A. Vlad, and C. Wolff. "Inter-Rater Agreement and Usability: A Comparative Evaluation of Annotation Tools for Sentiment Annotation". In: INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft (Workshop-Beiträge). Ed. by C. Draude, M. Lange, and B. Sick. Bonn: Gesellschaft für Informatik e.V., 2019, pp. 121–133. doi: 10.18420/inf2019_ws12.
[49] A. Schonlau. Emotionen im Dramentext: eine methodische Grundlegung mit exemplarischer Analyse zu Neid und Intrige 1750–1800. Deutsche Literatur Band 25. Berlin, Boston: De Gruyter, 2017.
[50] R. Sprugnoli, S. Tonelli, A. Marchetti, and G. Moretti. "Towards sentiment analysis for historical texts". In: Digital Scholarship in the Humanities 31 (2015), pp. 762–772. doi: 10.1093/llc/fqv027.
[51] D. Taylor. The Archive and the Repertoire. Duke University Press, 2003.
[52] G. Vinodhini and R. Chandrasekaran. "Sentiment analysis and opinion mining: a survey". In: International Journal 2.6 (2012), pp. 282–292.
[53] S. Winko. Über Regeln emotionaler Bedeutung in und von literarischen Texten. De Gruyter, 2011.