Comparing Live Sentiment Annotation of Movies via Arduino and
        a Slider with Textual Annotation of Subtitles
        Thomas Schmidt, Isabella Engl, David Halbhuber and Christian Wolff

               Media Informatics Group, University of Regensburg, Germany
                          {firstname.lastname}@ur.de



       Abstract. In Digital Humanities, movies are often enriched with information by
       annotating their text, e.g. via subtitles. However, we hypothesize that leaving out
       the multimedia content is disadvantageous for certain annotation types such as
       sentiment annotation. We claim that performing the annotation live while viewing
       the movie is beneficial for the annotation process. We present and evaluate the
       first version of a novel approach and prototype for live sentiment annotation of
       movies while watching them. The prototype consists of an Arduino microcontroller
       and a potentiometer paired with a slider. We conducted an annotation study in
       which each of five movies received sentiment annotations from three annotators,
       once via live annotation and once via traditional subtitle annotation, to compare
       the two approaches. While the agreement among annotators increases only slightly
       with live sentiment annotation, the overall experience and the subjective effort,
       measured with quantitative post-annotation questionnaires, improve significantly.
       The qualitative analysis of post-annotation interviews validates these findings.

       Keywords: Sentiment Annotation, Sentiment Analysis, Movies, Film Studies,
       Arduino, Annotation.


1      Introduction

Annotation is an important task in Digital Humanities (DH) for enriching cultural
artefacts with additional information. While some annotation tasks can be carried out
automatically, human annotation is still necessary for many DH projects. Various forms
of syntactic (cf. [4]) or semantic annotation [3, 6, 41] exist for various media types.
In film studies, several approaches towards annotation are used. Scholars use annotation
tools to annotate information such as shot types and lengths [51], camera angles, and
important movements or people [16] in order to add a more objective and in-depth
understanding of movies that accompanies the hermeneutical approach towards
interpretation. Film archives employ crowdsourcing methods to gather metadata about
their movie inventory [37]. Annotations can also serve as data to train and evaluate
modern machine learning approaches, which have become more and more important in DH in
recent years (cf. [8, 15]). One research branch in DH explores the development and
evaluation of tools for these various annotation processes [7, 17, 31, 45, 48] and the
influence of context, material and task on annotation quality (cf. [28, 32]). The work
presented here is in line with this strand of research. As our annotation use case, we
investigate sentiment annotation in movies.




            Copyright © 2021 for this paper by its authors. Use permitted under
           Creative Commons License Attribution 4.0 International (CC BY 4.0).




    Sentiment analysis is the research area concerned with the computational analysis of
sentiment, predominantly in written text [25, p. 1]. Sentiment and emotion analysis
have been explored in the context of DH. Researchers investigate sentiment analysis in
various literary genres such as plays [27, 30, 38, 39, 40, 52], novels [20, 23, 36],
fairy tales [1, 2] and fan fiction [21, 35], but also in the social media context [44, 46].
While the focus of research is predominantly on text, especially traditional text genres
like novels and plays, research on movies is rare: Öhman and Kajava [33] developed
Sentimentator, an annotation tool with gamification elements specifically designed to
annotate sentiment and emotion for movie subtitles. They apply the tool to acquire
emotion-annotated subtitles [19, 34]. Chu and Roy [9] explore multimodal sentiment
analysis in videos and focus on short web videos to identify emotional arcs. Schmidt et
al. [43] explore multimodal sentiment analysis on theatre recordings with mixed results.
    As with most classification tasks, well-curated corpora are an important resource for
developing modern machine learning algorithms. However, specifically for the research
area of narrative media and texts, one can identify a lack of such corpora (cf. [22]).
One reason might be that annotators perceive the task as challenging and tedious
[1, 41, 42, 49]. If annotators have no expertise, they report problems with the language
and the missing context [1, 41, 42, 49]. Furthermore, narrative texts are generally more
prone to subjectivity since they can be interpreted in different ways. Therefore,
annotation agreement is typically rather low [1, 41, 42, 49], which is also a problem
for the successful creation of corpora. In the context of movies, sentiment or emotion
annotation projects are rare and mostly focused on the annotation of the textual content
of movies, such as the subtitles [19, 33]. Similar to literary studies, one can identify
an interest in more sophisticated concepts beyond sentiment, such as differentiated
emotion categories and scales [32]. We focus on sentiment for our study. While the
sentiment concept cannot fully represent the complex emotional expressions in movies, we
regard it as simpler and therefore more fitting for this first pilot study. We present a
live sentiment annotation solution enabling the annotation of movies while watching them.
We argue that this approach is beneficial compared to more traditional approaches such
as the annotation of subtitles when dealing with movies.
    First, movies are multimedia artefacts, and omitting the video channel leads to
information loss. Many emotions are expressed via the face and the voice of the actor
(or via additional aspects like music, colors, and camera perspectives) and not just the
text. Therefore, viewers might be able to annotate sentiment and emotions more easily
and more consistently when experiencing the entire movie. Additionally, a lot of context
that might be important for understanding the feelings of the characters might be
expressed via channels other than the text. Furthermore, emotions in a movie are often
expressed without anything being said. Textual annotation only allows the annotation of
parts in which characters talk; everything else is neglected. While there are video
annotation tools that offer the video and audio channel for movie annotation, they often
require training before usage and rather support asynchronous work, with the annotator
constantly pausing and adjusting the time and frame of the movie for the annotation
[10, 26]. We assume that live annotation while viewing the movie facilitates the
annotation process because the viewer/annotator can directly and immediately assign
their annotations based on what they are experiencing. Furthermore, the usage of a
continuous slider as in our setting might also resemble the rather vague concept of
sentiment much better than nominal class assignments [5, 39] or ordinal ratings [29, 50].
Following Nobel laureate Daniel Kahneman's line of thought, annotating in the actual
movie watching situation might come closer to the emotional reality than a more
reflective post-hoc annotation [18]. Please note that while we focus on sentiment
annotation for our study, the system can equally be used for any sort of emotion or
other scale for which one desires live video annotations.


2      Live Sentiment Annotation Approach

2.1    Technical Setup

The annotation system consists of an Arduino microcontroller connected to a linear
potentiometer, which is paired with a slider. The Arduino itself is connected to a
computer running a Python script. The script represents the core of the system: it is
responsible for reading the current value of the slider, logging it, and presenting it
to the user in a small GUI while the movie is shown on a TV. The slider yields
continuously changing readings between 0 and 1023; these values can be translated
programmatically to other scales. The Python script, running in the background (e.g. on
a laptop connected to the TV), records these values continuously and shows the user the
current slider position, and thus the currently selected value, in a simple GUI. Figure
1 depicts the user view for an exemplary application in a movie annotation.




Fig. 1. Example scene from a TV show (left). Python script displaying the currently chosen
value on the Arduino slider. The GUI also depicts a rudimentary scale (right).
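
The reading loop of such a setup can be sketched in a few lines of Python. The following
is a minimal illustration rather than the authors' original script: it assumes the
Arduino firmware simply prints the potentiometer's analogRead() value (0-1023) to the
serial port, and the port name, baud rate and GUI layout are placeholders.

    import tkinter as tk
    import serial  # pyserial

    PORT = "/dev/ttyACM0"   # placeholder: serial port of the Arduino
    BAUD = 9600             # placeholder: must match the firmware's Serial.begin()

    def raw_to_percent(raw):
        # Translate the 0-1023 potentiometer reading to a 0-100 % sentiment scale.
        return raw / 1023 * 100

    def poll(arduino, label, root):
        # Read the latest line sent by the Arduino and update the GUI label.
        line = arduino.readline().decode(errors="ignore").strip()
        if line.isdigit():
            label.config(text="Current sentiment: %.0f %%" % raw_to_percent(int(line)))
        root.after(100, poll, arduino, label, root)  # poll again in 100 ms

    if __name__ == "__main__":
        arduino = serial.Serial(PORT, BAUD, timeout=0.1)
        root = tk.Tk()
        root.title("Live sentiment annotation")
        label = tk.Label(root, text="Current sentiment: --", font=("Arial", 24))
        label.pack(padx=20, pady=20)
        poll(arduino, label, root)
        root.mainloop()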


2.2    Annotation Process

For the annotation process, the annotator/viewer is presented with the movie and the
interface. Additionally, the annotators are equipped with the cased Arduino slider.
Figure 2 shows an early prototype of the slider casing.




Fig. 2. Prototype of the slider casing. The shell also features a rudimentary scale allowing the
user to navigate without the GUI. Please note that the slider is continuous and not nominal.

The slider is operated by the user while watching a movie or TV show and is placed at
the side of their chair so that they can adjust the scale intuitively with their hands
while watching. The slider is portable and can be placed as the viewer wishes. During
the study, the time-stamped value of the Arduino slider is read and logged by the Python
script every 100 ms. By saving the timestamp, the slider value can be assigned exactly
to a certain time in a film or TV program in a subsequent data analysis. The movie shown
and the slider are synchronized via a Python script connecting the slider and the VLC
player, which is the media player we use to present the movie.
   To start the annotation, we simply connect a laptop to a TV and start the script and
a movie via the VLC player. The annotators can also pause and resume the movie as they
wish without disrupting the synchronization. The final output is currently a simple
table with the value of the slider for every 100 ms.
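
A hedged sketch of such a logging loop is given below. It assumes the python-vlc
bindings and pyserial are used; the serial port and file names are placeholders rather
than details taken from the paper.

    import csv
    import time
    import serial   # pyserial
    import vlc      # python-vlc bindings

    PORT, BAUD = "/dev/ttyACM0", 9600               # placeholders
    MOVIE, LOG = "movie.mp4", "annotation_log.csv"  # placeholders

    arduino = serial.Serial(PORT, BAUD, timeout=0.1)
    player = vlc.MediaPlayer(MOVIE)
    player.play()

    with open(LOG, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["playback_ms", "slider_raw"])
        while player.get_state() != vlc.State.Ended:
            raw = arduino.readline().decode(errors="ignore").strip()
            if raw.isdigit():
                # get_time() returns the current playback position in milliseconds,
                # so pausing the movie does not break the synchronization.
                writer.writerow([player.get_time(), int(raw)])
            time.sleep(0.1)   # log roughly every 100 ms

Logging the playback position instead of wall-clock time is what allows breaks during
the viewing without losing the mapping between slider values and film time.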


3       Annotation Study

To validate and compare the live sentiment annotation approach, we conducted an
annotation study comparing live annotation with textual annotation. Five different
movies were annotated by three different annotators, separately from each other, for
each method, leading to 30 annotations, 15 per method. We then compare the performance
in terms of time needed and annotation metrics as well as the subjective experience of
the annotators, measured via a questionnaire and an interview. The entire study was
conducted in German and only German material was used, since German is the first
language of all annotators.


3.1     Sample

Ten annotators (7 female, 3 male) participated in the study. We split the annotations so
that every annotator annotated at most three different movies and performed at least one
textual and one live annotation, ensuring that every annotator experienced the
difference. The order of the movies and of the annotation types was counterbalanced to
compensate for learning effects, and no annotator annotated the same movie twice. The
age of the annotators ranged from 25 to 31 (M = 26.5). All annotators were students in
Digital Humanities or similar study programs. Participation was voluntary. We asked
about prior knowledge of the movies and assigned the annotations so that every annotator
had either no or only very minor knowledge of the content of the movie they annotated.


3.2    Material

We selected five different movies from varying genres and eras to avoid the possibility
of specific annotation problems due to these factors: Rear Window (1954, thriller),
Christmas Vacation (1989, comedy), Scream (1996, horror), The Avengers (2012, action),
and The Fault in Our Stars (2014, drama). We decided to use commercial Hollywood movies
since they are the subject of our own research. We acquired the DVDs of these movies via
our institutional library. We used the German subtitles and transformed them into a
simple list of subtitles in a table. Please note that what the characters actually say
and what is displayed in the subtitles are not always exactly the same.
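
As an illustration of this preprocessing step, the following sketch converts a standard
.srt subtitle file into such a table; the parsing approach and file names are our own
assumptions, not a description of the authors' actual pipeline.

    import csv
    import re

    # One .srt block: index, "start --> end" timestamps, then the subtitle text.
    SRT_BLOCK = re.compile(
        r"\d+\s*\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
        r"(.*?)(?:\n\n|\Z)",
        re.DOTALL,
    )

    def srt_to_table(srt_path, csv_path):
        # Write one row per subtitle with an empty column for the sentiment rating.
        with open(srt_path, encoding="utf-8") as f:
            content = f.read()
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["start", "end", "subtitle", "sentiment"])
            for start, end, text in SRT_BLOCK.findall(content):
                # Collapse multi-line subtitles into a single line.
                writer.writerow([start, end, " ".join(text.split()), ""])

    srt_to_table("rear_window_de.srt", "rear_window_annotation.csv")  # placeholder names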


3.3    Textual Sentiment Annotation

All annotators who performed textual sentiment annotation received a table with one
subtitle per line as well as a summary of the movie and an annotation instruction. In a
first meeting, the annotators were introduced to the annotation process.1 Annotators had
to mark the sentiment expressed by the character speaking the subtitle on a scale from
-5 (very negative) over 0 (neutral) to +5 (very positive). We chose this differentiated
scale since it resembles the live sentiment annotation more closely than a nominal
annotation. Annotators received this table as an xls file and had one week to complete
the annotation, but were advised to perform the annotation in one go. Furthermore, they
were instructed not to watch the movies.


3.4    Live Sentiment Annotation
The live sentiment annotation was performed in a media lab at our university. Annotators
sat on a couch in front of a TV and used the annotation slider while watching the movie.
They were instructed in the process and the functionality of the system and went through
a short trial phase with a short film. Annotators were instructed to mark the sentiment
expressed by the characters seen on the screen via the slider on a scale from 0% (very
negative) to 100% (very positive). A test coordinator was present but stayed in the
background for the entire viewing. It was possible to take breaks by contacting the test
coordinator.


3.5    Post Annotation Questionnaire and Interview

All annotators had to fill out an online questionnaire after each annotation. In
addition to demographic information, we asked for the time needed for the annotation,
how difficult the annotation was perceived to be on a scale from 1 (not difficult at
all) to 7 (very difficult), and how certain one was about the annotation on a scale from
1 (very unsure) to 7 (very sure). We further used the NASA Task Load Index [13]
(NASA-TLX) to obtain a value for the perceived cognitive and physical effort. This is an
established questionnaire in psychology [14] consisting of 6 questions about the
perceived effort, resulting in a scale from 6 (very low effort) to 60 (very high
effort). We added open-ended questions to the questionnaire in which annotators could
give feedback on problems, challenges and the overall perception of the annotation
process. Lastly, we performed a short semi-structured interview with the annotators,
asking about the perceived difficulties and problems.

1 The entire study was performed prior to the COVID-19 pandemic; therefore, many steps
  of the study included in-person meetings.


4      Results

4.1    Time

We asked for the exact time needed for the annotation without breaks. The average time
needed for the textual annotation is 123 minutes, i.e. around 2 hours, while the average
for the live annotation is 109 minutes (which essentially corresponds to the average
length of the films). However, this difference is not significant, as shown with a
Mann-Whitney U test for independent samples at a significance level of p = 0.05
(U = -1.12, p = .235). We also asked in the questionnaire for the time needed including
breaks; this showed that textual annotators mostly took multiple breaks, while live
annotators took at most one short break.


4.2    Annotation Metrics

Agreement among annotators is an important factor in annotation. High agreement is
beneficial for later machine learning approaches and also validates the theoretical idea
behind the annotation. We investigated whether the agreement, and thus the overall
understanding of the sentiment annotation, changes with the annotation modality by
looking at (1) Fleiss' Kappa [11], an established agreement metric for annotations by
more than two annotators, and (2) the percentage of agreements among annotators.
    We transform all annotations into the three classes negative, neutral and positive,
which is common in sentiment annotation. For the textual annotation, we regard -5 to -1
as negative, 0 as neutral and 1 to 5 as positive, and we regard every subtitle as one
data point. To keep the data points comparable between the live annotation and the
textual annotation, we use the following heuristic: We regard the exact time frame in
which a subtitle is spoken as the data point for the analysis. For this time frame we
calculate the average of all annotations we received (since we measure the live
annotation in 100 ms intervals). We then regard an average of 0-40% as negative, 41-59%
as neutral and 60-100% as positive. Annotators reported that it is difficult to hit
exactly 50% for neutral, so we widened the neutral range. Please note, however, that
with this heuristic we neglect any annotations that are made outside of the time frames
of subtitles. This step is necessary since the agreement statistic is sensitive to
varying numbers of data points. Table 1 shows the agreement metrics:

                      Table 1. Agreement metrics per movie and overall.

  Movie                    Fleiss' Kappa   Fleiss' Kappa   Percentage   Percentage
                           (Text)          (Live)          (Text)       (Live)
  The Fault in Our Stars   0.26            0.42            52.15        61.44
  Christmas Vacation       0.41            0.35            63.66        51.95
  Scream                   0.34            0.40            60.83        64.34
  The Avengers             0.14            0.10            45.35        48.56
  Rear Window              0.06            0.33            44.11        57.91
  Average                  0.29            0.32            53.22        56.84


The results show that the agreement is slight (0.0-0.2) to fair (0.21-0.40) according to
[24], which is rather low but very common in the sentiment annotation of narrative and
artistic material due to the subjective nature of the task [1, 41, 42, 49]. While there
are strong differences for some movies, the averages are only slightly higher for the
live sentiment annotation.
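
For illustration, the classification heuristic and the agreement computation described
above could be implemented as sketched below. The use of the statsmodels implementation
of Fleiss' Kappa and the reading of the "percentage of agreements" as full agreement
among all three annotators are our own assumptions.

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    CLASSES = {"negative": 0, "neutral": 1, "positive": 2}

    def text_rating_to_class(rating):
        # Map a -5..+5 textual rating to the three classes.
        return "negative" if rating < 0 else "neutral" if rating == 0 else "positive"

    def live_average_to_class(avg_percent):
        # Map the averaged slider value (0-100 %) within a subtitle's time frame
        # to negative (0-40), neutral (41-59) or positive (60-100).
        if avg_percent <= 40:
            return "negative"
        if avg_percent < 60:
            return "neutral"
        return "positive"

    def agreement(labels_per_subtitle):
        # labels_per_subtitle: one inner list per subtitle containing the class
        # assigned by each of the three annotators.
        data = np.array([[CLASSES[c] for c in row] for row in labels_per_subtitle])
        table, _ = aggregate_raters(data)             # subtitles x category counts
        kappa = fleiss_kappa(table, method="fleiss")
        # One possible reading of the "percentage of agreements": the share of
        # subtitles on which all annotators chose the same class.
        full = 100 * np.mean([len(set(row)) == 1 for row in labels_per_subtitle])
        return kappa, full

    # Toy example with three annotators and four subtitles:
    print(agreement([
        ["negative", "negative", "neutral"],
        ["positive", "positive", "positive"],
        ["neutral", "negative", "negative"],
        ["positive", "neutral", "positive"],
    ]))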


4.3       Post Annotation Questionnaire

Table 2 illustrates the results for the perceived difficulty and certainty (scale from
1 to 7) and the perceived effort operationalized via the NASA-TLX (6 to 60).

                     Table 2. Post Annotation Questionnaire results.

  Annotation Type   Perceived Difficulty   Perceived Certainty   Perceived Effort (NASA-TLX)
                    Avg (Std)              Avg (Std)             Avg (Std)
  Textual           5.1 (1.55)             2.6 (1.29)            35.7 (7.41)
  Live              2.53 (1.3)             4.2 (1.57)            31.6 (6.86)

A Mann-Whitney U test for independent samples for all three variables shows that
annotators perceived the textual annotation as significantly more difficult (U = -3.6,
p < .001) and were less certain when annotating textually (U = -2.6, p = .008). While
the NASA-TLX indicates an average perceived effort for both annotation types, the
Mann-Whitney U test also shows that the difference is significant (U = -2.8, p = .004).
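
As an illustration of this kind of comparison, a Mann-Whitney U test for two independent
samples can be computed with SciPy as sketched below; the rating values are made-up
placeholders, not the study's raw questionnaire data.

    from scipy.stats import mannwhitneyu

    # Hypothetical difficulty ratings (1-7 scale), 15 annotations per method.
    difficulty_textual = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4, 5, 6, 4, 5, 6]
    difficulty_live    = [2, 3, 1, 4, 2, 3, 2, 1, 3, 4, 2, 3, 2, 3, 3]

    u, p = mannwhitneyu(difficulty_textual, difficulty_live, alternative="two-sided")
    print("U = %s, p = %.4f" % (u, p))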


4.4       Post Annotation Interview

The qualitative analysis of the open-ended questions and the interviews led to multiple
insights. Participants corroborated the low agreement metrics by describing the task as
challenging and open to interpretation, regardless of the annotation type. For the
textual annotation, participants explicitly criticized the lack of the video channel and
the resulting missing context. Problems with the live annotation included how to
interpret differing sentiments of different characters on the screen and how to react to
fast changes. The textual annotation was described as “very boring” and “exhausting”.
While the feedback for the live annotation was not as negative, participants did note
that the annotation requires “a lot of concentration throughout” the viewing. When asked
to compare both approaches, all annotators preferred the live sentiment annotation.


5      Discussion

The results of our annotation study are mixed concerning the advantages of live
sentiment annotation. Regarding the agreement statistics, we did not identify a
remarkable difference. The agreement among annotators remains rather low, showing again
the difficulty and inherent subjectivity of sentiment annotation [1, 41, 42, 49]. While
problems of textual sentiment annotation are solved (e.g. the missing visual context),
new problems arise, such as how to deal with fast changes of scenery and how to react to
multiple characters with different emotions. One limitation might also be that we
adjusted the agreement analysis of the live annotation to the subtitles in the sense
that we took the presentation times of the subtitles as units for the analysis,
neglecting passages without subtitles. Furthermore, these time units are sometimes quite
short, which might cause problems for the live annotation.
    Nevertheless, Kajava et al. [19] were able to achieve rather high annotator
agreement in an emotion annotation task on subtitles via gamification and the deliberate
removal of context, suggesting that solutions to the agreement problem might lie in
gamification and simplification (at least for text). It is also worth noting that the
annotated movies are quite long (around 2 hours); thus, they require a lot of
concentration and are more prone to errors than shorter films might be. We plan to
examine other annotation types and shorter material to see whether we can find
differences in the results.
    Nevertheless, we did find significant differences in the perceived difficulty,
certainty and effort for the annotation task. The live annotation was perceived as more
enjoyable than the textual annotation. Feedback from the qualitative analysis validates
this finding. While neither annotation type was experienced as fun, the live sentiment
annotation was preferred by all annotators due to the limited time necessary and the
less exhausting experience. However, the task was still experienced as “work” requiring
constant concentration. We still feel encouraged to continue our work investigating the
possibilities of live annotation since the annotation process was overall perceived
positively. Agreement statistics depend strongly on the validity of the theoretical
concept to be annotated, the training of the annotators and the clarity of the
annotation instructions. Thus, we want to investigate whether long-term studies can show
an improvement concerning agreement. Please also note that our study is rather
small-scale in terms of the number of movies and annotators. For legal reasons, the
possibilities of scaling the study up by performing a similar annotation process online
are limited; thus, we will focus on public domain movies for our next investigations.
Another question is whether there are possibilities to reduce the workload of annotators
even further and perform the “annotation” fully intuitively by using physiological
metrics of the movie viewer (e.g. via skin sensors or facial recognition). For example,
in other settings researchers use facial and voice emotion recognition to predict
metrics [12, 47]. Using physiological metrics would be a way to bypass problems
concerning interpretation biases.
    In summary, we come to the conclusion that the selection of the most beneficial
annotation type depends on the research goal. If one is solely interested in the
analysis of the spoken word and context-free sentences ([19]), the inclusion of the
video channel might not be helpful and can even be distracting. However, in our project,
we are explicitly interested in the sentiment expressed by characters, which is not
always easy to identify based on the text alone. We also want to highlight that textual
analysis neglects any scenes that do not include spoken words, which can cover very long
time spans. Furthermore, the intended application of computational methods influences
the decision for an annotation approach as well. Textual annotation might certainly be
sufficient for purely textual machine learning approaches, but the exploration of
multimodal approaches depends on multimodal annotations to keep the concept concise.
While multimodal annotation tools exist, we argue that our live annotation approach
delivers benefits for the experience of annotators, and we thus plan to continue our
research.


References
    1.   Alm, C.O., Roth, D., Sproat, R.: Emotions from text: machine learning for text-based
         emotion prediction. In: Proceedings of the conference on human language technology
         and empirical methods in natural language processing, pp. 579–586. Association for
         Computational Linguistics (2005).
    2.   Alm, C.O., Sproat, R.: Emotional sequencing and development in fairy tales. In: Inter-
         national Conference on Affective Computing and Intelligent Interaction, pp. 668–674.
         Springer (2005).
    3.   Benikova, D., Biemann, C., & Reznicek, M: NoSta-D Named Entity Annotation for
         German: Guidelines and Dataset. In: LREC, pp. 2524-2531. (2014).
    4.   Bird, S., Liberman, M.: A formal framework for linguistic annotation. Speech Com-
         munication, 33(1-2), 23–60 (2001).
    5.   Bosco, C., Allisio, L., Mussa, V., Patti, V., Ruffo, G.F., Sanguinetti, M., Sulis, E.: De-
         tecting happiness in Italian tweets: Towards an evaluation dataset for sentiment analy-
         sis in Felicittà. In: 5th International Workshop on EMOTION, SOCIAL SIGNALS,
         SENTIMENT & LINKED OPEN DATA, ES3LOD 2014, pp. 56–63. European Lan-
         guage Resources Association (2014).
    6.   Bornstein, A., Cattan, A., & Dagan, I.: CoRefi: A Crowd Sourcing Suite for Corefer-
         ence Annotation. arXiv preprint arXiv:2010.02588. (2020).
    7.   Burghardt, M.: Usability recommendations for annotation tools. In: Proceedings of the
         Sixth Linguistic Annotation Workshop, pp. 104-112. Association for Computational
         Linguistics (2012).
    8.   Burghardt, M., Heftberger, A., Pause, J., Walkowski, N. O., & Zeppelzauer, M.: Film
         and Video Analysis in the Digital Humanities–An Interdisciplinary Dialog. Digital
         Humanities Quarterly, 14(4), (2020).
    9.   Chu, E., Roy, D.: Audio-visual sentiment analysis for learning emotional arcs in mov-
         ies. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 829– 834.
         IEEE (2017).




10. Dutta, A., Zisserman, A.: The VIA annotation software for images, audio and video. In:
    Proceedings of the 27th ACM International Conference on Multimedia, pp. 2276–2279
    (2019).
11. Fleiss, J. L., Levin, B., & Paik, M. C.: Statistical methods for rates and proportions.
    John Wiley & Sons (2013).
12. Halbhuber, D., Fehle, J., Kalus, A., Seitz, K., Kocur, M., Schmidt, T. & Wolff, C.: The
    Mood Game - How to use the player’s affective state in a shoot’em up avoiding frus-
    tration and boredom. In: Alt, F., Bulling, A. & Döring, T. (Hrsg.), Mensch und Com-
    puter 2019 - Tagungsband. New York, ACM (2019).
13. Hart, S. G., & Staveland, L. E.: Development of NASA-TLX (Task Load Index): Re-
    sults of empirical and theoretical research. In: Advances in psychology, Vol. 52, pp.
    139-183. North-Holland (1988).
14. Hart, S. G.: NASA-task load index (NASA-TLX); 20 years later. In: Proceedings of
    the human factors and ergonomics society annual meeting, Vol. 50, No. 9, pp. 904-
    908. Sage CA, Los Angeles, CA, Sage publications (2006).
15. Heftberger, A.: Digital Humanities and Film Studies: Visualising Dziga Vertov's
    Work. Springer International Publishing (2018).
16. Hielscher, E.: The Phenomenon of Interwar City Symphonies: A Combined Method-
    ology of Digital Tools and Traditional Film Analysis Methods to Study Visual Motifs
    and Structural Patterns of Experimental-Documentary City Films. DHQ: Digital Hu-
    manities Quarterly, 14(4), (2020).
17. Hoff, K., & Preminger, M.: Usability testing of an annotation tool in a cultural heritage
    context. In Research Conference on Metadata and Semantics Research, pp. 237-248.
    Springer, Cham (2015).
18. Kahneman, D.: Thinking, fast and slow. Macmillan (2011).
19. Kajava, K., Öhman, E., Piao, H., & Tiedemann, J.: Emotion Preservation in Transla-
    tion: Evaluating Datasets for Annotation Projection. In: DHN, pp. 38-50, (2020).
20. Kakkonen, T., Kakkonen, G.G.: Sentiprofiler: Creating comparable visual profiles of
    sentimental content in texts. In: Proceedings of the Workshop on Language Technolo-
    gies for Digital Humanities and Cultural Heritage, pp. 62–69, (2011)
21. Kim, E., & Klinger, R.: An Analysis of Emotion Communication Channels in Fan Fic-
    tion: Towards Emotional Storytelling. arXiv preprint arXiv:1906.02402. (2019).
22. Kim, E., Klinger, R.: A survey on sentiment and emotion analysis for computational
    literary studies. arXiv preprint arXiv:1808.03137. (2018).
23. Kim, E., Padó, S., Klinger, R.: Prototypical emotion developments in literary genres.
    In: Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for
    Cultural Heritage, Social Sciences, Humanities and Literature, pp. 17–26, (2017).
24. Landis, J. R., & Koch, G. G.: The measurement of observer agreement for categorical
    data. Biometrics, 33(1), 159-174 (1977).
25. Liu, B.: Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge
    University Press (2016).
26. Martin, J.C., Kipp, M.: Annotating and measuring multimodal behaviour - Tycoon
    metrics in the Anvil tool. In: LREC. Citeseer (2002).
27. Mohammad, S.: From once upon a time to happily ever after: Tracking emotions in
    novels and fairy tales. In: Proceedings of the 5th ACL-HLT Workshop on Language
    Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 105–114. As-
    sociation for Computational Linguistics (2011)
28. Mohammad, S. M.: Challenges in sentiment analysis. In: A practical guide to sentiment
    analysis, pp. 61-83. Springer, Cham (2017).




29. Momtazi, S.: Fine-grained german sentiment analysis on social media. In: LREC, pp.
    1215–1220. Citeseer (2012)
30. Nalisnick, E.T., Baird, H.S.: Character-to-character sentiment analysis in Shakespeare's
    plays. In: Proceedings of the 51st Annual Meeting of the Association for Computa-
    tional Linguistics, Volume 2: Short Papers, pp. 479–483. (2013)
31. Neves, M., & Ševa, J.: An extensive review of tools for manual annotation of docu-
    ments. Briefings in Bioinformatics. (2019)
32. Öhman, E.: Challenges in Annotation: Annotator Experiences from a Crowdsourced
    Emotion Annotation Task. In: DHN, pp. 293-301. (2020).
33. Öhman, E., Kajava, K.: Sentimentator: Gamifying fine-grained sentiment annotation.
    In: DHN, pp. 98–110 (2018).
34. Öhman, E., Kajava, K., Tiedemann, J., Honkela, T.: Creating a dataset for multilingual
    fine-grained emotion-detection using gamification-based annotation. In: Proceedings
    of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and So-
    cial Media Analysis, pp. 24–30. (2018).
35. Pianzola, F., Rebora, S., & Lauer, G.: Wattpad as a resource for literary studies. Quan-
    titative and qualitative examples of the importance of digital social reading and readers’
    comments in the margins. PloS one, 15(1), e0226708. (2020).
36. Reagan, A.J., Mitchell, L., Kiley, D., Danforth, C.M., Dodds, P.S.: The emotional arcs
    of stories are dominated by six basic shapes. EPJ Data Science, 5(1). (2016).
37. Salmi, H., Laine, K., Römpötti, T., Kallioniemi, N., & Karvo, E.: Crowdsourcing
    Metadata for Audiovisual Cultural Heritage: Finnish Full-Length Films, 1946-1985.
    In: DHN, pp. 325-332. (2020).
38. Schmidt, T.: Distant Reading Sentiments and Emotions in Historic German Plays. In:
    Abstract Booklet, DH_Budapest_2019, pp. 57-60. Budapest, Hungary (2019).
39. Schmidt, T., Burghardt, M.: An evaluation of lexicon-based sentiment analysis tech-
    niques for the plays of Gotthold Ephraim Lessing. In: Proceedings of the Second Joint
    SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sci-
    ences, Humanities and Literature, pp. 139–149. Association for Computational Lin-
    guistics (2018).
40. Schmidt, T. & Burghardt, M.: Toward a Tool for Sentiment Analysis for German His-
    toric Plays. In: Piotrowski, M. (ed.), COMHUM 2018: Book of Abstracts for the Work-
    shop on Computational Methods in the Humanities 2018, pp. 46-48. Lausanne, Swit-
     zerland: Laboratoire lausannois d'informatique et statistique textuelle (2018).
41. Schmidt, T., Burghardt, M., Dennerlein, K.: Sentiment annotation of historic german
    plays: An empirical study on annotation behavior. In: Kübler, S., Zinsmeister, H. (eds.)
     Proceedings of the Workshop for Annotation in Digital Humanities (annDH), pp. 47–
    52. Sofia, Bulgaria (2018).
42. Schmidt, T., Burghardt, M., Dennerlein, K. & Wolff, C.: Sentiment Annotation in Les-
    sing’s Plays: Towards a Language Resource for Sentiment Analysis on German Liter-
    ary Texts. In: 2nd Conference on Language, Data and Knowledge (LDK 2019). LDK
    Posters. Leipzig, Germany (2019).
43. Schmidt, T., Burghardt, M. & Wolff, C.: Towards Multimodal Sentiment Analysis of
    Historic Plays: A Case Study with Text and Audio for Lessing’s Emilia Galotti. In:
    Proceedings of the DHN (DH in the Nordic Countries) Conference, pp. 405-414. Co-
    penhagen, Denmark (2019).
44. Schmidt, T., Hartl, P., Ramsauer, D., Fischer, T., Hilzenthaler, A. & Wolff, C.: Acqui-
    sition and Analysis of a Meme Corpus to Investigate Web Culture. In: Digital Huma-
    nities Conference 2020 (DH 2020). Virtual Conference (2020).




45. Schmidt, T., Jakob, M. & Wolff, C.: Annotator-Centered Design: Towards a Tool for
    Sentiment and Emotion Annotation. In: Draude, C., Lange, M. & Sick, B. (Eds.),
    INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesell-
    schaft (Workshop-Beiträge), pp. 77-85. Bonn: Gesellschaft für Informatik e.V. (2019).
46. Schmidt, T., Kaindl, F. & Wolff, C.: Distant Reading of Religious Online Communi-
    ties: A Case Study for Three Religious Forums on Reddit. In: Proceedings of the Dig-
    ital Humanities in the Nordic Countries 5th Conference (DHN 2020). Riga, Latvia
    (2020).
47. Schmidt, T., Schlindwein, M., Lichtner, K., & Wolff, C.: Investigating the Relationship
    Between Emotion Recognition Software and Usability Metrics. i-com, 19(2), 139-151
    (2020).
48. Schmidt, T., Winterl, B., Maul, M., Schark, A., Vlad, A. & Wolff, C.: Inter-Rater
    Agreement and Usability: A Comparative Evaluation of Annotation Tools for Senti-
    ment Annotation. In: Draude, C., Lange, M. & Sick, B. (Eds.), INFORMATIK 2019:
    50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft (Workshop-Bei-
    träge), pp. 121-133. Bonn: Gesellschaft für Informatik e.V. (2019).
49. Sprugnoli, R., Tonelli, S., Marchetti, A., Moretti, G.: Towards sentiment analysis for
    historical texts. Digital Scholarship in the Humanities 31(4), 762–772 (2016)
50. Takala, P., Malo, P., Sinha, A., Ahlgren, O.: Gold-standard for topic-specific sentiment
    analysis of economic texts. In: LREC. vol. 2014, pp. 2152–2157. (2014)
51. Tsivian, Y.: Cinemetrics, part of the humanities’ cyberinfrastructure. (2009). Retrieved
    from https://www.degruyter.com/document/doi/10.14361/9783839410233-007/html
52. Yavuz, M. C.: Analyses of Character Emotions in Dramatic Works by Using EmoLex
     Unigrams. In: Proceedings of the Seventh Italian Conference on Computational Lin-
     guistics, CLiC-it'20 (2020).