HarryMotions – Classifying Relationships in Harry Potter based on Emotion Analysis

Albin Zehe, Julia Arns, Lena Hettinger, Andreas Hotho
Data Science Chair, University of Würzburg
[zehe,arns,hettinger,hotho]@informatik.uni-wuerzburg.de

Abstract

Sentiment Analysis has long been a topic of interest in natural language processing and computational literary studies, where it can be used to infer the relationships between fictional characters. Building on the dataset and results of Kim and Klinger (2019), we propose a classifier based on BERT that improves the results reported therein and show that we can use this classifier to determine the relation between characters in Harry Potter novels. Our proposed sentiment classifier yields an F1-score of up to 75 % for binary classification of emotions. Aggregating these emotions over novels, we reach an F1-score of up to 68 % for the classification of a pair of characters as friendly or unfriendly.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Characters and their relations are one of the basic building blocks of stories (Hettinger et al., 2015). Detecting them automatically is therefore a highly interesting task for the analysis of fictional texts. While there exists a multitude of methods for the extraction of character networks (Labatut and Bost, 2019), these often produce networks with unlabelled edges, that is, without information about the kind of relationship the characters share. Following Kim and Klinger (2019), we work towards the goal of detecting the polarity of relations using sentiment analysis. To this end, we collect all chunks of text in a novel mentioning a pair of characters and perform sentiment analysis on these pieces of text. While methods for sentiment analysis perform very well in certain domains, mostly on short texts like tweets, product reviews or news articles, the task still poses a significant challenge in other domains. Fictional literary texts in particular are hard to analyse, since they usually do not express emotions explicitly; instead, the emotions have to be inferred from context and possibly world knowledge.

Recently, the trend in NLP has been to use large transformer models that have been pre-trained for language modelling (or similar tasks not requiring explicit annotations) on enormous datasets. We follow this trend by fine-tuning BERT (Devlin et al., 2019) for the task of classifying emotions in interactions between characters. We use BookNLP (Bamman et al., 2014) to extract entity mentions and co-references and then fine-tune BERT on the emotion dataset provided by Kim and Klinger (2019). Emotions are aggregated to detect overall relations between characters and their development over a novel, as exemplified in Figure 1 (cf. Section 4).

Our contribution is two-fold: 1. We generally improve results on the emotion classification tasks from Kim and Klinger (2019). 2. We track the emotional relations detected by our classifier over the course of a novel and describe a simple method to aggregate them into an overall label. We evaluate this method on the text of the well-known Harry Potter series (Rowling, 1997).

The remainder of this paper is structured as follows: After this short introduction, we present related work. In Section 3, we describe our approaches to emotion and relation classification as well as our results. We conclude the paper with a discussion of our results and some possible directions for future work.

2 Related Work

Our work is situated at the intersection of sentiment analysis and social network extraction.

Character networks for works of fiction have

Figure 1: Trajectory of emotions for different character pairs in Harry Potter as detected by our system. The points where Harry and Ron/Hermione become friends are clearly visible. Details are discussed in Section 4.
The x axis corresponds to chapters in the books, with book 3 having more chapters than book 1 and thus a longer trajectory.

been studied extensively in recent years (Labatut and Bost, 2019). Some work has been done on extracting networks from textual summaries (Chaturvedi et al., 2016; Srivastava et al., 2016) and on training large neural networks to specifically model relationships over time (Iyyer et al., 2016). While the Harry Potter novels have been explored before (Vilares and Gómez-Rodríguez, 2019; Everton et al., 2019), research has not yet concentrated on emotional relations between characters.

For sentiment analysis, most work has focused on short, self-contained texts like tweets (Islam et al., 2019; Rosenthal et al., 2017) or reviews (Maas et al., 2011; Xue et al., 2020; Socher et al., 2013). Sentiment analysis in fictional texts has become a topic of interest, but has so far proven difficult because of the lack of suitable datasets. Kim and Klinger (2018) provide an extensive overview of papers addressing sentiment analysis in fictional texts, including papers that use emotions in the context of social network extraction. However, most of these works employ rather simple sentiment analysis methods (e.g., Zehe et al. (2016) rely on a simple lookup in a sentiment lexicon). Most similar to our work is Kim and Klinger (2019), which we directly build upon. The authors propose a new corpus of short pieces of text annotated with the emotional relations between the characters described in these texts. They train a GRU (Cho et al., 2014) neural network to predict the emotions based on this corpus, showing promising results with F1-scores of up to 67 % for undirected binary classification (positive and negative emotions) and 46 % for 5 basic emotions in the story-level evaluation as described below. We extend this work by improving the sentiment analysis model and by aggregating the instance-level labels for full novels.

3 Classifying Emotional Relations

We address two tasks in this paper: mention-level emotion classification and story-level relation classification, which we see as two steps in a pipeline.

Emotion Classification: Following Kim and Klinger (2019), we define emotion classification as learning a classifier that, given a short piece of text (roughly one sentence) containing two characters, predicts the emotion described therein. We perform this task at different granularity levels, using either 2, 5 or 8 directed or undirected emotions.

Relation Classification: We define relation classification as an aggregation of the emotions discovered in step 1 over a novel. In this paper, we distinguish between "friendly" and "unfriendly" relations.

3.1 Method

Emotion Classification: We use a pretrained BERT model (Devlin et al., 2019), which we fine-tune to our task using the fast-bert library¹, mostly keeping the default parameters. We train for 6 (2- and 5-class) or 12 (8-class) epochs with batch size 1.

¹ https://github.com/kaushaltrivedi/fast-bert, based on https://github.com/huggingface/transformers

Relation Classification: We extract all interactions from a novel mentioning a pair of characters a, b, classify the emotions described therein and aggregate them to an overall label. We use BookNLP (Bamman et al., 2014) to perform co-reference resolution and extract all interactions where both a and b each appear at least 20 times in the novel. We define an interaction as a chunk of text where a and b appear with no more than 10 tokens between them, regardless of sentence boundaries, with 10 additional tokens on both sides as context. We select only pairs where at least 5 interactions occur in

Novel  #friendly  #unfriendly  #disagree
HP1       64          30          2/0
HP2       61          29          3/0
HP3       62          26          7/4
HP4      233          36         22/0
HP5      144          57         19/0
HP6      107          38         18/0
HP7      115          44         27/0

Table 1: Character relations in Harry Potter.
The middle columns show friendly and unfriendly relations, respectively. The last column shows relations where a tie-breaker was used / no agreement could be reached.

the novel and classify the emotions in each of these interactions using our BERT-based classifier. For the aggregation of emotions to an overall relation, we count the number of positive, negative, neutral and overall emotions (X_{a,b}, with X ∈ {pos, neg, neu, all}) between a and b, compare the fraction of positive emotions to a threshold α, and classify relations as

    rel(a, b) = friendly,    if α < pos_{a,b} / all_{a,b}
                unfriendly,  if α ≥ pos_{a,b} / all_{a,b}.

The amount α of positive emotions required for a friendly relationship is a hyper-parameter.

3.2 Datasets

Emotion Classification: For the first task, we use the dataset provided by Kim and Klinger (2019) and refer to their paper for a detailed description due to space constraints. The dataset consists of 1335 samples², each annotated according to multiple schemes. These schemes differ in the number of emotions that are annotated (two, five or eight) and in whether the emotions are directed (from a causing to an experiencing character) or undirected.

² 1742 overall, but following Kim and Klinger (2019) we use only the subset annotated with a causing character.

Relation Classification: For the second task, we have collected our own dataset. To this end, we used BookNLP on all books of the Harry Potter series to extract all interactions as described in Section 3.1. In contrast to the first dataset, we use automatically extracted characters and co-references here. We then manually annotated all pairs of characters for which we found interactions with their relationship, distinguishing between friendly and unfriendly relationships. We collected two sets of independent annotations and, where the two annotators disagreed, collected a third annotation as a tie-breaker. The tie-breaker was given the option to note that there is no (clear) relation between the two characters. This was the case in the third novel for the relation between Harry and Sirius Black (cf. Section 4). Table 1 provides details for the resulting dataset, which we publish for future research.³

³ http://professor-x.de/datasets/harrymotions

3.3 Evaluation

Emotion Classification: We follow the evaluation setup from Kim and Klinger (2019) for emotion classification, who use multiple settings: The dataset (cf. Section 3.2) provides annotations for sets of two, five and eight directed or undirected emotions. Additionally, they define different ways of representing the entities involved in the emotions, where some add a marker to entities or completely mask them (making it impossible for the model to learn that, e.g., Harry always interacts positively with Ron). We describe these schemes briefly in the following and give an example of how sentences would be represented according to each scheme:

• No-indicator: Entities are represented as in the text; the model is directly fed the unmodified sentence (e.g., Alice is angry with Bob).

• Role: Entities are marked as causing or experiencing by surrounding them with marker tokens, one kind of marker for the experiencing character and another for the causing character (e.g., Alice is angry with Bob, with both names wrapped in their respective role markers).

• MRole: Entities are only identified by their role, i.e., the names are replaced by the experiencer and causer markers (the role markers standing in for Alice and Bob in the example above).

• Entity: Entities are marked as entities, with no indication as to whether they cause or experience the emotion (e.g., Alice is angry with Bob, with both names wrapped in entity markers).

• MEntity: Entities are masked by entity markers (the entity markers standing in for Alice and Bob).

Table 2 shows our results in comparison to those from Kim and Klinger (2019), reporting what they define as the story-level F1-score. Our classifier outperforms theirs in most settings, as discussed in Section 4.

Relation Classification: In our second experiment, we use the emotions detected in the previous step to detect overall relationships between characters in the Harry Potter series by aggregating over emotions as described in Section 3.1. In Table 3, we report macro-averaged F1-scores as well as accuracies for aggregating emotions as classified in the Entity and MEntity settings for 2 and 5 emotion classes, since we do not have role labels for the Harry Potter corpus and the emotion classification for 8 emotions did not perform well. Note that the number of emotions only pertains to the emotion classification setting; relations are always classified as friendly or unfriendly. For the 5-class setting, we define anger, disgust and sadness as negative emotions, joy as positive and anticipation as neutral. The parameter α was optimised on hp1 and is set to 0.4, except for 5-MEntity (α = 0.75). Lacking a directly comparable approach, we report sampling from the true label distribution per novel as a baseline (which performs better than majority vote in our setting). We find that, on average, 2 classes lead to better results and that we always outperform the baseline.

4 Discussion

In this section, we discuss our findings along with some of the decisions involved in the dataset collection and provide some insight regarding the development of emotions over the course of a novel.

BERT vs. GRU: Our BERT-based classifier outperforms the GRU in all undirected, but not in all directed settings. Specifically, in the 8-class directed evaluation, the GRU usually performs better than BERT. We hypothesise two possible reasons: a) The rather low amount of training data available for each of the 8 emotion classes, especially in the directed case. We assume that the GRU's lower number of parameters makes it easier to tune on fewer samples. b) BERT is a bi-directional model, while the GRU used here is uni-directional. Since the GRU reads sentences in the right order, while BERT reads in both directions, it might be easier for the GRU to model directed relations.

Dataset Collection: As mentioned in Section 3.2, we excluded some relations during the annotation process. This is due to two reasons: a) errors in named entity recognition and b) changing relationships. For the first category, BookNLP returned the entity "Felix Felicis", which is a luck potion. We excluded all relationships involving the potion, but kept collective entities like "Hogwarts". In the second category, we find the relationship between Sirius Black and most other characters in the third novel. For the majority of the book, Sirius is regarded as a villain intent on killing Harry, which is revealed to be wrong at the end of the novel, turning the relation very positive. Since the label here is unclear, we excluded it from the dataset.

Developing Relations: As described before, relationships can change drastically within a novel. Two prominent examples of this in the Harry Potter novels are the relations between Harry and Hermione in the first novel (where they become friends) and between Harry and Sirius Black in the third novel (see the previous paragraph). We can use the emotions detected by our classifier to plot a trajectory over the novel. The polarity for characters a and b in chapter i is then calculated as p_i = p_{i-1} + pos_{a,b,i} − neg_{a,b,i}, where pos_{a,b,i} and neg_{a,b,i} count the positive and negative emotions between a and b in chapter i, respectively, and p_0 := 0. We show plots for three examples in Figure 1, using predictions from the 2-class MEntity classification. In all cases, the trajectory matches our expectation: For Harry and Hermione, the relation starts neutral with a very clear upward trend after they become friends. For Ron, the relation quickly becomes very positive. For Sirius, the relation is mostly negative, while improving clearly in the final chapters.

5 Conclusion

We have presented an improved approach for the classification of emotional relations between fictional characters. By aggregating sentence-level emotions, we have built a classifier for novel-wide character relations based on emotion analysis.
While our experiments show that aggregation yields promising results, future work includes the development of a stronger classifier for story-level relations. We also plan on investigating the influence of co-reference resolution, which is currently done automatically. Using manual labels or improved co-reference resolution should further improve our results: First experiments indicate better performance for frequent characters, where resolution errors are more easily smoothed out.

                 GRU                           BERT
         Undirected    Directed        Undirected    Directed
Setting  8c  5c  2c   8c  5c  2c      8c  5c  2c   8c  5c  2c
NoInd    33  41  66   25  23  37      34  52  74   21  29  41
Role     19  34  55   33  35  56      19  51  65   21  23  34
MRole    32  44  67   39  44  65      39  59  75   30  55  75
Entity   21  31  57   22  18  30      28  44  70   18  30  46
MEntity  33  46  65   28  30  39      34  55  74   31  36  48

Table 2: Comparison of story-average F1-scores between our classifier (BERT) and the GRU from Kim and Klinger (2019). Results for the GRU are taken from the original paper. The best result for each setting is marked in bold.

               F1-score              Accuracy
         2-class   5-class        2-class   5-class
Novel    En  MEn   En  MEn  Base  En  MEn   En  MEn
hp1*     53  61    64  60   46    55  64    69  61
hp2      57  56    37  52   41    62  59    38  56
hp3      62  60    68  56   45    64  61    70  56
hp4      60  68    56  55   60    72  77    67  64
hp5      62  56    57  62   47    64  58    61  65
hp6      60  58    60  60   46    62  62    63  63
hp7      58  57    63  49   46    62  60    68  50
avg      59  59    58  56   47    63  63    62  59

Table 3: Macro-averaged F1-scores and accuracies for the classification of relations according to different emotion annotation schemes. En refers to the Entity annotation scheme, MEn to the MEntity scheme. Base refers to the stratified random baseline. hp1* was used as a development set to determine the value of α.

Acknowledgements

Many thanks to Darleen Pappelau for helpfully providing the tie-breaker annotations for the dataset.

References

David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian mixed effects model of literary character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 370–379.

Snigdha Chaturvedi, Shashank Srivastava, Hal Daumé III, and Chris Dyer. 2016. Modeling dynamic relationships between characters in literary novels. In Thirtieth AAAI Conference on Artificial Intelligence.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Sean Everton, Tara Everton, Aaron Green, Cassie Hamblin, and Rob Schroeder. 2019. Strong ties and where to find them: Or, why Neville (and Ginny and Seamus) and Bellatrix (and Lucius) might be more important than Harry and Tom. SSRN.

Lena Hettinger, Martin Becker, Isabella Reger, Fotis Jannidis, and Andreas Hotho. 2015. Genre classification on German novels. In Proceedings of the 12th International Workshop on Text-based Information Retrieval.

Jumayel Islam, Robert E. Mercer, and Lu Xiao. 2019. Multi-channel convolutional neural network for Twitter emotion and sentiment recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1355–1365, Minneapolis, Minnesota.

Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daumé III. 2016. Feuding families and former friends: Unsupervised learning for dynamic fictional relationships. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1534–1544.

Evgeny Kim and Roman Klinger. 2018. A survey on sentiment and emotion analysis for computational literary studies. Submitted for review to DHQ (http://www.digitalhumanities.org/dhq/).

Evgeny Kim and Roman Klinger. 2019. Frowning Frodo, wincing Leia, and a seriously great friendship: Learning to classify emotional relationships of fictional characters. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 647–653, Minneapolis, Minnesota. Association for Computational Linguistics.

Vincent Labatut and Xavier Bost. 2019. Extraction and analysis of fictional character networks: A survey. ACM Computing Surveys (CSUR), 52(5):1–40.

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150.

Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 task 4: Sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518.

J. K. Rowling. 1997. Harry Potter and the Philosopher's Stone, 1st edition, volume 1. Bloomsbury Publishing, London.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.

Shashank Srivastava, Snigdha Chaturvedi, and Tom Mitchell. 2016. Inferring interpersonal relations in narrative summaries. In Thirtieth AAAI Conference on Artificial Intelligence.

David Vilares and Carlos Gómez-Rodríguez. 2019. Harry Potter and the action prediction challenge from natural language. In Proceedings of NAACL-HLT, pages 2124–2130.

Qianming Xue, Wei Zhang, and Hongyuan Zha. 2020. Improving domain-adapted sentiment classification by deep adversarial mutual learning. Accepted to appear in AAAI'20.

Albin Zehe, Martin Becker, Lena Hettinger, Andreas Hotho, Isabella Reger, and Fotis Jannidis. 2016. Prediction of happy endings in German novels. In Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing 2016, pages 9–16.
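To make the non-neural parts of the pipeline from Sections 3.1 and 4 concrete, the following Python sketch implements the interaction-window extraction, the α-threshold relation rule and the chapter-wise trajectory. It is a minimal illustration under simplifying assumptions, not the actual system: exact-string character matching stands in for BookNLP co-reference resolution, the per-interaction emotion labels are taken as given instead of being predicted by the fine-tuned BERT model, the 20-mention and 5-interaction filters are omitted, and all function names are ours.

```python
def extract_interactions(tokens, a, b, max_gap=10, context=10):
    """Return text chunks where characters a and b occur with at most
    `max_gap` tokens between them (regardless of sentence boundaries),
    padded with `context` tokens on both sides (cf. Section 3.1)."""
    occ = {name: [i for i, tok in enumerate(tokens) if tok == name]
           for name in (a, b)}
    chunks = []
    for i in occ[a]:
        for j in occ[b]:
            gap = abs(i - j) - 1  # tokens strictly between the two mentions
            if 0 <= gap <= max_gap:
                lo, hi = min(i, j), max(i, j)
                chunks.append(tokens[max(0, lo - context):hi + context + 1])
    return chunks

def classify_relation(labels, alpha=0.4):
    """Aggregate per-interaction emotion labels to one relation label:
    friendly iff the fraction of positive emotions exceeds alpha."""
    pos = sum(1 for lab in labels if lab == "positive")
    return "friendly" if alpha < pos / len(labels) else "unfriendly"

def trajectory(labels_per_chapter):
    """Running polarity p_i = p_{i-1} + pos_i - neg_i with p_0 = 0,
    as used for the trajectories in Figure 1 (cf. Section 4)."""
    p, traj = 0, []
    for labels in labels_per_chapter:
        p += (sum(lab == "positive" for lab in labels)
              - sum(lab == "negative" for lab in labels))
        traj.append(p)
    return traj
```

With α = 0.4, a pair whose interactions are labelled twice "positive", once "negative" and once "neutral" has a positive fraction of 0.5 and is therefore classified as friendly; the trajectory simply accumulates the positive/negative counts chapter by chapter.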