One Graph to Rule them All: Using NLP and Graph Neural Networks to analyse Tolkien's Legendarium

Vincenzo Perri¹,*,†, Lisi Qarkaxhija²,*,†, Albin Zehe³,*,†, Andreas Hotho³,* and Ingo Scholtes²,¹

¹ Data Analytics Group, Department of Informatics (IfI), Universität Zürich, CH-8050 Zürich, Switzerland
² Chair of Informatics XV - Machine Learning for Complex Networks, Center for Artificial Intelligence and Data Science (CAIDAS), Julius-Maximilians-Universität Würzburg, D-97074 Würzburg, Germany
³ Chair of Informatics X - Data Science, Center for Artificial Intelligence and Data Science (CAIDAS), Julius-Maximilians-Universität Würzburg, D-97074 Würzburg, Germany

Abstract
Natural Language Processing and Machine Learning have considerably advanced Computational Literary Studies. Similarly, the construction of co-occurrence networks of literary characters, and their analysis using methods from social network analysis and network science, have provided insights into the micro- and macro-level structure of literary texts. Combining these perspectives, in this work we study character networks extracted from a text corpus of J.R.R. Tolkien's Legendarium. We show that this perspective helps us to analyse and visualise the narrative style that characterises Tolkien's works. Addressing character classification, embedding and co-occurrence prediction, we further investigate the advantages of state-of-the-art Graph Neural Networks over a popular word embedding method. Our results highlight the large potential of graph learning in Computational Literary Studies.

Keywords
computational literary studies, character networks, network analysis, graph neural networks, NLP

CHR 2022: Computational Humanities Research Conference, December 12 – 14, 2022, Antwerp, Belgium
* Corresponding author.
† These authors contributed equally.
✉ perri@ifi.uzh.ch (V. Perri); lisi.qarkaxhija@uni-wuerzburg.de (L. Qarkaxhija); zehe@informatik.uni-wuerzburg.de (A. Zehe); hotho@informatik.uni-wuerzburg.de (A. Hotho); ingo.scholtes@uni-wuerzburg.de (I. Scholtes)
ORCID: 0000-0002-9472-0783 (A. Zehe); 0000-0002-0483-5772 (A. Hotho); 0000-0003-2253-0216 (I. Scholtes)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Motivation and Background

Computational Literary Studies (CLS) have recently taken advantage of the latest developments in Natural Language Processing (NLP), with deep learning techniques bringing major improvements for tasks relevant to literature analysis: examples include named entity recognition [28], anaphora and coreference resolution [44], sentiment analysis [12], scene detection [54] or genre classification [52]. Apart from NLP, machine learning has recently shown great potential in data with a complex relational structure that can be represented as a graph or network G = (V, E), consisting of nodes u, v, … ∈ V and links (u, v) ∈ E. In CLS, this abstraction is frequently used to study character networks, i.e. graphs where nodes represent literary characters and links represent relationships such as, e.g., their co-occurrence in a sentence or scene, or dialogue interactions [23].
Building on this abstraction, several works in CLS used (social) network analysis to study the narrative structure of literary works [32, 47]: Considering 19th century novels, Elson, McKeown, and Dames [11] analysed macroscopic properties of conversation networks, i.e. fictional characters engaging in dialogue. Beveridge and Shan [7] applied centrality measures and community detection to a network of characters that occur within close proximity in the text of A Storm of Swords, the third novel in George R.R. Martin's series A Song of Ice and Fire. Using the same network extraction, Bonato, D'Angelo, Elenberg, Gleich, and Hou [8] studied statistical properties of character networks in three popular novels. Several authors applied centrality measures to identify important characters, e.g. in works by Shakespeare [53], Alice in Wonderland [1], or the first novel of the Harry Potter series [42]. In a recent work, Agarwal, Vijay, et al. [2] used character networks to facilitate the automated classification of literary genres. Using J.R.R. Tolkien's The Lord of the Rings, which we also consider in our manuscript, Ribeiro, Vosgerau, Andruchiw, and Pinto [38] applied social network analysis to its network of characters. Li, Zhang, Tan, and Li [27] studied small-world and scale-free properties of character co-occurrence networks in movie scripts, among them the movie adaptation of The Lord of the Rings.

Existing studies of character networks mainly used methods from (social) network analysis to gain insights into the narrative structure of literary texts. At the same time, recent advances in geometric deep learning [9] and Graph Neural Networks (GNNs) provide new ways to apply deep learning to graph-structured data, which creates interesting opportunities for the study of character networks. A major advantage of GNNs over other network analysis or machine learning techniques is their ability to leverage relational patterns in the topology of a graph, while at the same time incorporating additional node or link features. This facilitates unsupervised and (semi-)supervised learning tasks at the node, link, or graph level. Examples include graph representation learning [34, 17], community detection [10], node and link classification [21, 49], link prediction [55], or graph classification [56]. To the best of our knowledge, no works have combined recent advances in (i) natural language processing, e.g. to recognize entities, resolve coreferences, extract meaningful character networks, or generate word embeddings, and (ii) graph neural networks, e.g. to compute latent space representations of fictional characters or address node- and link-level learning tasks. In CLS, the combination of these paradigms can help with several tasks, e.g. (semi-)supervised character classification, semantic analyses of character relationships, comparisons of character constellations in different works, or automated genre classification.

Addressing this gap, we combine NLP, network analysis and graph learning to analyse J.R.R. Tolkien's Legendarium, namely The Silmarillion, The Hobbit, and the three volumes of The Lord of the Rings. Our contributions are:

• We apply entity recognition and coreference resolution to detect and disambiguate characters in Tolkien's Legendarium. We report key statistics like character mentions via pronouns, nominal mentions or explicit references and compare them across different works in the Legendarium.
We use sequential character mentions to generate narrative charts of the five considered works, which highlight Tolkien's interlaced narrative style.

• We extract character networks for each work in our corpus, i.e. graphs G = (V, E) where nodes v ∈ V capture characters in the Legendarium, while undirected edges (u, v) represent the co-occurrence of two characters in the same sentence. Apart from generating character networks for the five works in our corpus, we generate a single network that captures all characters in the works that constitute Tolkien's Legendarium. We use this to perform a macroscopic, graph-based characterisation of Tolkien's works.

• We address the question to what extent graph learning techniques can leverage the topology of automatically extracted character networks, and which advantages they provide over word embeddings that just consider word-context pairs. To this end, we evaluate the performance of different methods for the (i) latent space representation of characters, (ii) (semi-)supervised classification of characters, and (iii) prediction of character co-occurrences. The results confirm that the inclusion of topological information from character networks considerably improves the performance of these tasks.

Analysing a single graph representation of multiple literary works unfolding in the same fictional universe, our work demonstrates the potential of graph neural networks for computational literary studies. To facilitate the use of our data and methods in the (digital) study of Tolkien's works¹, we make the results of our entity recognition, coreference resolution and character network extraction available. To foster the application of our methods to other corpora, we also provide a set of Jupyter notebooks that reproduce our findings.²

2. Text Corpus and Data Processing

We consider the English full text of The Lord of the Rings (consisting of the volumes The Fellowship of the Ring, The Two Towers and The Return of the King, each split into two books), The Hobbit and The Silmarillion. We used the Python-based NLP pipeline BookNLP³ to extract linguistic and literary features. BookNLP uses the NLP library spaCy [20] to perform tokenisation, that is, splitting text into a list of words and special characters (like punctuation), and to split text into sentences. This first step in the NLP pipeline is the basis for all further processing steps.

Entity Recognition and Coreference Resolution
Entity recognition refers to the task of detecting all references to entities (e.g., characters, locations) in a text corpus. These references can either be explicitly named references (e.g., "Bilbo Baggins", "Smaug"), noun phrases (e.g., "the hobbit", "the dragon") or pronouns (e.g., "she", "they"). BookNLP uses an entity annotation model that has been trained on a large annotated data set [5] to identify named entities and noun phrases as well as pronoun references. After these references have been detected, coreference resolution can be applied in a next step. This is a very hard task in general [45] and is especially hard in the context of literary texts, due to the high variation of references used and the very long texts [40, 22]. Confirming this view, our initial analyses revealed that the performance of BookNLP's coreference resolution, which was trained on a data set of annotated coreferences [4], was not satisfactory when applied to our corpus.
We thus decided to focus on named references, and resolve these using a set of simple manually-created disambiguation rules (e.g., "Sam" → "Sam Gamgee", "Peregrin" → "Pippin").⁴ Although this approach may yield a low recall (i.e. there are many unidentified coreferences, since pronouns and noun phrases are not considered), we find that this coreference resolution yields high precision (i.e. almost all resolved coreferences that we inspected manually were correct). We found this approach preferable over a "full" coreference resolution for two reasons: First, considering our focus on character networks, a coreference resolution with high recall but lower precision would give rise to many spurious character co-occurrences that would harm our analyses of graph learning techniques. Second, our corpus of Tolkien's Legendarium is special in the sense that it has a large number of named references, which give rise to rich character networks despite limiting our view to named references.

¹ https://digitaltolkien.com/
² https://github.com/LSX-UniWue/one-graph-to-rule-them-all
³ https://github.com/booknlp/booknlp
⁴ The full list of disambiguation rules can be found in our repository.

Extraction of Character Co-Occurrences
After finding references to characters, we next extract co-occurrences of pairs of characters that can be used to build character networks. While the co-occurrence of characters does not necessarily imply an interaction between them, due to its simplicity it is a frequently used approach to construct character networks in CLS [7, 23, 8]. After evaluating different strategies (cf. Appendix A), we decided to extract each co-occurrence of two characters in the same sentence.

Descriptive Statistics of Text Corpus
In Table 1, we provide key summary statistics that characterise the different works in our corpus. Apart from differences in terms of tokens, we find striking differences between character mentions in the five texts: The Silmarillion uses considerably fewer pronoun mentions than the other four works, while using more explicit references by name. We attribute this to the compressed writing style of The Silmarillion, which rather resembles a fictional historical record that lists character names and locations and gives a chronology of events, compared to the more conventional prose style of The Hobbit and The Lord of the Rings.

Table 1
Summary statistics of the text corpus used in our analysis.

                            Hobbit   TLotR Vol. I   TLotR Vol. II   TLotR Vol. III   Silmarillion
#tokens                    113,235        224,156         189,539          163,191        147,833
#character mentions         15,477         31,770          27,576           24,766         26,468
% Pronouns                   56.19          54.11           56.27            51.65          37.48
% Nominal mentions           27.70          22.76           22.30            25.93          29.15
% Explicit references        16.11          23.13           21.42            22.42          33.38
Character co-occurrences       431            886             615              844          1,674

We calculate the number of character co-occurrences for each work in our corpus, which we indicate in Table 1 and visualise in Figure 1a – Figure 1f. The overall number of co-occurrences is influenced by the number of explicit references (cf. Table 1), since we only extract co-occurrences if both characters are mentioned by name in the same sentence.
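As a minimal sketch of this extraction step, the following Python snippet counts sentence-level co-occurrences. The spaCy-based sentence splitting stands in for BookNLP's internal pipeline, and the ALIASES dictionary and token-level name matching are simplified, hypothetical stand-ins for our full set of disambiguation rules.

```python
# Minimal sketch of sentence-based co-occurrence extraction; ALIASES is a
# small hypothetical excerpt of our manual disambiguation rules, and names
# are matched at the token level for simplicity.
from collections import Counter
from itertools import combinations

import spacy

ALIASES = {"Sam": "Sam Gamgee", "Peregrin": "Pippin"}  # hypothetical excerpt

nlp = spacy.load("en_core_web_sm")

def extract_cooccurrences(text: str, characters: set) -> Counter:
    """Count pairs of named characters mentioned in the same sentence."""
    cooccurrences = Counter()
    for sent in nlp(text).sents:
        # resolve named references via the manual disambiguation rules
        mentioned = {ALIASES.get(tok.text, tok.text) for tok in sent}
        for u, v in combinations(sorted(mentioned & characters), 2):
            cooccurrences[(u, v)] += 1  # undirected: pairs are stored sorted
    return cooccurrences
```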
Character-based Visualization of Narrative Structure
We finally use the character mentions, in conjunction with the chapter in which they occur, to automatically produce a narrative chart for the three volumes of The Lord of the Rings. To avoid occurrences of characters that are only mentioned while not being present, we excluded mentions of characters within dialogues, as detected by BookNLP. The narrative charts for the three volumes of The Lord of the Rings are shown in the three panels of Figure 2. The columns within each panel represent chapters. To ease the presentation, we focus on a selected set of main characters shown in the rows. The colour of each row-column cell represents the number of times a character is mentioned in a given chapter, relative to the total number of mentions of the key characters presented in the narrative chart.

Figure 1: Number of co-occurrences between the ten most frequently mentioned characters in the five works of our corpus, (a) TLotR Vol. I, (b) TLotR Vol. II, (c) TLotR Vol. III, (d) The Hobbit, (e) The Silmarillion, and the whole Legendarium (f).

Figure 2: Narrative charts for the three volumes of The Lord of the Rings (left to right: The Fellowship of the Ring, The Two Towers, The Return of the King). Rows indicate key characters, while columns represent chapters. Colours capture the frequency of character mentions relative to other key characters captured in the charts. In Vol. I (left panel) Frodo and the Hobbits maintain a central role across the whole volume. For Vol. II and III (middle and right panel) clear transitions in the narrative structure are visible, which are due to an interlacing narrative style.

While it is easy to generate those charts, we can use them to identify major plot lines and reveal a narrative structure that is characteristic for The Lord of the Rings: In volume I, we see that Frodo and the Hobbits maintain a chief role throughout books I and II, i.e. both parts of volume I of The Lord of the Rings. A shift is visible for the second half (book II), which coincides with the Hobbits' arrival in Rivendell. In volume II, we see a clear transition in narrative structure that is due to Tolkien's adoption of an interlacing narrative style [51]. The first half of volume II (book III) focuses on the main characters Aragorn, Legolas, and Gimli, and their attempt to rescue Merry and Pippin from the Uruk-Hai. Leaping back in time, the second half of volume II (book IV) focuses on the journey of Frodo, Sam and Gollum and their encounter with Faramir, which coincides with a brief absence of Gollum as he hides from the rangers. Following the original titles suggested for the six books that constitute the three volumes of The Lord of the Rings [41], these interlaces can be called The Treason of Isengard and The Journey of the Ring-bearers. In volume III, we see a similar but less marked separation between two interlaced plot lines: The first half (book V) focuses on the War of the Ring in Gondor, while book VI continues the story of Frodo's and Sam's journey to Mount Doom, followed by the Scouring of the Shire, which explains Merry's and Saruman's reappearance in the last part. The original titles referring to those interlaces are The War of the Ring and The End of the Third Age. The non-linear narrative style expressed in the narrative charts in Figure 2 is common in early medieval literature [26]. Tolkien likely adopted it due to his familiarity with medieval literature, to increase suspense and to achieve a sense of historicity [3].
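Given per-chapter mention counts, a chart like those in Figure 2 reduces to a row-normalised heatmap. The following is a minimal sketch, where `mentions` is a hypothetical mapping from character names to per-chapter mention counts (with dialogue mentions already excluded):

```python
# Minimal sketch of a narrative chart; each cell shows the mentions of a
# character relative to all key characters mentioned in the same chapter.
import numpy as np
import matplotlib.pyplot as plt

def narrative_chart(mentions: dict) -> None:
    names = sorted(mentions)
    counts = np.array([mentions[n] for n in names], dtype=float)
    # normalise per chapter; clip avoids division by zero in empty chapters
    fractions = counts / counts.sum(axis=0, keepdims=True).clip(min=1)
    fig, ax = plt.subplots()
    ax.imshow(fractions, cmap="Blues", aspect="auto", vmin=0.0, vmax=1.0)
    ax.set_yticks(range(len(names)), labels=names)
    ax.set_xlabel("chapter")
    plt.show()
```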
Since these charts capture the narrative structure that we would expect, we can assume that our character extraction, even though using only limited coreference resolution, captures the overall occurrences of key characters rather well. In Appendix B we include additional narrative charts for The Hobbit and The Silmarillion.

3. Analysis of Character Co-occurrence Networks

Building on Section 2, we now use character co-occurrences to construct character networks for J.R.R. Tolkien's Legendarium. A graph or network is defined as a tuple G = (V, E), where u, v, ... ∈ V represent nodes and (u, v) ∈ E ⊆ V × V is a set of directed or undirected links. For our analysis, we model characters as nodes V and their co-occurrences within a sentence as undirected links E, i.e. (u, v) ∈ E ⟹ (v, u) ∈ E for all u, v ∈ V. We further add link weights w : E → ℕ, i.e. we assign a value w((u, v)) to each link (u, v) that counts the co-occurrences of u and v. We adopt this definition to construct character networks for each of the five works in our corpus. A particularly interesting aspect of our corpus is that all works refer to a single Legendarium, with frequent cross-references and, thus, overlaps in terms of characters. We use this to construct a single network of characters across all works, which we call the Legendarium Graph. Except for a single disconnected pair of nodes that we removed, the resulting graph has a single connected component with 238 nodes, which enables a macroscopic analysis of Tolkien's Legendarium. Figure 3 shows a visualisation of the character network that has been generated using the force-directed layout algorithm by Fruchterman and Reingold [14]. This algorithm simulates attractive forces between connected nodes (and repulsive forces between all nodes) such that their positions in the stable state of the resulting dynamical system highlight patterns in the topology.

Table 2
Key network metrics for the character networks in our corpus, where Legendarium refers to the graph spanning all works.

                                 nodes   edges   density   mean degree   diam.   avg. shortest path
The Hobbit                          30     119      0.14          8.00       4                 1.90
The Lord of the Rings Vol. I        65     217     0.052          6.68       5                 2.31
The Lord of the Rings Vol. II       55     169     0.057          6.13       5                 2.40
The Lord of the Rings Vol. III      45     159      0.08          7.07       5                 2.26
The Silmarillion                   136     803     0.044         11.81       5                 2.41
Legendarium                        238   1,233     0.023         10.48       6                 2.84

Network Size, Density, and Mean Degree
Modelling character networks enables us to compute metrics from social network analysis, which we report in Table 2. The first columns of Table 2 show the number of nodes n = |V| and links m = |E| as well as the density ρ = m / (n(n−1)) of each character network, where the latter captures which fraction of possible links actually occurred. We find that The Hobbit and The Silmarillion have the smallest and largest number of nodes, respectively. The Hobbit has a higher link density, which is likely because the plot is focused on a small number of strongly interacting characters. For The Lord of the Rings we see that the number of nodes for Vol. III is smaller than for the previous two volumes, while the density is larger. This reflects the fact that the last volume is strongly focused on interactions between small groups of characters, e.g. Frodo, Sam and Gollum. The degree of a node v ∈ V is defined as the number of other nodes to which it is connected by links, i.e. d(v) := |{u : (u, v) ∈ E}|.
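As a minimal sketch, the metrics reported in Table 2 can be computed with the networkx library from co-occurrence counts such as those produced by the extraction sketch above. Note that we follow the density convention ρ = m / (n(n−1)) used here, which for undirected graphs is half the value returned by nx.density:

```python
import networkx as nx

def character_network(cooccurrences: dict) -> nx.Graph:
    """Build the undirected, weighted character network G = (V, E)."""
    G = nx.Graph()
    for (u, v), count in cooccurrences.items():
        G.add_edge(u, v, weight=count)
    return G

def summary(G: nx.Graph) -> dict:
    n, m = G.number_of_nodes(), G.number_of_edges()
    return {
        "nodes": n,
        "edges": m,
        # density as reported in Table 2, i.e. rho = m / (n (n - 1));
        # nx.density(G) would return 2m / (n (n - 1)) for undirected graphs
        "density": m / (n * (n - 1)),
        "mean degree": 2 * m / n,
        "diameter": nx.diameter(G),  # requires a connected graph
        "avg shortest path": nx.average_shortest_path_length(G),
    }
```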
The fourth column in Table 2 reports the mean node degree of characters, where larger mean degrees are associated with higher link density.

Shortest Paths, Diameter and Betweenness Centrality
An important feature of (social) networks is the structure of shortest paths between nodes, which provide a topological notion of pair-wise distances [33]. We calculate shortest paths between all pairs of nodes and report the average shortest path length across all character pairs. We further calculate the diameter, i.e. the maximum shortest path length between any pair of nodes. The results in Table 2 show that, on the one hand, the average shortest path length is associated with the number of nodes (smallest average shortest path length for the smallest networks The Hobbit and The Lord of the Rings Vol. III). On the other hand, it is influenced by mean degree and link density, which explains why The Lord of the Rings Vol. III and The Silmarillion have similar values despite the latter having more than twice as many nodes.

Figure 3: Force-directed visualisation of the automatically extracted character network for Tolkien's Legendarium with 238 nodes and 1233 links. Node colours indicate the works in which a character occurs most frequently (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Node sizes are logarithmically scaled based on node degrees, and edge thickness is logarithmically scaled based on edge weights, i.e. the number of character co-occurrences.
Visualisation was created using the open source network analysis and visualisation library pathpy [18]. This character network facilitates a macroscopic, graph-based analysis that considers intertextual aspects in J.R.R. Tolkien's Legendarium.

Table 3
Ranking of characters in Tolkien's Legendarium based on degree and betweenness centrality.

      degree    betweenness
 1    Frodo     Elves
 2    Sam       Frodo
 3    Gandalf   Gandalf
 4    Aragorn   Bilbo
 5    Pippin    Galadriel
 6    Merry     Aragorn
 7    Morgoth   Morgoth
 8    Bilbo     Elrond
 9    Legolas   Lúthien
10    Gimli     Orcs
11    Elrond    Sam
12    Gollum    Saruman
13    Elves     Pippin
14    Túrin     Merry
15    Húrin     Húrin

The shortest paths between characters also allow us to define a path-based notion of centrality. The betweenness centrality [50] of a node v captures the number of shortest paths between any pair of nodes that pass through v, i.e. nodes have higher betweenness centrality if they are important for the "communication" between other nodes. The second column of Table 3 reports the characters with the highest betweenness centrality in the single Legendarium graph. We note the high betweenness centrality of Galadriel, who, despite an ephemeral appearance in The Lord of the Rings and absence from The Hobbit, is one of the few characters that link the narrative across different ages in Tolkien's mythology.

4. Application of Graph Neural Networks

We now turn our attention to the application of state-of-the-art graph learning techniques to the character network of Tolkien's Legendarium, which has been introduced and characterised in the previous section. Our analysis is focused on our guiding research question outlined in Section 1, i.e. what additional information we can draw from the topology of the character network, compared to an application of standard machine learning to a word embedding technique. A major hurdle for the application of machine learning to character networks is that standard techniques like, e.g., logistic regression, support vector machines, or neural networks require features in a continuous vector space. Their application to discrete objects like graphs typically requires a two-step procedure that consists of (i) a representation learning or embedding step to extract vectorial features of nodes and/or links, and (ii) a downstream application of machine learning to those features. This approach is limited by the fact that graphs with complex topologies are fundamentally non-Euclidean objects [9], which limits our ability to find a generic vector space representation that is suitable for different learning tasks.

Addressing these limitations, recent works in the fields of Geometric Deep Learning [9] and Graph Learning have generalized deep learning to graph-structured data. Among those works, Graph Neural Networks (GNNs) [43, 16, 39, 15] have developed into a particularly successful paradigm. A major advantage of GNNs over other network analysis or machine learning techniques is their ability to capture both relational patterns in the topology of a graph as well as additional (vectorial) features of nodes or links. A common concept of GNNs is their use of hidden layers with neural message passing, i.e. nodes repeatedly exchange and update feature vectors with neighbouring nodes, thus incorporating information from their neighbourhood. The (hidden) features generated in this way can be used to address learning tasks by means of a perceptron model with a non-linear activation function.
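For illustration, a single GCN-style message passing layer [21] can be written in a few lines of PyTorch. This is a didactic sketch of the propagation rule, not the implementation used in our experiments:

```python
# Didactic sketch of one GCN message passing layer: every node aggregates
# degree-normalised feature vectors from its neighbours (and itself) and
# passes the result through a learnable linear map with a non-linearity.
import torch

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out) weights."""
    A_hat = A + torch.eye(A.shape[0])        # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]  # D^-1/2 A_hat D^-1/2
    return torch.relu(A_norm @ H @ W)        # propagate, transform, activate
```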
The gradient-based optimization of GNNs can be thought of as an implicit way to generate a topology- and feature-aware latent space representation of a graph that facilitates node-, link- or graph-level learning tasks [19].

Moving beyond the social network analysis techniques applied in Section 3, in the following we apply two state-of-the-art GNN architectures to the character network of Tolkien's Legendarium. We use them to address three unsupervised and supervised learning tasks: (i) representation learning, i.e. finding a meaningful latent space embedding of characters, (ii) (semi-)supervised node classification, i.e. assigning characters to different works in the Legendarium, and (iii) link prediction, i.e. using a subset of links in the graph to predict missing links in a holdout set.

Latent Space Representation of Characters in Tolkien's Legendarium
Representation learning is a common challenge both in natural language processing and graph learning. In NLP, word or text embedding techniques are commonly used to generate vector space representations that can then be used to apply downstream machine learning techniques to literary texts. A popular word embedding technique is word2vec [31, 30], which we use as a baseline in our analysis that only uses the text corpus while being agnostic to the topology of the character network. We specifically use the SkipGram architecture to train a neural network with a single hidden layer of k neurons that captures the context of words in the text corpus, i.e. a concatenation of all works in our corpus. The weights of neurons in the hidden layer are then interpreted as positions of words in a k-dimensional latent vector space. For our analysis, we used the word2vec implementation in the package gensim with default parameters, i.e. we use a latent space with k = 300 dimensions.

Apart from this standard NLP approach to generate latent space embeddings, we apply two graph representation learning techniques to generate character embeddings based on the topology of the character network: The first approach, Laplacian Eigenmaps, uses eigenvectors corresponding to the leading eigenvalues of the Laplacian matrix of a graph [6]. This can be seen as a factorization of an eigenmatrix [37] that yields a representation of nodes in a k-dimensional latent space, where k is chosen to be smaller than the number of nodes. To determine a reasonable choice for the number of dimensions k, we performed an experiment in which we evaluated the average performance of a logistic regression model for node classification (see detailed description below) for different dimensions k of the latent space. The results of this experiment are shown in Figure 4. As expected, we observe a tendency to underfit for a very small (k < 10) number of dimensions, while the performance saturates as we increase k beyond a "reasonable" value that allows to capture the topological patterns in the network. This analysis informed our choice of k = 20 for the subsequent experiments. As a second approach we adopt the popular graph representation learning method node2vec, which applies the SkipGram architecture to sequences of nodes generated by a biased second-order random walk (i.e. a walk with one-step memory) in a graph [17]. We again use k = 20 hidden dimensions to make it comparable with the previous method.
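As a concrete anchor for the word2vec baseline described above, a minimal gensim sketch follows. We assume `sentences` is the tokenised, concatenated corpus as a list of token lists; sg=1 selects the SkipGram architecture, and we set the dimension explicitly since gensim's own default differs from the k = 300 used here:

```python
# Minimal sketch of the word2vec baseline with gensim (SkipGram, k = 300).
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=300, sg=1, workers=4)
frodo_vector = model.wv["Frodo"]  # 300-dimensional character embedding
```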
We note that choosing a value of k = 20 for the graph representation learning techniques, which is substantially smaller than the default value of k = 300 for word2vec, is justified because the number of nodes in the Legendarium graph (n = 238) is much smaller than the vocabulary used by word2vec (n = 18,430).

Figure 4: Performance of node classification (y-axis, f1-score) for a Laplacian Eigenmap graph embedding of characters in Tolkien's Legendarium (with subsequent logistic regression) using different numbers of dimensions (x-axis). The shaded area indicates the standard deviation of the f1-score.

We finally adopt two Graph Neural Networks, namely Graph Convolutional Networks (GCNs) [21] and Graph Attention Networks (GATs) [49]. Both of these deep graph learning techniques use a variant of neural message passing, the main difference being that GATs use a learnable attention mechanism that can place different weights on nearby nodes [57]. We trained both architectures to address the graph learning tasks outlined below, i.e. node classification and link prediction, using a hidden layer with k = 20 neurons. We then use the hidden layer activations of both architectures to infer a representation of characters in a 20-dimensional latent space. To exclusively focus on the graph topology, for our experiments we treated the network as an unweighted graph. Additional results for experiments with weighted graphs are included in Appendix D.

Leveraging the ability of GNNs to consider additional node attributes, we compare three different approaches: First, we use GNNs without additional node features, which we emulate by initializing the message passing layer with a one-hot encoding (OHE) of characters. In other words, for a network with n nodes, we assign to each node i = 0, …, n−1 a "dummy feature" vector f_i ∈ ℝⁿ defined as

f_i = (0, …, 0, 1, 0, …, 0),

with i zeros before the one and n − i − 1 zeros after it. Second, we use the node embeddings generated by node2vec as additional node features that are used in the message passing layers. Third, we assign the word2vec embeddings as additional node features, thus combining NLP and graph neural networks.

In Figure 5 we illustrate two latent space representations of characters generated by (a) word2vec and (b) the combination of a GCN with word2vec features. For both figures, we used t-SNE [48] to reduce the latent space embedding to two dimensions; nodes are coloured according to the works in which the corresponding characters occur most frequently. A comparison of the two embeddings clearly highlights the advantage of graph learning over a mere application of word2vec: Different from the word embedding, the combination of GCN with word2vec generates a latent space representation that captures the distinction of characters across different works in Tolkien's Legendarium. We argue that this visualisation highlights the additional information that graph neural networks can leverage from the topology of character networks, as opposed to mere word-context pair statistics. In the following, we more thoroughly investigate this interesting aspect in two clearly defined learning tasks.
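A minimal sketch of a two-layer GCN of this kind in PyTorch Geometric [13] follows; `in_dim` depends on the chosen node features (n for the one-hot encoding, 20 for node2vec features, 300 for word2vec features), and the hidden layer has k = 20 neurons:

```python
# Minimal sketch of a two-layer GCN for character classification.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class CharacterGCN(torch.nn.Module):
    def __init__(self, in_dim: int, num_classes: int = 3, k: int = 20):
        super().__init__()
        self.conv1 = GCNConv(in_dim, k)        # first message passing layer
        self.conv2 = GCNConv(k, num_classes)   # second message passing layer

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))  # k-dimensional latent representation
        return self.conv2(h, edge_index)       # class logits per character
```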
Predicting Character Classes
We use the methods outlined above to address a supervised node classification problem in the Legendarium Graph, i.e. the single character network capturing all works in Tolkien's Legendarium. We assign three labels to nodes that correspond to the work (i.e. The Silmarillion, The Hobbit, or The Lord of the Rings) in which the corresponding character is most prominent. We extract these labels automatically as

argmax_book count(v, book) / Σ_{u ∈ C} count(u, book),

where count(v, book) is the number of mentions of character v in book and C is the set of all characters. We verify the labels manually, finding them to be reasonable in all cases. The resulting three classes contain 113 (The Silmarillion), 30 (The Hobbit) and 99 (The Lord of the Rings) characters, i.e. there is a class imbalance, which we address by using the macro-averaged f1-score, precision and recall. We highlight that the information on the different works is withheld from all methods, i.e. for the word embedding and character network construction we concatenate all works into a single text corpus.

We train our models on a training set of labelled characters and use them to predict the unknown classes of unlabelled characters in a test data set. As a baseline that does not utilise the topology of the character network, we first train a logistic regression model using the embeddings generated by word2vec. Similarly, we train a logistic regression model on the embeddings generated by Laplacian Eigenmaps and node2vec. For node2vec we use three different sets of hyper-parameters p and q, where for p = q = 1 node2vec is equivalent to the graph embedding technique DeepWalk [34]. We trained the model for 200 epochs. We finally train the GCN and GAT models, either using a one-hot encoding (OHE) or assigning additional node features generated by the word and graph embeddings as explained above. For both we used two message passing layers, and we trained the models for 5000 epochs using an Adam optimizer with learning rate lr = 0.0001. We evaluate all models using a 10-fold cross-validation. Average results with standard deviations are shown in Table 4.

Interestingly, we find that the performance of character classification based on the word embedding word2vec is worse than that of any of the graph learning techniques, thus highlighting the added value of the graph perspective. The graph embedding technique node2vec, which uses a SkipGram architecture to embed nodes, clearly outperforms the graph-agnostic word2vec, despite both using the same logistic regression model for the class prediction. Moreover, we find that a simple application of Laplacian Eigenmaps, i.e. a mathematically principled matrix decomposition, yields node classification performance comparable to the neural network-based node2vec embedding.

(a) Latent space embedding of characters obtained by applying the word embedding word2vec to the whole text corpus; edges represent character co-occurrences in the text. (b) Graph Convolutional Network using word2vec character embeddings as additional node features.

Figure 5: Comparison of two latent space embeddings of characters in Tolkien's Legendarium using word2vec (a) and the hidden layer activations of a Graph Convolutional Network, where nodes carry additional features that correspond to word2vec embeddings (b). Node colours indicate the work in which characters appear most frequently (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Two-dimensional visualisations were generated using the dimensionality reduction technique t-SNE [48].
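Before turning to the results in Table 4, the following sketch outlines the (semi-)supervised training loop for the GCN variant, reusing the CharacterGCN class sketched above and assuming a hypothetical torch_geometric Data object `data` holding node features, edge index, labels and a boolean train_mask:

```python
# Minimal sketch of the node classification training loop (GCN variant).
import torch
import torch.nn.functional as F

model = CharacterGCN(in_dim=data.num_node_features)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

for epoch in range(5000):
    optimizer.zero_grad()
    logits = model(data.x, data.edge_index)
    # the loss only sees labelled characters; unlabelled nodes still
    # contribute through message passing over the graph topology
    loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```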
Table 4
Performance of word embedding (word2vec), graph representation learning (Laplacian Eigenmap and node2vec), and graph neural networks (GCN and GAT) in supervised character classification and link prediction for a character network capturing Tolkien's Legendarium. For the word and graph embedding techniques, a logistic regression model was used for classification and link prediction.

                                   Character classification                       Link Prediction
                                   F1-score        Precision       Recall         ROC/AUC
word2vec (k = 300)                 79.45 ± 0.12    80.67 ± 0.13    80.08 ± 0.12   78.16 ± 0.79
Laplacian Eigenmap (k = 20, LE)    81.52 ± 0.14    86.59 ± 0.09    84.13 ± 0.14   64.03 ± 2.28
node2vec (k = 20, p = 1, q = 4)    82.10 ± 0.12    84.54 ± 0.12    84.85 ± 0.11   79.06 ± 0.01
node2vec (k = 20, p = 4, q = 1)    81.75 ± 0.12    81.69 ± 0.12    84.51 ± 0.12   80.71 ± 0.01
node2vec (k = 20, p = 1, q = 1)    87.07 ± 0.13    89.10 ± 0.12    87.30 ± 0.13   80.15 ± 0.02
GCN (k = 20) + LE                  71.89 ± 0.20    72.40 ± 0.22    73.24 ± 0.18   88.53 ± 1.25
GCN (k = 20) + node2vec (p=q=1)    92.17 ± 0.10    96.69 ± 0.04    91.08 ± 0.11   88.03 ± 1.43
GCN (k = 20) + OHE                 90.55 ± 0.13    92.71 ± 0.12    89.95 ± 0.13   80.78 ± 1.16
GCN (k = 20) + word2vec            92.32 ± 0.10    95.27 ± 0.08    91.42 ± 0.10   81.67 ± 1.43
GAT (k = 20) + LE                  53.60 ± 0.08    51.72 ± 0.05    57.78 ± 0.08   79.41 ± 1.83
GAT (k = 20) + node2vec (p=q=1)    91.18 ± 0.09    95.57 ± 0.04    90.37 ± 0.10   86.23 ± 0.99
GAT (k = 20) + OHE                 83.52 ± 0.14    85.59 ± 0.15    84.68 ± 0.14   80.52 ± 1.76
GAT (k = 20) + word2vec            89.41 ± 0.07    93.98 ± 0.08    89.13 ± 0.10   82.88 ± 2.08

The best precision is achieved for a Graph Convolutional Network (GCN) with additional node features generated by node2vec, while the best f1-score and recall are achieved for a GCN with additional word2vec embeddings. We attribute this to the fact that the combination of word embeddings and graph neural networks integrates two complementary sources of information, thus highlighting the advantages of methods that leverage both NLP and graph learning.

In Figure 6 we further demonstrate the ability of Graph Neural Networks to perform a semi-supervised classification, i.e. their ability to accurately predict classes based on a very small number of labelled examples. Figure 6(a) shows the training network with three randomly chosen labelled characters (shown in colour), one for each ground-truth class. Nodes in the test set are shown in grey, and node positions are chosen based on the latent space representation shown in Figure 5(b). We use these three labelled characters to train a GCN with additional word2vec embeddings as node features. We then use the trained model to predict the character classes in the test set. The predicted classes are shown as coloured nodes in Figure 6(b), where the three training nodes are shown in grey. A visual comparison of Figure 6(b) and Figure 5(b) allows us to evaluate the prediction against the ground truth. Despite the very sparse labelled examples, and thanks to its use of the graph topology of the unlabelled nodes in the training set, the GCN model is able to accurately predict character classes, reaching an f1-score of ≈ 79.7%, a precision of ≈ 78.6% and a recall of ≈ 82.6%. Remarkably, this shows that the combination of a GCN model with word2vec node features yields a higher f1-score and recall with only three labelled examples than a word embedding alone, even when all characters in the training set are labelled.

(a) Training network with three labelled characters Galadriel, Orcs, and Balin (shown as coloured nodes). (b) Character classes predicted by a Graph Convolutional Network (GCN) using word2vec character embeddings as additional node features.

Figure 6: Illustration of semi-supervised character classification, where the training set contains only three randomly chosen labelled characters.
Node colours indicate ground truth classes (a) and class predictions by the Graph Convolutional Network (GCN) (b) (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Two-dimensional visualisations were generated using the dimensionality reduction technique t-SNE [48]. Node positions correspond to Figure 5(b), which can be used for a comparison with ground truth character classes. Remarkably, the GCN model with additional word2vec features achieves an f1-score of ≈ 79.7% with an associated precision of ≈ 78.6% and recall of ≈ 82.6%.

Predicting Character Interactions
We finally address link prediction, which refers to the task of predicting "missing" links in a graph, i.e. links that are either missing in incomplete data or that are likely to form in the future. Link prediction is a well-studied graph learning problem, with important applications in social network analysis and recommender systems [29]. In the context of character networks, it is relevant because it could be used to alleviate the low recall of the rigid, sentence-based character network extraction that we employed in Section 3. Adopting a supervised approach, we split the edges of the Legendarium Graph into a training and a test set, where we withhold 10% of the edges during training to test our model. We use word2vec, Laplacian Eigenmaps and node2vec to generate embeddings f_v ∈ ℝᵏ of characters. For word2vec, embeddings are generated using the full text corpus. For the graph embedding techniques Laplacian Eigenmaps and node2vec we only use links in the training set, potentially putting them at a disadvantage in terms of training data. We use the resulting feature vectors to calculate the element-wise (Hadamard) product f_u ∘ f_v ∈ ℝᵏ for character pairs u, v, which yields k-dimensional features for all character pairs. We then use the features of positive instances (i.e. pairs u, v connected by a link (u, v) in the training set) and negative instances (pairs u, v not connected by a link in the training set) to train a (binary) logistic regression classifier and use the trained model to predict links in the test set. We use negative sampling to mitigate the imbalance between negative and positive instances in the training set. For the two GNN architectures we adopt the common approach to add a decoding step that computes the Hadamard product of node features after the last message passing layer. We use a Binary Cross Entropy with Logits loss function and train the models for 15000 epochs using an Adam optimizer with learning rate 0.001.

We use the Receiver Operating Characteristic (ROC) to evaluate our models, i.e. we compute ROC curves that give the true and false positive rates across all discrimination thresholds. An example (and explanation) is included in Appendix E. We compute the Area Under Curve (AUC) of ROC curves within the unit square, which ranges from 0 to 1: a value of 0.5 corresponds to the performance of a random classifier, while values < 0.5 indicate worse and values > 0.5 better than random performance. We again evaluate all models using a 10-fold cross-validation. Table 4 reports average results and standard deviations of the AUC for all models. With the exception of Laplacian Eigenmaps, we find that graph methods generally perform better than word2vec. We further observe that GNNs perform considerably better than node2vec, where the best performance is achieved when coupling a GCN with Laplacian Eigenmaps or node2vec.
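To make the embedding-based link prediction pipeline concrete, the following sketch combines Hadamard-product pair features with a logistic regression classifier. Here `emb`, a mapping from characters to k-dimensional numpy vectors learned on the training edges only, and the positive/negative pair lists (train_pos, train_neg, test_pos, test_neg) are hypothetical inputs produced by the embedding and negative sampling steps described above:

```python
# Minimal sketch of embedding-based link prediction with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def pair_features(pairs, emb):
    return np.array([emb[u] * emb[v] for u, v in pairs])  # Hadamard product

# train on positive edges and (negatively sampled) non-edges
X_train = pair_features(train_pos + train_neg, emb)
y_train = np.array([1] * len(train_pos) + [0] * len(train_neg))
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# evaluate on the 10% holdout edges against sampled non-edges
X_test = pair_features(test_pos + test_neg, emb)
y_test = np.array([1] * len(test_pos) + [0] * len(test_neg))
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```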
5. Conclusion

In summary, we used natural language processing techniques like named entity recognition and coreference resolution to construct a single character network from a corpus of works that constitute J.R.R. Tolkien's Legendarium. Apart from characterising the network based on social network analysis, we adopt state-of-the-art graph learning techniques to (i) generate latent space embeddings of characters, (ii) automatically classify characters based on the work to which they belong, and (iii) predict character co-occurrences. For all three tasks, we find a significant advantage of Graph Neural Networks (GNNs) over a common word embedding technique, and we find that a combination of both yields the best performance. Our approach to construct a single graph for multiple literary texts could be interesting to analyse other corpora of works with overlapping characters (e.g. mythology, historical novels, etc.). We further believe that our results on the application of GNNs to a link prediction task have interesting implications for computational literary studies. Considering the difficulty of coreference resolution, and the low coverage of the resulting character networks that we observed in our experiments, we expect that link prediction could potentially be used as an approach to address the low recall observed for the sentence-based co-occurrence networks. In future work, we will further consider the modelling of character co-occurrences as temporal character co-occurrence networks, where the temporal ordering of sentences determines the "time stamps" of edges in the resulting dynamic graph. This promises the application of recently developed higher-order graph modelling, visualization, and learning techniques [24, 35, 46, 25, 36], which capture patterns in the temporal ordering of edges and can thus provide insights beyond the mere graph topology.

Acknowledgements
Vincenzo Perri and Ingo Scholtes acknowledge support by the Swiss National Science Foundation, grant 176938.

References

[1] A. Agarwal, A. Corvalan, J. Jensen, and O. Rambow. "Social network analysis of Alice in Wonderland". In: Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature. 2012, pp. 88–96.
[2] D. Agarwal, D. Vijay, et al. "Genre Classification using Character Networks". In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE. 2021, pp. 216–222.
[3] E. E. Auger. "The Lord of the Rings' interlace: Tolkien's narrative and Lee's illustrations". In: Journal of the Fantastic in the Arts 19.1 (2008), p. 70.
[4] D. Bamman, O. Lewke, and A. Mansoor. "An annotated dataset of coreference in English literature". In: arXiv preprint arXiv:1912.01140 (2019).
[5] D. Bamman, S. Popat, and S. Shen. "An annotated dataset of literary entities". In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 2138–2144.
[6] M. Belkin and P. Niyogi. "Laplacian eigenmaps for dimensionality reduction and data representation". In: Neural Computation 15.6 (2003), pp. 1373–1396.
[7] A. Beveridge and J. Shan. "Network of thrones". In: Math Horizons 23.4 (2016), pp. 18–22.
[8] A. Bonato, D. R. D'Angelo, E. R. Elenberg, D. F. Gleich, and Y. Hou. "Mining and modeling character networks". In: International Workshop on Algorithms and Models for the Web-Graph. Springer. 2016, pp. 100–114.
[9] M. M. Bronstein, J. Bruna, Y.
LeCun, A. Szlam, and P. Vandergheynst. "Geometric deep learning: going beyond Euclidean data". In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42.
[10] J. Bruna and X. Li. "Community detection with graph neural networks". In: stat 1050 (2017), p. 27.
[11] D. K. Elson, K. McKeown, and N. J. Dames. "Extracting social networks from literary fiction". In: (2010).
[12] R. Feldman. "Techniques and applications for sentiment analysis". In: Communications of the ACM 56.4 (2013), pp. 82–89.
[13] M. Fey and J. E. Lenssen. "Fast graph representation learning with PyTorch Geometric". In: arXiv preprint arXiv:1903.02428 (2019).
[14] T. M. Fruchterman and E. M. Reingold. "Graph drawing by force-directed placement". In: Software: Practice and Experience 21.11 (1991), pp. 1129–1164.
[15] C. Gallicchio and A. Micheli. "Graph Echo State Networks". In: 2010, pp. 1–8. doi: 10.1109/ijcnn.2010.5596796.
[16] M. Gori, G. Monfardini, and F. Scarselli. "A new model for learning in graph domains". In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks. Vol. 2 (2005), pp. 729–734.
[17] A. Grover and J. Leskovec. "node2vec: Scalable feature learning for networks". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, pp. 855–864.
[18] J. Hackl, I. Scholtes, L. V. Petrović, V. Perri, L. Verginer, and C. Gote. "Analysis and Visualisation of Time Series Data on Networks with Pathpy". In: Companion Proceedings of the Web Conference 2021 (WWW '21). Ljubljana, Slovenia: Association for Computing Machinery, 2021, pp. 530–532. doi: 10.1145/3442442.3452052.
[19] W. L. Hamilton. Graph Representation Learning. Vol. 14.3. Morgan & Claypool Publishers, 2020, pp. 1–159.
[20] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. "spaCy: Industrial-strength Natural Language Processing in Python". In: (2020). doi: 10.5281/zenodo.1212303.
[21] T. N. Kipf and M. Welling. "Semi-Supervised Classification with Graph Convolutional Networks". In: Proceedings of the 5th International Conference on Learning Representations (ICLR '17). Toulon, France, 2017. url: https://openreview.net/forum?id=SJU4ayYgl.
[22] M. Krug. "Techniques for the Automatic Extraction of Character Networks in German Historic Novels". Doctoral thesis. Universität Würzburg, 2020. doi: 10.25972/opus-20918.
[23] V. Labatut and X. Bost. "Extraction and analysis of fictional character networks: A survey". In: ACM Computing Surveys (CSUR) 52.5 (2019), pp. 1–40.
[24] R. Lambiotte, M. Rosvall, and I. Scholtes. "From networks to optimal higher-order models of complex systems". In: Nature Physics 15.4 (2019), pp. 313–320.
[25] T. LaRock, I. Scholtes, and T. Eliassi-Rad. "Sequential motifs in observed walks". In: Journal of Complex Networks 10.5 (2022), cnac036.
[26] J. Leyerle. "The interlace structure of Beowulf". In: University of Toronto Quarterly 37.1 (1967), pp. 1–17.
[27] J. Li, C. Zhang, H. Tan, and C. Li. "Complex Networks of Characters in Fictional Novels". In: 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS). 2019, pp. 417–420. doi: 10.1109/icis46139.2019.8940174.
[28] J. Li, A. Sun, J. Han, and C. Li. "A survey on deep learning for named entity recognition". In: IEEE Transactions on Knowledge and Data Engineering 34.1 (2020), pp. 50–70.
[29] L. Lü and T. Zhou. "Link prediction in complex networks: A survey".
In: Physica A: Statistical Mechanics and its Applications 390.6 (2011), pp. 1150–1170.
[30] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. 2013. url: http://arxiv.org/abs/1301.3781.
[31] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. 2013. url: http://arxiv.org/abs/1310.4546.
[32] F. Moretti. Network theory, plot analysis. Literary Lab Pamphlet 2. 2011.
[33] M. Newman. Networks. Oxford University Press, 2018.
[34] B. Perozzi, R. Al-Rfou, and S. Skiena. "DeepWalk: Online learning of social representations". In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, pp. 701–710.
[35] V. Perri and I. Scholtes. "HOTVis: Higher-order time-aware visualisation of dynamic graphs". In: International Symposium on Graph Drawing and Network Visualization. Springer. 2020, pp. 99–114.
[36] L. Qarkaxhija, V. Perri, and I. Scholtes. "De Bruijn goes Neural: Causality-Aware Graph Neural Networks for Time Series Data on Dynamic Graphs". In: arXiv preprint arXiv:2209.08311 (2022).
[37] J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang. "NetSMF: Large-scale network embedding as sparse matrix factorization". In: The World Wide Web Conference. 2019, pp. 1509–1520.
[38] M. Ribeiro, R. Vosgerau, M. Andruchiw, and S. Pinto. "The complex social network from The Lord of The Rings". In: Revista Brasileira de Ensino de Física (2016).
[39] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. "The Graph Neural Network Model". In: IEEE Transactions on Neural Networks 20.1 (2009), pp. 61–80. doi: 10.1109/tnn.2008.2005605.
[40] F. Schröder, H. O. Hatzel, and C. Biemann. "Neural end-to-end coreference resolution for German in different domains". In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021). 2021, pp. 170–181. url: https://aclanthology.org/2021.konvens-1.15/.
[41] T. Shippey. The Road to Middle-earth: How J.R.R. Tolkien Created a New Mythology. HMH, 2014.
[42] A. C. Sparavigna. "On social networks in plays and novels". In: International Journal of Sciences 2.10 (2013).
[43] A. Sperduti and A. Starita. "Supervised Neural Networks for the Classification of Structures". In: IEEE Transactions on Neural Networks 8.3 (1997), pp. 714–735. doi: 10.1109/72.572108.
[44] R. Sukthanker, S. Poria, E. Cambria, and R. Thirunavukarasu. "Anaphora and coreference resolution: A review". In: Information Fusion 59 (2020), pp. 139–162.
[45] R. Sukthanker, S. Poria, E. Cambria, and R. Thirunavukarasu. "Anaphora and coreference resolution: A review". In: Information Fusion 59 (2020), pp. 139–162. doi: 10.1016/j.inffus.2020.01.010.
[46] L. Torres, A. S. Blevins, D. Bassett, and T. Eliassi-Rad. "The why, how, and when of representations for complex systems". In: SIAM Review 63.3 (2021), pp. 435–485.
[47] P. Trilcke. "Social network analysis (SNA) als Methode einer textempirischen Literaturwissenschaft". In: Empirie in der Literaturwissenschaft. Brill mentis, 2013, pp. 201–247.
[48] L. Van der Maaten and G. Hinton. "Visualizing data using t-SNE". In: Journal of Machine Learning Research 9.11 (2008).
[49] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. "Graph Attention Networks". In: 6th International Conference on Learning Representations (2017).
[50] S. Wasserman, K.
Faust, et al. "Social network analysis: Methods and applications". In: (1994).
[51] R. C. West. "The Interlace Structure of The Lord of the Rings". In: A Tolkien Compass 2 (1975), pp. 75–91.
[52] J. Worsham and J. Kalita. "Genre Identification and the Compositional Effect of Genre in Literature". In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics, 2018, pp. 1963–1973. url: https://aclanthology.org/C18-1167.
[53] M. C. Yavuz. "Analyses of character networks in dramatic works by using graphs". In: 2020 7th International Conference on Behavioural and Social Computing (BESC). IEEE. 2020, pp. 1–4.
[54] A. Zehe, L. Konle, L. K. Dümpelmann, E. Gius, A. Hotho, F. Jannidis, L. Kaufmann, M. Krug, F. Puppe, N. Reiter, A. Schreiber, and N. Wiedmer. "Detecting Scenes in Fiction: A new Segmentation Task". In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online: Association for Computational Linguistics, 2021, pp. 3167–3177. doi: 10.18653/v1/2021.eacl-main.276. url: https://aclanthology.org/2021.eacl-main.276.
[55] M. Zhang and Y. Chen. "Link prediction based on graph neural networks". In: Advances in Neural Information Processing Systems 31 (2018).
[56] M. Zhang, Z. Cui, M. Neumann, and Y. Chen. "An End-to-End Deep Learning Architecture for Graph Classification". In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI '18). New Orleans, Louisiana, USA: AAAI Press, 2018.
[57] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. "Graph neural networks: A review of methods and applications". In: AI Open 1 (2020), pp. 57–81.

A. Alternative Approaches to Detect Character Co-Occurrences

As an alternative to the sentence-based co-occurrence approach described in Section 2, we also evaluated two other approaches in initial experiments: detection based on (i) the parse tree and (ii) a sliding text window. For (i), we marked all sentences as an "interaction" between two characters if they had the same head word in the parse tree. This is the most strict version of our character network construction, as it only captures explicit interactions (e.g., "Frodo saw Sam"). Due to the small number of detected interactions, we discarded this strategy. On the other hand, (ii) is more lenient than our final sentence-based approach, only requiring two characters to be mentioned within a window of a fixed number of characters (letters). This approach of extracting character co-occurrences within a sliding text window has been adopted in a number of prior works [7, 8]. It allows us to capture interactions between characters that are not mentioned in the same sentence, but introduces the risk of detecting a large number of spurious interactions. In our experiments, we chose a window size of 2000 characters, with the additional restriction that chapter borders may not be crossed; a sketch of this variant follows below. Aiming for character networks that maintain a balance between recall (i.e. detecting all meaningful character links) and precision (i.e. limiting the number of spurious links), we decided to use the sentence-based interaction detection.
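A minimal sketch of the sliding-window variant (ii), assuming `chapters` is a hypothetical list holding, for each chapter, the detected character mentions as (name, text offset) pairs, so that windows never cross chapter borders:

```python
# Minimal sketch of sliding-window co-occurrence detection.
from collections import Counter

WINDOW = 2000  # window size in characters (letters), as in our experiments

def window_cooccurrences(chapters) -> Counter:
    cooccurrences = Counter()
    for mentions in chapters:
        mentions = sorted(mentions, key=lambda m: m[1])  # sort by text offset
        for i, (name_i, pos_i) in enumerate(mentions):
            for name_j, pos_j in mentions[i + 1:]:
                if pos_j - pos_i > WINDOW:
                    break  # later mentions lie even farther away
                if name_i != name_j:
                    cooccurrences[tuple(sorted((name_i, name_j)))] += 1
    return cooccurrences
```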
B. Narrative Charts for The Hobbit and The Silmarillion

Complementing the results in Section 2, in Figure 7 we present two additional narrative charts that we generated for The Hobbit and The Silmarillion.

Figure 7: Narrative charts for The Hobbit (left) and The Silmarillion (right).

C. Visualisation of Additional Latent Space Embeddings

In Figure 8 and Figure 9 we include additional visual representations of latent space embeddings of characters, obtained by those methods that are not included in the main text. For all methods, we employed t-SNE [48] to reduce the dimensionality of the latent space embeddings to two dimensions.

Figure 8: Latent space embedding of characters in Tolkien’s Legendarium using node2vec (a) and Graph Attention Networks with additional word2vec features (b).

Figure 9: Latent space embedding of characters in Tolkien’s Legendarium using GCN with one-hot encoding (OHE) (a) and Laplacian Eigenmap (b).

D. Node Classification and Link Prediction with Weighted GNNs

Complementing the results discussed in the main article, in Table 5 we include additional results for which we applied Graph Neural Networks to weighted graphs, i.e., in contrast to Table 4, we consider a character network with link weights w(u, v) that capture the number of co-occurrences of two characters u, v. Due to time constraints, we performed this analysis only for the best-performing Graph Neural Network, i.e. GCN. These additional results suggest that, at least for our corpus, the inclusion of link weights does not significantly improve the performance of models for character classification and link prediction.

Table 5
Additional results for node classification and link prediction in the Legendarium graph, where links carry weights that capture the number of character co-occurrences.

                               Character classification                      Link Prediction
                               F1-score        Precision       Recall        ROC/AUC
GCN20 LE                       60.85 ± 0.12    60.84 ± 0.16    63.40 ± 0.10  79.41 ± 1.83
GCN20 node2vec (p = 1, q = 1)  92.14 ± 0.11    95.87 ± 0.07    91.28 ± 0.11  86.23 ± 0.99
GCN20 OHE                      87.92 ± 0.10    93.02 ± 0.08    86.36 ± 0.11  80.52 ± 1.76
GCN20 word2vec                 92.96 ± 0.10    95.67 ± 0.07    92.03 ± 0.10  82.88 ± 2.08
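To illustrate how such link weights enter a GCN, the sketch below shows a minimal two-layer model built with the graph learning library pytorch-geometric [13]. It is a simplified illustration under assumptions, not our exact experimental configuration: the hidden dimension, dropout rate, and all names are placeholders.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class WeightedGCN(torch.nn.Module):
    """Minimal two-layer GCN for node classification on a weighted graph."""

    def __init__(self, num_features, num_classes, hidden_dim=16):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index, edge_weight=None):
        # GCNConv folds `edge_weight` (here: co-occurrence counts) into its
        # normalised adjacency matrix; passing None yields the unweighted
        # variant used for the results in the main text.
        h = F.relu(self.conv1(x, edge_index, edge_weight))
        h = F.dropout(h, p=0.5, training=self.training)
        return self.conv2(h, edge_index, edge_weight)
```

Training then proceeds as usual, e.g. with a cross-entropy loss on the labelled characters; the same mechanism applies to link prediction, where the node embeddings of the final layer are combined into scores for node pairs.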
E. Explanation of ROC/AUC Evaluation of Link Prediction

In Section 4 we use the Area Under Curve (AUC) of a Receiver Operating Characteristic (ROC) curve to evaluate the performance of our models in link prediction, a common approach for assessing the diagnostic quality of binary classifiers in information retrieval and machine learning. A key advantage of this approach is that it evaluates the performance of a binary classifier across all possible discrimination thresholds, which can be adapted to tune the sensitivity/specificity of the prediction depending on application requirements. To help the reader follow this evaluation approach, below we explain one exemplary ROC curve obtained for link prediction in the Legendarium graph using a node2vec embedding of characters and a logistic regression model. To generate this curve, we first consider the prediction scores (i.e., in the case of logistic regression, the positive class probabilities) assigned to each node pair in the test set, where a link is predicted whenever the score is above a given discrimination threshold θ. For each value of θ we can then calculate the true and false positive rates (TPR and FPR), i.e. the fraction of predicted links for which the prediction is correct and the fraction of unconnected node pairs for which a link is erroneously predicted. A sweep over all possible discrimination thresholds θ yields a ROC curve in the unit square. In Figure 10 we show the ROC curve of a logistic regression model using node2vec features. A classifier that perfectly classifies all instances in the data assumes the initial values FPR = 0 and TPR = 0 only for the maximal discrimination threshold θ = 1, where all instances are assigned to the negative class. For any θ smaller than this maximum and larger than the minimum value of zero, a perfect classifier correctly predicts all instances, which yields TPR = 1 and FPR = 0. For the minimum value θ = 0, the classifier necessarily predicts the positive class for all instances, which yields TPR = 1 and FPR = 1. The ROC curve of a perfect classifier thus follows the left and upper border of the unit square, which yields an Area Under Curve (AUC) of one. Conversely, the ROC curve of a classifier that consistently predicts the opposite of the true class follows the bottom and right border of the unit square, which yields an AUC of zero. For a classifier with no diagnostic ability, the FPR and TPR are expected to increase equally as we lower the discrimination threshold θ, i.e. the ROC curve follows the so-called diagonal of no-discrimination (see the red dashed line in Figure 10) and the AUC is expected to be close to 0.5.

Figure 10: Exemplary ROC curve (TPR vs. FPR) for link prediction using node2vec embedding and logistic regression; panel title: “Predicting character interactions with node2vec”.
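The threshold sweep described above can be made concrete in a few lines of code. The following sketch is a didactic illustration with hypothetical names; in practice, a library routine such as scikit-learn’s roc_curve would typically be used instead.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the discrimination threshold over all prediction scores and
    return the (FPR, TPR) points that trace out the ROC curve."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]  # threshold above the maximal score: no link predicted
    for theta in np.unique(scores)[::-1]:  # from strictest to most lenient
        predicted = scores >= theta
        tpr.append((predicted & labels).sum() / n_pos)
        fpr.append((predicted & ~labels).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

# The AUC is the area under this curve, e.g. via the trapezoidal rule:
# fpr, tpr = roc_points(scores, labels); auc = np.trapz(tpr, fpr)
```

For a perfect classifier this sweep yields points only on the left and upper border of the unit square (AUC of one), while a random score assignment traces out the diagonal of no-discrimination (AUC close to 0.5).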
F. Evaluation of Link Prediction for Individual Works

In the main text we present and discuss the performance of different techniques to predict links in a single character network that spans Tolkien’s Legendarium. Complementing this analysis, we additionally evaluated link prediction in character co-occurrence networks generated separately for the three works of our corpus, i.e. The Silmarillion, The Hobbit, and The Lord of the Rings. For the sake of completeness, we include these results in Table 6 below. As in Appendix D, we applied the Graph Convolutional Network (GCN) model to a weighted graph, while Graph Attention Networks (GAT) were applied to an unweighted graph.⁵

Table 6
Additional results for link prediction in character co-occurrence networks constructed for individual works in our corpus.

                                          The Silmarillion   The Hobbit      The Lord of the Rings
                                          ROC/AUC            ROC/AUC         ROC/AUC
GCN20 LE                                  80.38 ± 1.97       71.87 ± 5.94    76.25 ± 2.05
GCN20 node2vec (p = 1, q = 1)             81.74 ± 2.3        63.86 ± 5.57    76.61 ± 2.87
GCN20 OHE                                 74.78 ± 1.9        70.22 ± 8.76    63.15 ± 4.37
GCN20 word2vec                            76.82 ± 2.68       70.23 ± 7.15    67.42 ± 2.77
GAT20 LE                                  68.99 ± 3.46       76.88 ± 4.53    61.3 ± 7.38
GAT20 node2vec (p = 1, q = 1)             79.37 ± 2.13       71.21 ± 6.53    71.92 ± 2.69
GAT20 OHE                                 70.88 ± 3.12       71.52 ± 8.14    63.53 ± 3.68
GAT20 word2vec                            73.72 ± 4.13       72.29 ± 7.78    64.52 ± 4.29
GCN20 LE (weighted)                       80.4 ± 1.44        69.84 ± 7.23    81.31 ± 1.96
GCN20 node2vec (p = 1, q = 1) (weighted)  82.7 ± 1.92        68.09 ± 7.42    76.46 ± 3.56
GCN20 OHE (weighted)                      74.78 ± 3.3        67.25 ± 9.41    63.15 ± 3.12
GCN20 word2vec (weighted)                 78.8 ± 1.8         62.97 ± 22.35   64.53 ± 1.79

⁵ We resorted to an unweighted graph due to a suspected implementation error in the weighted GAT implementation in version 2.0.4 of the graph learning library pytorch-geometric [13].