One Graph to Rule them All: Using NLP and Graph Neural Networks to analyse Tolkien's Legendarium

Vincenzo Perri¹,*,†, Lisi Qarkaxhija²,*,†, Albin Zehe³,*,†, Andreas Hotho³,* and Ingo Scholtes²,¹

¹ Data Analytics Group, Department of Informatics (IfI), Universität Zürich, CH-8050 Zürich, Switzerland
² Chair of Informatics XV - Machine Learning for Complex Networks, Center for Artificial Intelligence and Data Science (CAIDAS), Julius-Maximilians-Universität Würzburg, D-97074 Würzburg, Germany
³ Chair of Informatics X - Data Science, Center for Artificial Intelligence and Data Science (CAIDAS), Julius-Maximilians-Universität Würzburg, D-97074 Würzburg, Germany

Abstract
Natural Language Processing and Machine Learning have considerably advanced Computational Literary Studies. Similarly, the construction of co-occurrence networks of literary characters, and their analysis using methods from social network analysis and network science, have provided insights into the micro- and macro-level structure of literary texts. Combining these perspectives, in this work we study character networks extracted from a text corpus of J.R.R. Tolkien's Legendarium. We show that this perspective helps us to analyse and visualise the narrative style that characterises Tolkien's works. Addressing character classification, embedding and co-occurrence prediction, we further investigate the advantages of state-of-the-art Graph Neural Networks over a popular word embedding method. Our results highlight the large potential of graph learning in Computational Literary Studies.

Keywords
computational literary studies, character networks, network analysis, graph neural networks, NLP

CHR 2022: Computational Humanities Research Conference, December 12 – 14, 2022, Antwerp, Belgium
* Corresponding author.
† These authors contributed equally.
✉ perri@ifi.uzh.ch (V. Perri); lisi.qarkaxhija@uni-wuerzburg.de (L. Qarkaxhija); zehe@informatik.uni-wuerzburg.de (A. Zehe); hotho@informatik.uni-wuerzburg.de (A. Hotho); ingo.scholtes@uni-wuerzburg.de (I. Scholtes)
ORCID: 0000-0002-9472-0783 (A. Zehe); 0000-0002-0483-5772 (A. Hotho); 0000-0003-2253-0216 (I. Scholtes)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Motivation and Background

Computational Literary Studies (CLS) have recently taken advantage of the latest developments in Natural Language Processing (NLP), with deep learning techniques bringing major improvements for tasks relevant to literature analysis: examples include named entity recognition [28], anaphora and coreference resolution [44], sentiment analysis [12], scene detection [54] or genre classification [52]. Apart from NLP, machine learning has recently shown great potential in data with a complex relational structure that can be represented as a graph or network G = (V, E), consisting of nodes u, v, … ∈ V and links (u, v) ∈ E. In CLS, this abstraction is frequently used to study character networks, i.e. graphs where nodes represent literary characters and links represent relationships such as, e.g., their co-occurrence in a sentence or scene, or dialogue interactions [23].
Building on this abstraction, several works in CLS used (social) network analysis to study the narrative structure of literary works [32, 47]: Considering 19th century novels, Elson, McKeown, and Dames [11] analysed macroscopic properties of conversation networks, i.e. fictional characters engaging in dialogue. Beveridge and Shan [7] applied centrality measures and community detection to a network of characters that occur within close proximity in the text of A Storm of Swords, the third novel in George R.R. Martin's series A Song of Ice and Fire. Using the same network extraction, Bonato, D'Angelo, Elenberg, Gleich, and Hou [8] studied statistical properties of character networks in three popular novels. Several authors applied centrality measures to identify important characters, e.g. in works by Shakespeare [53], Alice in Wonderland [1], or the first novel of the Harry Potter series [42]. In a recent work, Agarwal, Vijay, et al. [2] used character networks to facilitate the automated classification of literary genres. Using J.R.R. Tolkien's The Lord of the Rings, which we also consider in our manuscript, Ribeiro, Vosgerau, Andruchiw, and Pinto [38] applied social network analysis to its network of characters. Li, Zhang, Tan, and Li [27] studied small-world and scale-free properties of character co-occurrence networks in movie scripts, among them the movie adaptation of The Lord of the Rings.

Existing studies of character networks mainly used methods from (social) network analysis to gain insights into the narrative structure of literary texts. At the same time, recent advances in geometric deep learning [9] and Graph Neural Networks (GNNs) provide new ways to apply deep learning to graph-structured data, which creates interesting opportunities for the study of character networks. A major advantage of GNNs over other network analysis or machine learning techniques is their ability to leverage relational patterns in the topology of a graph, while at the same time incorporating additional node or link features. This facilitates unsupervised and (semi-)supervised learning tasks at the node, link, or graph level. Examples include graph representation learning [34, 17], community detection [10], node and link classification [21, 49], link prediction [55], or graph classification [56]. To the best of our knowledge, no works have combined recent advances in (i) natural language processing, e.g. to recognize entities, resolve coreferences, extract meaningful character networks, or generate word embeddings, and (ii) graph neural networks, e.g. to compute latent space representations of fictional characters or address node- and link-level learning tasks. In CLS, the combination of these paradigms can help with several tasks, e.g. (semi-)supervised character classification, semantic analyses of character relationships, comparisons of character constellations in different works, or automated genre classification.

Addressing this gap, we combine NLP, network analysis and graph learning to analyse J.R.R. Tolkien's Legendarium, namely The Silmarillion, The Hobbit, and the three volumes of The Lord of the Rings. Our contributions are:

• We apply entity recognition and coreference resolution to detect and disambiguate characters in Tolkien's Legendarium. We report key statistics like character mentions via pronouns, nominal mentions or explicit references and compare them across different works in the Legendarium.
We use sequential character mentions to generate narrative charts of the five considered works, which highlight Tolkien's interlaced narrative style.

• We extract character networks for each work in our corpus, i.e. graphs G = (V, E) where nodes v ∈ V capture characters in the Legendarium, while undirected edges (u, v) represent the co-occurrence of two characters in the same sentence. Apart from generating character networks for the five works in our corpus, we generate a single network that captures all characters in the works that constitute Tolkien's Legendarium. We use this to perform a macroscopic, graph-based characterisation of Tolkien's works.

• We address the question to what extent graph learning techniques can leverage the topology of automatically extracted character networks, and which advantages they provide over word embeddings that just consider word-context pairs. To this end, we evaluate the performance of different methods for the (i) latent space representation of characters, (ii) (semi-)supervised classification of characters, and (iii) prediction of character co-occurrences. The results confirm that the inclusion of topological information from character networks considerably improves the performance of these tasks.

Analysing a single graph representation of multiple literary works unfolding in the same fictional universe, our work demonstrates the potential of graph neural networks for computational literary studies. To facilitate the use of our data and methods in the (digital) study of Tolkien's works¹, we make the results of our entity recognition, coreference resolution and character network extraction available. To foster the application of our methods to other corpora, we also provide a set of Jupyter notebooks that reproduce our findings.²

2. Text Corpus and Data Processing

We consider the English full text of The Lord of the Rings (consisting of the volumes The Fellowship of the Ring, The Two Towers and The Return of the King, each split into two books), The Hobbit and The Silmarillion. We used the Python-based NLP pipeline BookNLP³ to extract linguistic and literary features. BookNLP uses the NLP library spaCy [20] to perform tokenisation, that is, splitting text into a list of words and special characters (like punctuation), and to split text into sentences. This first step in the NLP pipeline is the basis for all further processing steps.

Entity Recognition and Coreference Resolution
Entity recognition refers to the task of detecting all references to entities (e.g., characters, locations) in a text corpus. These references can either be explicitly named references (e.g., "Bilbo Baggins", "Smaug"), noun phrases (e.g., "the hobbit", "the dragon") or pronouns (e.g., "she", "they"). BookNLP uses an entity annotation model that has been trained on a large annotated data set [5] to identify named entities and noun phrases as well as pronoun references. After these references have been detected, coreference resolution can be applied in a next step. This is a very hard task in general [45] and is especially hard in the context of literary texts, due to the high variation of references used and the very long texts [40, 22]. Confirming this view, our initial analyses revealed that the performance of BookNLP's coreference resolution, which was trained on a data set of annotated coreferences [4], was not satisfactory when applied to our corpus.
We thus decided to focus on named references, and resolve these using a set of simple manually-created disambiguation rules (e.g., "Sam" → "Sam Gamgee", "Peregrin" → "Pippin").⁴ Although this approach may yield a low recall (i.e. there are many unidentified coreferences, since pronouns and noun phrases are not considered), we find that this coreference resolution yields high precision (i.e. almost all resolved coreferences that we inspected manually were correct). We found this approach preferable over a "full" coreference resolution for two reasons: First, considering our focus on character networks, a coreference resolution with high recall but lower precision would give rise to many spurious character co-occurrences that would harm our analyses of graph learning techniques. Second, our corpus of Tolkien's Legendarium is special in the sense that it has a large number of named references, which give rise to rich character networks despite limiting our view to named references.

¹ https://digitaltolkien.com/
² https://github.com/LSX-UniWue/one-graph-to-rule-them-all
³ https://github.com/booknlp/booknlp
⁴ The full list of disambiguation rules can be found in our repository.

Extraction of Character Co-Occurrences
After finding references to characters, we next extract co-occurrences of pairs of characters that can be used to build character networks. While the co-occurrence of characters does not necessarily imply an interaction between them, due to its simplicity it is a frequently used approach to construct character networks in CLS [7, 23, 8]. After evaluating different strategies (cf. Appendix A), we decided to extract each co-occurrence of two characters in the same sentence.

Descriptive Statistics of Text Corpus
In Table 1, we provide key summary statistics that characterise the different works in our corpus. Apart from differences in terms of tokens, we find striking differences between character mentions in the five texts: The Silmarillion uses considerably fewer pronoun mentions than the other four works, while using more explicit references by name. We attribute this to the compressed writing style of The Silmarillion, which rather resembles a fictional historical record that lists character names and locations and gives a chronology of events, compared to the more conventional prose style of The Hobbit and The Lord of the Rings.

Table 1
Summary statistics of the text corpus used in our analysis.

                            Hobbit   TLotR Vol. I   TLotR Vol. II   TLotR Vol. III   Silmarillion
#tokens                    113,235        224,156         189,539          163,191        147,833
#character mentions         15,477         31,770          27,576           24,766         26,468
% Pronouns                   56.19          54.11           56.27            51.65          37.48
% Nominal mentions           27.70          22.76           22.30            25.93          29.15
% Explicit references        16.11          23.13           21.42            22.42          33.38
Character co-occurrences       431            886             615              844          1,674

We calculate the number of character co-occurrences for each work in our corpus, which we indicate in Table 1 and visualise in Figure 1a – Figure 1f. The overall number of co-occurrences is influenced by the number of explicit references (cf. Table 1), since we only extract co-occurrences if both characters are mentioned by name in the same sentence.
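As a minimal sketch of this extraction step, the following Python snippet counts sentence-level co-occurrences. The spaCy-based sentence splitting stands in for BookNLP's internal pipeline, and the ALIASES dictionary and token-level name matching are simplified, hypothetical stand-ins for our full set of disambiguation rules.

```python
# Minimal sketch of sentence-based co-occurrence extraction; ALIASES is a
# small hypothetical excerpt of our manual disambiguation rules, and names
# are matched at the token level for simplicity.
from collections import Counter
from itertools import combinations

import spacy

ALIASES = {"Sam": "Sam Gamgee", "Peregrin": "Pippin"}  # hypothetical excerpt

nlp = spacy.load("en_core_web_sm")

def extract_cooccurrences(text: str, characters: set) -> Counter:
    """Count pairs of named characters mentioned in the same sentence."""
    cooccurrences = Counter()
    for sent in nlp(text).sents:
        # resolve named references via the manual disambiguation rules
        mentioned = {ALIASES.get(tok.text, tok.text) for tok in sent}
        for u, v in combinations(sorted(mentioned & characters), 2):
            cooccurrences[(u, v)] += 1  # undirected: pairs are stored sorted
    return cooccurrences
```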
Character-based Visualization of Narrative Structure
We finally use the character mentions, in conjunction with the chapter in which they occur, to automatically produce a narrative chart for the three volumes of The Lord of the Rings. To avoid occurrences of characters that are only mentioned while not being present, we excluded mentions of characters within dialogues, as detected by BookNLP. The narrative charts for the three volumes of The Lord of the Rings are shown in the three panels of Figure 2. The columns within each panel represent chapters. To ease the presentation, we focus on a selected set of main characters shown in the rows. The colour of each row-column cell represents the number of times a character is mentioned in a given chapter, relative to the total number of mentions of the key characters presented in the narrative chart.

Figure 1: Number of co-occurrences between the ten most frequently mentioned characters in the five works of our corpus, (a) TLotR Vol. I, (b) TLotR Vol. II, (c) TLotR Vol. III, (d) The Hobbit, (e) The Silmarillion, and the whole Legendarium (f).

Figure 2: Narrative charts for the three volumes of The Lord of the Rings (left to right: The Fellowship of the Ring, The Two Towers, The Return of the King). Rows indicate key characters, while columns represent chapters. Colours capture the frequency of character mentions relative to other key characters captured in the charts. In Vol. I (left panel) Frodo and the Hobbits maintain a central role across the whole volume. For Vol. II and III (middle and right panel) clear transitions in the narrative structure are visible, which are due to an interlacing narrative style.

While it is easy to generate those charts, we can use them to identify major plot lines and reveal a narrative structure that is characteristic for The Lord of the Rings: In volume I, we see that Frodo and the Hobbits maintain a chief role throughout books I and II, i.e. both parts of volume I of The Lord of the Rings. A shift is visible for the second half (book II), which coincides with the Hobbits' arrival in Rivendell. In volume II, we see a clear transition in narrative structure that is due to Tolkien's adoption of an interlacing narrative style [51]. The first half of volume II (book III) focuses on the main characters Aragorn, Legolas, and Gimli, and their attempt to rescue Merry and Pippin from the Uruk-Hai. Leaping back in time, the second half of volume II (book IV) focuses on the journey of Frodo, Sam and Gollum and their encounter with Faramir, which coincides with a brief absence of Gollum as he hides from the rangers. Following the original titles suggested for the six books that constitute the three volumes of The Lord of the Rings [41], these interlaces can be called The Treason of Isengard and The Journey of the Ring-bearers. In volume III, we see a similar but less marked separation between two interlaced plot lines: The first half (book V) focuses on the War of the Ring in Gondor, while book VI continues the story of Frodo's and Sam's journey to Mount Doom, followed by the Scouring of the Shire, which explains Merry's and Saruman's reappearance in the last part. The original titles referring to those interlaces are The War of the Ring and The End of the Third Age. The non-linear narrative style expressed in the narrative charts in Figure 2 is common in early medieval literature [26]. Tolkien likely adopted it due to his familiarity with medieval literature, to increase suspense and to achieve a sense of historicity [3].
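Given per-chapter mention counts, a chart like those in Figure 2 reduces to a row-normalised heatmap. The following is a minimal sketch, where `mentions` is a hypothetical mapping from character names to per-chapter mention counts (with dialogue mentions already excluded):

```python
# Minimal sketch of a narrative chart; each cell shows the mentions of a
# character relative to all key characters mentioned in the same chapter.
import numpy as np
import matplotlib.pyplot as plt

def narrative_chart(mentions: dict) -> None:
    names = sorted(mentions)
    counts = np.array([mentions[n] for n in names], dtype=float)
    # normalise per chapter; clip avoids division by zero in empty chapters
    fractions = counts / counts.sum(axis=0, keepdims=True).clip(min=1)
    fig, ax = plt.subplots()
    ax.imshow(fractions, cmap="Blues", aspect="auto", vmin=0.0, vmax=1.0)
    ax.set_yticks(range(len(names)), labels=names)
    ax.set_xlabel("chapter")
    plt.show()
```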
Since these charts capture the narrative structure that we would expect, we can assume that our character extraction, even though using only limited coreference resolution, captures the overall occurrences of key characters rather well. In Appendix B we include additional narrative charts for The Hobbit and The Silmarillion.

3. Analysis of Character Co-occurrence Networks

Building on Section 2, we now use character co-occurrences to construct character networks for J.R.R. Tolkien's Legendarium. A graph or network is defined as a tuple G = (V, E), where u, v, ... ∈ V represent nodes and (u, v) ∈ E ⊆ V × V is a set of directed or undirected links. For our analysis, we model characters as nodes V and their co-occurrences within a sentence as undirected links E, i.e. (u, v) ∈ E ⟹ (v, u) ∈ E for all u, v ∈ V. We further add link weights w : E → ℕ, i.e. we assign a value w((u, v)) to each link (u, v) that counts the co-occurrences of u and v. We adopt this definition to construct character networks for each of the five works in our corpus. A particularly interesting aspect of our corpus is that all works refer to a single Legendarium, with frequent cross-references and, thus, overlaps in terms of characters. We use this to construct a single network of characters across all works, which we call the Legendarium Graph. Except for a single disconnected pair of nodes that we removed, the resulting graph has a single connected component with 238 nodes, which enables a macroscopic analysis of Tolkien's Legendarium. Figure 3 shows a visualisation of the character network that has been generated using the force-directed layout algorithm by Fruchterman and Reingold [14]. This algorithm simulates attractive forces between connected nodes (and repulsive forces between all nodes) such that their positions in the stable state of the resulting dynamical system highlight patterns in the topology.

Table 2
Key network metrics for the character networks in our corpus, where Legendarium refers to the graph spanning all works.

                                 nodes   edges   density   mean degree   diam.   avg. shortest path
The Hobbit                          30     119      0.14          8.00       4                 1.90
The Lord of the Rings Vol. I        65     217     0.052          6.68       5                 2.31
The Lord of the Rings Vol. II       55     169     0.057          6.13       5                 2.40
The Lord of the Rings Vol. III      45     159      0.08          7.07       5                 2.26
The Silmarillion                   136     803     0.044         11.81       5                 2.41
Legendarium                        238   1,233     0.023         10.48       6                 2.84

Network Size, Density, and Mean Degree
Modelling character networks enables us to compute metrics from social network analysis, which we report in Table 2. The first columns of Table 2 show the number of nodes n = |V| and links m = |E| as well as the density ρ = m / (n(n−1)) of each character network, where the latter captures which fraction of possible links actually occurred. We find that The Hobbit and The Silmarillion have the smallest and largest number of nodes, respectively. The Hobbit has a higher link density, which is likely because the plot is focused on a small number of strongly interacting characters. For The Lord of the Rings we see that the number of nodes for Vol. III is smaller than for the previous two volumes, while the density is larger. This reflects the fact that the last volume is strongly focused on interactions between small groups of characters, e.g. Frodo, Sam and Gollum. The degree of a node v ∈ V is defined as the number of other nodes to which it is connected by links, i.e. d(v) := |{u : (u, v) ∈ E}|.
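As a minimal sketch, the metrics reported in Table 2 can be computed with the networkx library from co-occurrence counts such as those produced by the extraction sketch above. Note that we follow the density convention ρ = m / (n(n−1)) used here, which for undirected graphs is half the value returned by nx.density:

```python
import networkx as nx

def character_network(cooccurrences: dict) -> nx.Graph:
    """Build the undirected, weighted character network G = (V, E)."""
    G = nx.Graph()
    for (u, v), count in cooccurrences.items():
        G.add_edge(u, v, weight=count)
    return G

def summary(G: nx.Graph) -> dict:
    n, m = G.number_of_nodes(), G.number_of_edges()
    return {
        "nodes": n,
        "edges": m,
        # density as reported in Table 2, i.e. rho = m / (n (n - 1));
        # nx.density(G) would return 2m / (n (n - 1)) for undirected graphs
        "density": m / (n * (n - 1)),
        "mean degree": 2 * m / n,
        "diameter": nx.diameter(G),  # requires a connected graph
        "avg shortest path": nx.average_shortest_path_length(G),
    }
```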
The fourth column in Table 2 reports the mean node degree of characters, where larger mean degrees are associated with higher link density.

Shortest Paths, Diameter and Betweenness Centrality
An important feature of (social) networks is the structure of shortest paths between nodes, which provide a topological notion of pair-wise distances [33]. We calculate shortest paths between all pairs of nodes and report the average shortest path length across all character pairs. We further calculate the diameter, i.e. the maximum shortest path length between any pair of nodes. The results in Table 2 show that, on the one hand, the average shortest path length is associated with the number of nodes (smallest average shortest path length for the smallest networks The Hobbit and The Lord of the Rings Vol. III). On the other hand, it is influenced by mean degree and link density, which explains why The Lord of the Rings Vol. III and The Silmarillion have similar values despite the latter having more than twice as many nodes.

Figure 3: Force-directed visualisation of the automatically extracted character network for Tolkien's Legendarium with 238 nodes and 1233 links. Node colours indicate the works in which a character occurs most frequently (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Node sizes are logarithmically scaled based on node degrees, and edge thickness is logarithmically scaled based on edge weights, i.e. the number of character co-occurrences.
Visualisation was created using the open source network analysis and visualisation library pathpy [18]. This character network facilitates a macroscopic, graph-based analysis that considers intertextual aspects in J.R.R. Tolkien's Legendarium.

Table 3
Ranking of characters in Tolkien's Legendarium based on degree and betweenness centrality.

      degree    betweenness
 1    Frodo     Elves
 2    Sam       Frodo
 3    Gandalf   Gandalf
 4    Aragorn   Bilbo
 5    Pippin    Galadriel
 6    Merry     Aragorn
 7    Morgoth   Morgoth
 8    Bilbo     Elrond
 9    Legolas   Lúthien
10    Gimli     Orcs
11    Elrond    Sam
12    Gollum    Saruman
13    Elves     Pippin
14    Túrin     Merry
15    Húrin     Húrin

The shortest paths between characters also allow us to define a path-based notion of centrality. The betweenness centrality [50] of a node v captures the number of shortest paths between any pair of nodes that pass through v, i.e. nodes have higher betweenness centrality if they are important for the "communication" between other nodes. The second column of Table 3 reports the characters with the highest betweenness centrality in the single Legendarium graph. We note the high betweenness centrality of Galadriel, who, despite an ephemeral appearance in The Lord of the Rings and absence from The Hobbit, is one of the few characters that link the narrative across different ages in Tolkien's mythology.

4. Application of Graph Neural Networks

We now turn our attention to the application of state-of-the-art graph learning techniques to the character network of Tolkien's Legendarium, which has been introduced and characterised in the previous section. Our analysis is focused on our guiding research question outlined in Section 1, i.e. what additional information we can draw from the topology of the character network, compared to an application of standard machine learning to a word embedding technique. A major hurdle for the application of machine learning to character networks is that standard techniques like, e.g., logistic regression, support vector machines, or neural networks require features in a continuous vector space. Their application to discrete objects like graphs typically requires a two-step procedure that consists of (i) a representation learning or embedding step to extract vectorial features of nodes and/or links, and (ii) a downstream application of machine learning to those features. This approach is limited by the fact that graphs with complex topologies are fundamentally non-Euclidean objects [9], which limits our ability to find a generic vector space representation that is suitable for different learning tasks.

Addressing these limitations, recent works in the fields of Geometric Deep Learning [9] and Graph Learning have generalized deep learning to graph-structured data. Among those works, Graph Neural Networks (GNNs) [43, 16, 39, 15] have developed into a particularly successful paradigm. A major advantage of GNNs over other network analysis or machine learning techniques is their ability to capture both relational patterns in the topology of a graph as well as additional (vectorial) features of nodes or links. A common concept of GNNs is their use of hidden layers with neural message passing, i.e. nodes repeatedly exchange and update feature vectors with neighbouring nodes, thus incorporating information from their neighbourhood. The (hidden) features generated in this way can be used to address learning tasks by means of a perceptron model with a non-linear activation function.
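For illustration, a single GCN-style message passing layer [21] can be written in a few lines of PyTorch. This is a didactic sketch of the propagation rule, not the implementation used in our experiments:

```python
# Didactic sketch of one GCN message passing layer: every node aggregates
# degree-normalised feature vectors from its neighbours (and itself) and
# passes the result through a learnable linear map with a non-linearity.
import torch

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out) weights."""
    A_hat = A + torch.eye(A.shape[0])        # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]  # D^-1/2 A_hat D^-1/2
    return torch.relu(A_norm @ H @ W)        # propagate, transform, activate
```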
The gradient-based optimization of GNNs can be thought of as an implicit way to generate a topology- and feature-aware latent space representation of a graph that facilitates node-, link- or graph-level learning tasks [19].

Moving beyond the social network analysis techniques applied in Section 3, in the following we apply two state-of-the-art GNN architectures to the character network of Tolkien's Legendarium. We use them to address three unsupervised and supervised learning tasks: (i) representation learning, i.e. finding a meaningful latent space embedding of characters, (ii) (semi-)supervised node classification, i.e. assigning characters to different works in the Legendarium, and (iii) link prediction, i.e. using a subset of links in the graph to predict missing links in a holdout set.

Latent Space Representation of Characters in Tolkien's Legendarium
Representation learning is a common challenge both in natural language processing and graph learning. In NLP, word or text embedding techniques are commonly used to generate vector space representations that can then be used to apply downstream machine learning techniques to literary texts. A popular word embedding technique is word2vec [31, 30], which we use as a baseline in our analysis that only uses the text corpus while being agnostic to the topology of the character network. We specifically use the SkipGram architecture to train a neural network with a single hidden layer of k neurons that captures the context of words in the text corpus, i.e. a concatenation of all works in our corpus. The weights of neurons in the hidden layer are then interpreted as positions of words in a k-dimensional latent vector space. For our analysis, we used the word2vec implementation in the package gensim with default parameters, i.e. we use a latent space with k = 300 dimensions.

Apart from this standard NLP approach to generate latent space embeddings, we apply two graph representation learning techniques to generate character embeddings based on the topology of the character network: The first approach, Laplacian Eigenmaps, uses eigenvectors corresponding to the leading eigenvalues of the Laplacian matrix of a graph [6]. This can be seen as a factorization of an eigenmatrix [37] that yields a representation of nodes in a k-dimensional latent space, where k is chosen to be smaller than the number of nodes. To determine a reasonable choice for the number of dimensions k, we performed an experiment in which we evaluated the average performance of a logistic regression model for node classification (see detailed description below) for different dimensions k of the latent space. The results of this experiment are shown in Figure 4. As expected, we observe a tendency to underfit for a very small (k < 10) number of dimensions, while the performance saturates as we increase k beyond a "reasonable" value that allows to capture the topological patterns in the network. This analysis informed our choice of k = 20 for the subsequent experiments. As a second approach we adopt the popular graph representation learning method node2vec, which applies the SkipGram architecture to sequences of nodes generated by a biased second-order random walk (i.e. a walk with one-step memory) in a graph [17]. We again use k = 20 hidden dimensions to make it comparable with the previous method.
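As a concrete anchor for the word2vec baseline described above, a minimal gensim sketch follows. We assume `sentences` is the tokenised, concatenated corpus as a list of token lists; sg=1 selects the SkipGram architecture, and we set the dimension explicitly since gensim's own default differs from the k = 300 used here:

```python
# Minimal sketch of the word2vec baseline with gensim (SkipGram, k = 300).
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=300, sg=1, workers=4)
frodo_vector = model.wv["Frodo"]  # 300-dimensional character embedding
```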
We note that choosing a value of k = 20 for the graph representation learning techniques, which is substantially smaller than the default value of k = 300 for word2vec, is justified because the number of nodes in the Legendarium graph (n = 238) is much smaller than the vocabulary used by word2vec (n = 18,430).

Figure 4: Performance of node classification (y-axis, f1-score) for a Laplacian Eigenmap graph embedding of characters in Tolkien's Legendarium (with subsequent logistic regression) using different numbers of dimensions (x-axis). The shaded area indicates the standard deviation of the f1-score.

We finally adopt two Graph Neural Networks, namely Graph Convolutional Networks (GCNs) [21] and Graph Attention Networks (GATs) [49]. Both of these deep graph learning techniques use a variant of neural message passing, the main difference being that GATs use a learnable attention mechanism that can place different weights on nearby nodes [57]. We trained both architectures to address the graph learning tasks outlined below, i.e. node classification and link prediction, using a hidden layer with k = 20 neurons. We then use the hidden layer activations of both architectures to infer a representation of characters in a 20-dimensional latent space. To exclusively focus on the graph topology, for our experiments we treated the network as an unweighted graph. Additional results for experiments with weighted graphs are included in Appendix D.

Leveraging the ability of GNNs to consider additional node attributes, we compare three different approaches: First, we use GNNs without additional node features, which we emulate by initializing the message passing layer with a one-hot encoding (OHE) of characters. In other words, for a network with n nodes, we assign to each node i = 0, …, n−1 a "dummy feature" vector f_i ∈ ℝⁿ defined as

f_i = (0, …, 0, 1, 0, …, 0),

with i zeros before the one and n − i − 1 zeros after it. Second, we use the node embeddings generated by node2vec as additional node features that are used in the message passing layers. Third, we assign the word2vec embeddings as additional node features, thus combining NLP and graph neural networks.

In Figure 5 we illustrate two latent space representations of characters generated by (a) word2vec and (b) the combination of a GCN with word2vec features. For both figures, we used t-SNE [48] to reduce the latent space embedding to two dimensions; nodes are coloured according to the works in which the corresponding characters occur most frequently. A comparison of the two embeddings clearly highlights the advantage of graph learning over a mere application of word2vec: Different from the word embedding, the combination of GCN with word2vec generates a latent space representation that captures the distinction of characters across different works in Tolkien's Legendarium. We argue that this visualisation highlights the additional information that graph neural networks can leverage from the topology of character networks, as opposed to mere word-context pair statistics. In the following, we more thoroughly investigate this interesting aspect in two clearly defined learning tasks.
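A minimal sketch of a two-layer GCN of this kind in PyTorch Geometric [13] follows; `in_dim` depends on the chosen node features (n for the one-hot encoding, 20 for node2vec features, 300 for word2vec features), and the hidden layer has k = 20 neurons:

```python
# Minimal sketch of a two-layer GCN for character classification.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class CharacterGCN(torch.nn.Module):
    def __init__(self, in_dim: int, num_classes: int = 3, k: int = 20):
        super().__init__()
        self.conv1 = GCNConv(in_dim, k)        # first message passing layer
        self.conv2 = GCNConv(k, num_classes)   # second message passing layer

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))  # k-dimensional latent representation
        return self.conv2(h, edge_index)       # class logits per character
```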
Predicting Character Classes
We use the methods outlined above to address a supervised node classification problem in the Legendarium Graph, i.e. the single character network capturing all works in Tolkien's Legendarium. We assign three labels to nodes that correspond to the work (i.e. The Silmarillion, The Hobbit, or The Lord of the Rings) in which the corresponding character is most prominent. We extract these labels automatically as

argmax_book count(v, book) / Σ_{u ∈ C} count(u, book),

where count(v, book) is the number of mentions of character v in book and C is the set of all characters. We verify the labels manually, finding them to be reasonable in all cases. The resulting three classes contain 113 (The Silmarillion), 30 (The Hobbit) and 99 (The Lord of the Rings) characters, i.e. there is a class imbalance, which we address by using the macro-averaged f1-score, precision and recall. We highlight that the information on the different works is withheld from all methods, i.e. for the word embedding and character network construction we concatenate all works into a single text corpus.

We train our models on a training set of labelled characters and use them to predict the unknown classes of unlabelled characters in a test data set. As a baseline that does not utilise the topology of the character network, we first train a logistic regression model using the embeddings generated by word2vec. Similarly, we train a logistic regression model on the embeddings generated by Laplacian Eigenmaps and node2vec. For node2vec we use three different sets of hyper-parameters p and q, where for p = q = 1 node2vec is equivalent to the graph embedding technique DeepWalk [34]. We trained the model for 200 epochs. We finally train the GCN and GAT models, either using a one-hot encoding (OHE) or assigning additional node features generated by the word and graph embeddings as explained above. For both we used two message passing layers, and we trained the models for 5000 epochs using an Adam optimizer with learning rate lr = 0.0001. We evaluate all models using a 10-fold cross-validation. Average results with standard deviations are shown in Table 4.

Interestingly, we find that the performance of character classification based on the word embedding word2vec is worse than that of any of the graph learning techniques, thus highlighting the added value of the graph perspective. The graph embedding technique node2vec, which uses a SkipGram architecture to embed nodes, clearly outperforms the graph-agnostic word2vec, despite both using the same logistic regression model for the class prediction. Moreover, we find that a simple application of Laplacian Eigenmaps, i.e. a mathematically principled matrix decomposition, yields node classification performance comparable to the neural network-based node2vec embedding.

(a) Latent space embedding of characters obtained by applying the word embedding word2vec to the whole text corpus; edges represent character co-occurrences in the text. (b) Graph Convolutional Network using word2vec character embeddings as additional node features.

Figure 5: Comparison of two latent space embeddings of characters in Tolkien's Legendarium using word2vec (a) and the hidden layer activations of a Graph Convolutional Network, where nodes carry additional features that correspond to word2vec embeddings (b). Node colours indicate the work in which characters appear most frequently (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Two-dimensional visualisations were generated using the dimensionality reduction technique t-SNE [48].
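Before turning to the results in Table 4, the following sketch outlines the (semi-)supervised training loop for the GCN variant, reusing the CharacterGCN class sketched above and assuming a hypothetical torch_geometric Data object `data` holding node features, edge index, labels and a boolean train_mask:

```python
# Minimal sketch of the node classification training loop (GCN variant).
import torch
import torch.nn.functional as F

model = CharacterGCN(in_dim=data.num_node_features)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

for epoch in range(5000):
    optimizer.zero_grad()
    logits = model(data.x, data.edge_index)
    # the loss only sees labelled characters; unlabelled nodes still
    # contribute through message passing over the graph topology
    loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```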
Table 4
Performance of word embedding (word2vec), graph representation learning (Laplacian Eigenmap and node2vec), and graph neural networks (GCN and GAT) in supervised character classification and link prediction for a character network capturing Tolkien's Legendarium. For the word and graph embedding techniques, a logistic regression model was used for classification and link prediction.

                                   Character classification                       Link Prediction
                                   F1-score        Precision       Recall         ROC/AUC
word2vec (k = 300)                 79.45 ± 0.12    80.67 ± 0.13    80.08 ± 0.12   78.16 ± 0.79
Laplacian Eigenmap (k = 20, LE)    81.52 ± 0.14    86.59 ± 0.09    84.13 ± 0.14   64.03 ± 2.28
node2vec (k = 20, p = 1, q = 4)    82.10 ± 0.12    84.54 ± 0.12    84.85 ± 0.11   79.06 ± 0.01
node2vec (k = 20, p = 4, q = 1)    81.75 ± 0.12    81.69 ± 0.12    84.51 ± 0.12   80.71 ± 0.01
node2vec (k = 20, p = 1, q = 1)    87.07 ± 0.13    89.10 ± 0.12    87.30 ± 0.13   80.15 ± 0.02
GCN (k = 20) + LE                  71.89 ± 0.20    72.40 ± 0.22    73.24 ± 0.18   88.53 ± 1.25
GCN (k = 20) + node2vec (p=q=1)    92.17 ± 0.10    96.69 ± 0.04    91.08 ± 0.11   88.03 ± 1.43
GCN (k = 20) + OHE                 90.55 ± 0.13    92.71 ± 0.12    89.95 ± 0.13   80.78 ± 1.16
GCN (k = 20) + word2vec            92.32 ± 0.10    95.27 ± 0.08    91.42 ± 0.10   81.67 ± 1.43
GAT (k = 20) + LE                  53.60 ± 0.08    51.72 ± 0.05    57.78 ± 0.08   79.41 ± 1.83
GAT (k = 20) + node2vec (p=q=1)    91.18 ± 0.09    95.57 ± 0.04    90.37 ± 0.10   86.23 ± 0.99
GAT (k = 20) + OHE                 83.52 ± 0.14    85.59 ± 0.15    84.68 ± 0.14   80.52 ± 1.76
GAT (k = 20) + word2vec            89.41 ± 0.07    93.98 ± 0.08    89.13 ± 0.10   82.88 ± 2.08

The best precision is achieved for a Graph Convolutional Network (GCN) with additional node features generated by node2vec, while the best f1-score and recall are achieved for a GCN with additional word2vec embeddings. We attribute this to the fact that the combination of word embeddings and graph neural networks integrates two complementary sources of information, thus highlighting the advantages of methods that leverage both NLP and graph learning.

In Figure 6 we further demonstrate the ability of Graph Neural Networks to perform a semi-supervised classification, i.e. their ability to accurately predict classes based on a very small number of labelled examples. Figure 6(a) shows the training network with three randomly chosen labelled characters (shown in colour), one for each ground-truth class. Nodes in the test set are shown in grey, and node positions are chosen based on the latent space representation shown in Figure 5(b). We use these three labelled characters to train a GCN with additional word2vec embeddings as node features. We then use the trained model to predict the character classes in the test set. The predicted classes are shown as coloured nodes in Figure 6(b), where the three training nodes are shown in grey. A visual comparison of Figure 6(b) and Figure 5(b) allows us to evaluate the prediction against the ground truth. Despite the very sparse labelled examples, and thanks to its use of the graph topology of the unlabelled nodes in the training set, the GCN model is able to accurately predict character classes, reaching an f1-score of ≈ 79.7%, a precision of ≈ 78.6% and a recall of ≈ 82.6%. Remarkably, this shows that the combination of a GCN model with word2vec node features yields a higher f1-score and recall with only three labelled examples than a word embedding alone, even when all characters in the training set are labelled.

(a) Training network with three labelled characters Galadriel, Orcs, and Balin (shown as coloured nodes). (b) Character classes predicted by a Graph Convolutional Network (GCN) using word2vec character embeddings as additional node features.

Figure 6: Illustration of semi-supervised character classification, where the training set contains only three randomly chosen labelled characters.
Node colours indicate ground truth classes (a) and class predictions by the Graph Convolutional Network (GCN) (b) (red: The Hobbit, green: The Lord of the Rings, blue: The Silmarillion). Two-dimensional visualisations were generated using the dimensionality reduction technique t-SNE [48]. Node positions correspond to Figure 5(b), which can be used for a comparison with ground truth character classes. Remarkably, the GCN model with additional word2vec features achieves an f1-score of ≈ 79.7% with an associated precision of ≈ 78.6% and recall of ≈ 82.6%.

Predicting Character Interactions
We finally address link prediction, which refers to the task of predicting "missing" links in a graph, i.e. links that are either missing in incomplete data or that are likely to form in the future. Link prediction is a well-studied graph learning problem, with important applications in social network analysis and recommender systems [29]. In the context of character networks, it is relevant because it could be used to alleviate the low recall of the rigid, sentence-based character network extraction that we employed in Section 3. Adopting a supervised approach, we split the edges of the Legendarium Graph into a training and a test set, where we withhold 10% of the edges during training to test our model. We use word2vec, Laplacian Eigenmaps and node2vec to generate embeddings f_v ∈ ℝᵏ of characters. For word2vec, embeddings are generated using the full text corpus. For the graph embedding techniques Laplacian Eigenmaps and node2vec we only use links in the training set, potentially putting them at a disadvantage in terms of training data. We use the resulting feature vectors to calculate the element-wise (Hadamard) product f_u ∘ f_v ∈ ℝᵏ for character pairs u, v, which yields k-dimensional features for all character pairs. We then use the features of positive instances (i.e. pairs u, v connected by a link (u, v) in the training set) and negative instances (pairs u, v not connected by a link in the training set) to train a (binary) logistic regression classifier and use the trained model to predict links in the test set. We use negative sampling to mitigate the imbalance between negative and positive instances in the training set. For the two GNN architectures we adopt the common approach to add a decoding step that computes the Hadamard product of node features after the last message passing layer. We use a Binary Cross Entropy with Logits loss function and train the models for 15000 epochs using an Adam optimizer with learning rate 0.001.

We use the Receiver Operating Characteristic (ROC) to evaluate our models, i.e. we compute ROC curves that give the true and false positive rates across all discrimination thresholds. An example (and explanation) is included in Appendix E. We compute the Area Under Curve (AUC) of ROC curves within the unit square, which ranges from 0 to 1: a value of 0.5 corresponds to the performance of a random classifier, while values < 0.5 indicate worse and values > 0.5 better than random performance. We again evaluate all models using a 10-fold cross-validation. Table 4 reports average results and standard deviations of the AUC for all models. With the exception of Laplacian Eigenmaps, we find that graph methods generally perform better than word2vec. We further observe that GNNs perform considerably better than node2vec, where the best performance is achieved when coupling a GCN with Laplacian Eigenmaps or node2vec.
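To make the embedding-based link prediction pipeline concrete, the following sketch combines Hadamard-product pair features with a logistic regression classifier. Here `emb`, a mapping from characters to k-dimensional numpy vectors learned on the training edges only, and the positive/negative pair lists (train_pos, train_neg, test_pos, test_neg) are hypothetical inputs produced by the embedding and negative sampling steps described above:

```python
# Minimal sketch of embedding-based link prediction with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def pair_features(pairs, emb):
    return np.array([emb[u] * emb[v] for u, v in pairs])  # Hadamard product

# train on positive edges and (negatively sampled) non-edges
X_train = pair_features(train_pos + train_neg, emb)
y_train = np.array([1] * len(train_pos) + [0] * len(train_neg))
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# evaluate on the 10% holdout edges against sampled non-edges
X_test = pair_features(test_pos + test_neg, emb)
y_test = np.array([1] * len(test_pos) + [0] * len(test_neg))
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```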
5. Conclusion

In summary, we used natural language processing techniques like named entity recognition and coreference resolution to construct a single character network from a corpus of works that constitute J.R.R. Tolkien's Legendarium. Apart from characterising the network based on social network analysis, we adopt state-of-the-art graph learning techniques to (i) generate latent space embeddings of characters, (ii) automatically classify characters based on the work to which they belong, and (iii) predict character co-occurrences. For all three tasks, we find a significant advantage of Graph Neural Networks (GNNs) over a common word embedding technique, and we find that a combination of both yields the best performance. Our approach to construct a single graph for multiple literary texts could be interesting to analyse other corpora of works with overlapping characters (e.g. mythology, historical novels, etc.). We further believe that our results on the application of GNNs to a link prediction task have interesting implications for computational literary studies. Considering the difficulty of coreference resolution, and the low coverage of the resulting character networks that we observed in our experiments, we expect that link prediction could potentially be used as an approach to address the low recall observed for the sentence-based co-occurrence networks. In future work, we will further consider the modelling of character co-occurrences as temporal character co-occurrence networks, where the temporal ordering of sentences determines the "time stamps" of edges in the resulting dynamic graph. This promises the application of recently developed higher-order graph modelling, visualization, and learning techniques [24, 35, 46, 25, 36], which capture patterns in the temporal ordering of edges and can thus provide insights beyond the mere graph topology.

Acknowledgements
Vincenzo Perri and Ingo Scholtes acknowledge support by the Swiss National Science Foundation, grant 176938.

References

[1] A. Agarwal, A. Corvalan, J. Jensen, and O. Rambow. "Social network analysis of Alice in Wonderland". In: Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature. 2012, pp. 88–96.
[2] D. Agarwal, D. Vijay, et al. "Genre Classification using Character Networks". In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE. 2021, pp. 216–222.
[3] E. E. Auger. "The Lord of the Rings' interlace: Tolkien's narrative and Lee's illustrations". In: Journal of the Fantastic in the Arts 19.1 (2008), p. 70.
[4] D. Bamman, O. Lewke, and A. Mansoor. "An annotated dataset of coreference in English literature". In: arXiv preprint arXiv:1912.01140 (2019).
[5] D. Bamman, S. Popat, and S. Shen. "An annotated dataset of literary entities". In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 2138–2144.
[6] M. Belkin and P. Niyogi. "Laplacian eigenmaps for dimensionality reduction and data representation". In: Neural Computation 15.6 (2003), pp. 1373–1396.
[7] A. Beveridge and J. Shan. "Network of thrones". In: Math Horizons 23.4 (2016), pp. 18–22.
[8] A. Bonato, D. R. D'Angelo, E. R. Elenberg, D. F. Gleich, and Y. Hou. "Mining and modeling character networks". In: International Workshop on Algorithms and Models for the Web-Graph. Springer. 2016, pp. 100–114.
[9] M. M. Bronstein, J. Bruna, Y.
LeCun, A. Szlam, and P. Vandergheynst. "Geometric deep learning: going beyond Euclidean data". In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42.
[10] J. Bruna and X. Li. "Community detection with graph neural networks". In: stat 1050 (2017), p. 27.
[11] D. K. Elson, K. McKeown, and N. J. Dames. "Extracting social networks from literary fiction". In: (2010).
[12] R. Feldman. "Techniques and applications for sentiment analysis". In: Communications of the ACM 56.4 (2013), pp. 82–89.
[13] M. Fey and J. E. Lenssen. "Fast graph representation learning with PyTorch Geometric". In: arXiv preprint arXiv:1903.02428 (2019).
[14] T. M. Fruchterman and E. M. Reingold. "Graph drawing by force-directed placement". In: Software: Practice and Experience 21.11 (1991), pp. 1129–1164.
[15] C. Gallicchio and A. Micheli. "Graph Echo State Networks". In: 2010, pp. 1–8. doi: 10.1109/ijcnn.2010.5596796.
[16] M. Gori, G. Monfardini, and F. Scarselli. "A new model for learning in graph domains". In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks. Vol. 2 (2005), pp. 729–734.
[17] A. Grover and J. Leskovec. "node2vec: Scalable feature learning for networks". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, pp. 855–864.
[18] J. Hackl, I. Scholtes, L. V. Petrović, V. Perri, L. Verginer, and C. Gote. "Analysis and Visualisation of Time Series Data on Networks with Pathpy". In: Companion Proceedings of the Web Conference 2021 (WWW '21). Ljubljana, Slovenia: Association for Computing Machinery, 2021, pp. 530–532. doi: 10.1145/3442442.3452052.
[19] W. L. Hamilton. Graph Representation Learning. Vol. 14.3. Morgan & Claypool Publishers, 2020, pp. 1–159.
[20] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. "spaCy: Industrial-strength Natural Language Processing in Python". In: (2020). doi: 10.5281/zenodo.1212303.
[21] T. N. Kipf and M. Welling. "Semi-Supervised Classification with Graph Convolutional Networks". In: Proceedings of the 5th International Conference on Learning Representations (ICLR '17). Toulon, France, 2017. url: https://openreview.net/forum?id=SJU4ayYgl.
[22] M. Krug. "Techniques for the Automatic Extraction of Character Networks in German Historic Novels". Doctoral thesis. Universität Würzburg, 2020. doi: 10.25972/opus-20918.
[23] V. Labatut and X. Bost. "Extraction and analysis of fictional character networks: A survey". In: ACM Computing Surveys (CSUR) 52.5 (2019), pp. 1–40.
[24] R. Lambiotte, M. Rosvall, and I. Scholtes. "From networks to optimal higher-order models of complex systems". In: Nature Physics 15.4 (2019), pp. 313–320.
[25] T. LaRock, I. Scholtes, and T. Eliassi-Rad. "Sequential motifs in observed walks". In: Journal of Complex Networks 10.5 (2022), cnac036.
[26] J. Leyerle. "The interlace structure of Beowulf". In: University of Toronto Quarterly 37.1 (1967), pp. 1–17.
[27] J. Li, C. Zhang, H. Tan, and C. Li. "Complex Networks of Characters in Fictional Novels". In: 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS). 2019, pp. 417–420. doi: 10.1109/icis46139.2019.8940174.
[28] J. Li, A. Sun, J. Han, and C. Li. "A survey on deep learning for named entity recognition". In: IEEE Transactions on Knowledge and Data Engineering 34.1 (2020), pp. 50–70.
[29] L. Lü and T. Zhou. "Link prediction in complex networks: A survey".
In: Physica A: Statistical Mechanics and its Applications 390.6 (2011), pp. 1150–1170.
[30] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. 2013. url: http://arxiv.org/abs/1301.3781.
[31] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. 2013. url: http://arxiv.org/abs/1310.4546.
[32] F. Moretti. Network theory, plot analysis. Literary Lab Pamphlet 2. 2011.
[33] M. Newman. Networks. Oxford University Press, 2018.
[34] B. Perozzi, R. Al-Rfou, and S. Skiena. "DeepWalk: Online learning of social representations". In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, pp. 701–710.
[35] V. Perri and I. Scholtes. "HOTVis: Higher-order time-aware visualisation of dynamic graphs". In: International Symposium on Graph Drawing and Network Visualization. Springer. 2020, pp. 99–114.
[36] L. Qarkaxhija, V. Perri, and I. Scholtes. "De Bruijn goes Neural: Causality-Aware Graph Neural Networks for Time Series Data on Dynamic Graphs". In: arXiv preprint arXiv:2209.08311 (2022).
[37] J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang. "NetSMF: Large-scale network embedding as sparse matrix factorization". In: The World Wide Web Conference. 2019, pp. 1509–1520.
[38] M. Ribeiro, R. Vosgerau, M. Andruchiw, and S. Pinto. "The complex social network from The Lord of The Rings". In: Revista Brasileira de Ensino de Física (2016).
[39] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. "The Graph Neural Network Model". In: IEEE Transactions on Neural Networks 20.1 (2009), pp. 61–80. doi: 10.1109/tnn.2008.2005605.
[40] F. Schröder, H. O. Hatzel, and C. Biemann. "Neural end-to-end coreference resolution for German in different domains". In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021). 2021, pp. 170–181. url: https://aclanthology.org/2021.konvens-1.15/.
[41] T. Shippey. The Road to Middle-earth: How J.R.R. Tolkien Created a New Mythology. HMH, 2014.
[42] A. C. Sparavigna. "On social networks in plays and novels". In: International Journal of Sciences 2.10 (2013).
[43] A. Sperduti and A. Starita. "Supervised Neural Networks for the Classification of Structures". In: IEEE Transactions on Neural Networks 8.3 (1997), pp. 714–735. doi: 10.1109/72.572108.
[44] R. Sukthanker, S. Poria, E. Cambria, and R. Thirunavukarasu. "Anaphora and coreference resolution: A review". In: Information Fusion 59 (2020), pp. 139–162.
[45] R. Sukthanker, S. Poria, E. Cambria, and R. Thirunavukarasu. "Anaphora and coreference resolution: A review". In: Information Fusion 59 (2020), pp. 139–162. doi: 10.1016/j.inffus.2020.01.010.
[46] L. Torres, A. S. Blevins, D. Bassett, and T. Eliassi-Rad. "The why, how, and when of representations for complex systems". In: SIAM Review 63.3 (2021), pp. 435–485.
[47] P. Trilcke. "Social network analysis (SNA) als Methode einer textempirischen Literaturwissenschaft". In: Empirie in der Literaturwissenschaft. Brill mentis, 2013, pp. 201–247.
[48] L. Van der Maaten and G. Hinton. "Visualizing data using t-SNE". In: Journal of Machine Learning Research 9.11 (2008).
[49] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. "Graph Attention Networks". In: 6th International Conference on Learning Representations (2017).
[50] S. Wasserman, K.
Faust, et al. "Social network analysis: Methods and applications". In: (1994).
[51] R. C. West. "The Interlace Structure of The Lord of the Rings". In: A Tolkien Compass 2 (1975), pp. 75–91.
[52] J. Worsham and J. Kalita. "Genre Identification and the Compositional Effect of Genre in Literature". In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics, 2018, pp. 1963–1973. url: https://aclanthology.org/C18-1167.
[53] M. C. Yavuz. "Analyses of character networks in dramatic works by using graphs". In: 2020 7th International Conference on Behavioural and Social Computing (BESC). IEEE. 2020, pp. 1–4.
[54] A. Zehe, L. Konle, L. K. Dümpelmann, E. Gius, A. Hotho, F. Jannidis, L. Kaufmann, M. Krug, F. Puppe, N. Reiter, A. Schreiber, and N. Wiedmer. "Detecting Scenes in Fiction: A new Segmentation Task". In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online: Association for Computational Linguistics, 2021, pp. 3167–3177. doi: 10.18653/v1/2021.eacl-main.276. url: https://aclanthology.org/2021.eacl-main.276.
[55] M. Zhang and Y. Chen. "Link prediction based on graph neural networks". In: Advances in Neural Information Processing Systems 31 (2018).
[56] M. Zhang, Z. Cui, M. Neumann, and Y. Chen. "An End-to-End Deep Learning Architecture for Graph Classification". In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI '18). New Orleans, Louisiana, USA: AAAI Press, 2018.
[57] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. "Graph neural networks: A review of methods and applications". In: AI Open 1 (2020), pp. 57–81.

A. Alternative Approaches to Detect Character Co-Occurrences

As an alternative to the sentence-based co-occurrence approach described in Section 2, we also evaluated two other approaches in initial experiments: detection based on (i) the parse tree and (ii) a sliding text window. For (i), we marked all sentences as an "interaction" between two characters if they had the same head word in the parse tree. This is the most strict version of our character network construction, as it only captures explicit interactions (e.g., "Frodo saw Sam"). Due to the small number of detected interactions, we discarded this strategy. On the other hand, (ii) is more lenient than our final sentence-based approach, only requiring two characters to be mentioned within a window of a fixed number of characters (letters). This approach of extracting character co-occurrences within a sliding text window has been adopted in a number of prior works [7, 8]. It allows us to capture interactions between characters that are not mentioned in the same sentence, but introduces the risk of detecting a large number of spurious interactions. In our experiments, we chose a window size of 2000 characters, with the additional restriction that chapter borders may not be crossed; a sketch of this variant follows below. Aiming for character networks that maintain a balance between recall (i.e. detecting all meaningful character links) and precision (i.e. limiting the number of spurious links), we decided to use the sentence-based interaction detection.
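A minimal sketch of the sliding-window variant (ii), assuming `chapters` is a hypothetical list holding, for each chapter, the detected character mentions as (name, text offset) pairs, so that windows never cross chapter borders:

```python
# Minimal sketch of sliding-window co-occurrence detection.
from collections import Counter

WINDOW = 2000  # window size in characters (letters), as in our experiments

def window_cooccurrences(chapters) -> Counter:
    cooccurrences = Counter()
    for mentions in chapters:
        mentions = sorted(mentions, key=lambda m: m[1])  # sort by text offset
        for i, (name_i, pos_i) in enumerate(mentions):
            for name_j, pos_j in mentions[i + 1:]:
                if pos_j - pos_i > WINDOW:
                    break  # later mentions lie even farther away
                if name_i != name_j:
                    cooccurrences[tuple(sorted((name_i, name_j)))] += 1
    return cooccurrences
```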
B. Narrative Charts for The Hobbit and The Silmarillion

Complementing the results in Section 2, in Figure 7 we present two additional narrative charts that we generated for The Hobbit and The Silmarillion.

Figure 7: Narrative charts for The Hobbit (left) and The Silmarillion (right).

C. Visualisation of Additional Latent Space Embeddings

In Figure 8 and Figure 9 we include additional visual representations of latent space embeddings of characters, obtained by those methods that are not included in the main text. For all methods, we employed t-SNE [48] to reduce the dimensionality of the latent space embeddings to two dimensions.

Figure 8: Latent space embedding of characters in Tolkien’s Legendarium using node2vec (a) and Graph Attention Networks with additional word2vec features (b).

Figure 9: Latent space embedding of characters in Tolkien’s Legendarium using GCN with one-hot encoding (OHE) (a) and Laplacian Eigenmap (b).

D. Node Classification and Link Prediction with Weighted GNNs

Complementing the results discussed in the main article, in Table 5 we include additional results for which we applied Graph Neural Networks to weighted graphs, i.e., in contrast to Table 4, we consider a character network with link weights w(u, v) that capture the number of co-occurrences of two characters u, v. Due to time constraints, we performed this analysis only for the best-performing Graph Neural Network, i.e. GCN. These additional results suggest that, at least for our corpus, the inclusion of link weights does not significantly improve the performance of models for character classification and link prediction.

Table 5
Additional results for node classification and link prediction in the Legendarium graph, where links carry weights that capture the number of character co-occurrences.

                               Character classification                      Link Prediction
                               F1-score        Precision       Recall        ROC/AUC
GCN20 LE                       60.85 ± 0.12    60.84 ± 0.16    63.40 ± 0.10  79.41 ± 1.83
GCN20 node2vec (p = 1, q = 1)  92.14 ± 0.11    95.87 ± 0.07    91.28 ± 0.11  86.23 ± 0.99
GCN20 OHE                      87.92 ± 0.10    93.02 ± 0.08    86.36 ± 0.11  80.52 ± 1.76
GCN20 word2vec                 92.96 ± 0.10    95.67 ± 0.07    92.03 ± 0.10  82.88 ± 2.08
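To illustrate how such link weights enter a GCN, the sketch below shows a minimal two-layer model built with the graph learning library pytorch-geometric [13]. It is a simplified illustration under assumptions, not our exact experimental configuration: the hidden dimension, dropout rate, and all names are placeholders.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class WeightedGCN(torch.nn.Module):
    """Minimal two-layer GCN for node classification on a weighted graph."""

    def __init__(self, num_features, num_classes, hidden_dim=16):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index, edge_weight=None):
        # GCNConv folds `edge_weight` (here: co-occurrence counts) into its
        # normalised adjacency matrix; passing None yields the unweighted
        # variant used for the results in the main text.
        h = F.relu(self.conv1(x, edge_index, edge_weight))
        h = F.dropout(h, p=0.5, training=self.training)
        return self.conv2(h, edge_index, edge_weight)
```

Training then proceeds as usual, e.g. with a cross-entropy loss on the labelled characters; the same mechanism applies to link prediction, where the node embeddings of the final layer are combined into scores for node pairs.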
E. Explanation of ROC/AUC Evaluation of Link Prediction

In Section 4 we use the Area Under Curve (AUC) of a Receiver Operating Characteristic (ROC) curve to evaluate the performance of our models in link prediction, a common approach for assessing the diagnostic quality of binary classifiers in information retrieval and machine learning. A key advantage of this approach is that it evaluates the performance of a binary classifier across all possible discrimination thresholds, which can be adapted to tune the sensitivity/specificity of the prediction depending on application requirements. To help the reader follow this evaluation approach, below we explain one exemplary ROC curve obtained for link prediction in the Legendarium graph using a node2vec embedding of characters and a logistic regression model. To generate this curve, we first consider the prediction scores (i.e., in the case of logistic regression, the positive class probabilities) assigned to each node pair in the test set, where a link is predicted whenever the score is above a given discrimination threshold θ. For each value of θ we can then calculate the true and false positive rates (TPR and FPR), i.e. the fraction of predicted links for which the prediction is correct and the fraction of unconnected node pairs for which a link is erroneously predicted. A sweep over all possible discrimination thresholds θ yields a ROC curve in the unit square. In Figure 10 we show the ROC curve of a logistic regression model using node2vec features. A classifier that perfectly classifies all instances in the data assumes the initial values FPR = 0 and TPR = 0 only for the maximal discrimination threshold θ = 1, where all instances are assigned to the negative class. For any θ smaller than this maximum and larger than the minimum value of zero, a perfect classifier correctly predicts all instances, which yields TPR = 1 and FPR = 0. For the minimum value θ = 0, the classifier necessarily predicts the positive class for all instances, which yields TPR = 1 and FPR = 1. The ROC curve of a perfect classifier thus follows the left and upper border of the unit square, which yields an Area Under Curve (AUC) of one. Conversely, the ROC curve of a classifier that consistently predicts the opposite of the true class follows the bottom and right border of the unit square, which yields an AUC of zero. For a classifier with no diagnostic ability, the FPR and TPR are expected to increase equally as we lower the discrimination threshold θ, i.e. the ROC curve follows the so-called diagonal of no-discrimination (see the red dashed line in Figure 10) and the AUC is expected to be close to 0.5.

Figure 10: Exemplary ROC curve (TPR vs. FPR) for link prediction using node2vec embedding and logistic regression; panel title: “Predicting character interactions with node2vec”.
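The threshold sweep described above can be made concrete in a few lines of code. The following sketch is a didactic illustration with hypothetical names; in practice, a library routine such as scikit-learn’s roc_curve would typically be used instead.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the discrimination threshold over all prediction scores and
    return the (FPR, TPR) points that trace out the ROC curve."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]  # threshold above the maximal score: no link predicted
    for theta in np.unique(scores)[::-1]:  # from strictest to most lenient
        predicted = scores >= theta
        tpr.append((predicted & labels).sum() / n_pos)
        fpr.append((predicted & ~labels).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

# The AUC is the area under this curve, e.g. via the trapezoidal rule:
# fpr, tpr = roc_points(scores, labels); auc = np.trapz(tpr, fpr)
```

For a perfect classifier this sweep yields points only on the left and upper border of the unit square (AUC of one), while a random score assignment traces out the diagonal of no-discrimination (AUC close to 0.5).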
F. Evaluation of Link Prediction for Individual Works

In the main text we present and discuss the performance of different techniques to predict links in a single character network that spans Tolkien’s Legendarium. Complementing this analysis, we additionally evaluated link prediction in character co-occurrence networks generated separately for the three works of our corpus, i.e. The Silmarillion, The Hobbit, and The Lord of the Rings. For the sake of completeness, we include these results in Table 6 below. As in Appendix D, we applied the Graph Convolutional Network (GCN) model to a weighted graph, while Graph Attention Networks (GAT) were applied to an unweighted graph.⁵

Table 6
Additional results for link prediction in character co-occurrence networks constructed for individual works in our corpus.

                                          The Silmarillion   The Hobbit      The Lord of the Rings
                                          ROC/AUC            ROC/AUC         ROC/AUC
GCN20 LE                                  80.38 ± 1.97       71.87 ± 5.94    76.25 ± 2.05
GCN20 node2vec (p = 1, q = 1)             81.74 ± 2.3        63.86 ± 5.57    76.61 ± 2.87
GCN20 OHE                                 74.78 ± 1.9        70.22 ± 8.76    63.15 ± 4.37
GCN20 word2vec                            76.82 ± 2.68       70.23 ± 7.15    67.42 ± 2.77
GAT20 LE                                  68.99 ± 3.46       76.88 ± 4.53    61.3 ± 7.38
GAT20 node2vec (p = 1, q = 1)             79.37 ± 2.13       71.21 ± 6.53    71.92 ± 2.69
GAT20 OHE                                 70.88 ± 3.12       71.52 ± 8.14    63.53 ± 3.68
GAT20 word2vec                            73.72 ± 4.13       72.29 ± 7.78    64.52 ± 4.29
GCN20 LE (weighted)                       80.4 ± 1.44        69.84 ± 7.23    81.31 ± 1.96
GCN20 node2vec (p = 1, q = 1) (weighted)  82.7 ± 1.92        68.09 ± 7.42    76.46 ± 3.56
GCN20 OHE (weighted)                      74.78 ± 3.3        67.25 ± 9.41    63.15 ± 3.12
GCN20 word2vec (weighted)                 78.8 ± 1.8         62.97 ± 22.35   64.53 ± 1.79

⁵ We resorted to an unweighted graph due to a suspected implementation error in the weighted GAT implementation in version 2.0.4 of the graph learning library pytorch-geometric [13].