=Paper=
{{Paper
|id=Vol-3834/paper80
|storemode=property
|title=The GOLEM-Knowledge Graph and Search Interface: Perspectives into Narrative and Fiction
|pdfUrl=https://ceur-ws.org/Vol-3834/paper80.pdf
|volume=Vol-3834
|authors=Franziska Pannach,Luotong Cheng,Federico Pianzola
|dblpUrl=https://dblp.org/rec/conf/chr/PannachCP24
}}
==The GOLEM-Knowledge Graph and Search Interface: Perspectives into Narrative and Fiction==
Franziska Pannach¹,∗, Luotong Cheng¹ and Federico Pianzola¹
¹ Centre for Language and Cognition, University of Groningen, The Netherlands
Abstract
This contribution presents the GOLEM Knowledge Graph and interface, offering different perspectives
into content-related data and metadata from the domain of fanfiction narratives. The Knowledge Graph
is aligned with common ontologies and vocabularies from the domains of narrative and cultural heritage.
In this short paper, we outline how narrative organization and characters’ features are modelled in the
GOLEM knowledge graph. We also present the GOLEM UI, a user-friendly access point to the data
that allows users to browse the knowledge graph even without knowledge of SPARQL.
Keywords
narrative structure, knowledge graphs, literature and fiction, semantic web technology, fanfiction
1. Introduction
One of the main aims of the GOLEM (Graph Ontologies for Literary Evolution Models) project
is to build an ontology1 that can be used to model narratives independent of their domain of ap-
plication (e.g. fiction or news), their association with a literary tradition, or their geographical
and cultural contexts. The formal semantic model is also designed to be language-agnostic and
independent of the format of the stories. Such a model should be able to express how narra-
tive elements (e.g. events), characters, and their individual representations, as well as readers’
engagement and literary evolution, are related to each other. The theoretical framework guid-
ing the creation of the model is grounded in literary theory, narratology, and best practices of
formal ontology design.
The ontology is used to model the data of the GOLEM triple store [10], which contains over
eight million stories, into a knowledge graph (KG). The KG currently contains a subset of ca. 19,000
stories from the original triple store and will be extended continuously in the near future. At
the moment, the modelled data are fanfiction stories from the popular online platform Archive
of Our Own (AO3) [3]2. This particular genre of stories holds immense potential, not only as
a case study for modeling the literary domain, but also for in-depth study of user-produced
narratives, reader response, semantic and narrative modeling approaches, and for the
development of natural language processing (NLP) tools. Within the communities associated with
a specific fandom, fanfiction works are polyvocal interpretations of shared stories, internet
folklore that is produced at a growing pace [11]. The works are polyvocal in the sense that,
through a work, reader and writer are in dialogue with the canonical narrative universe that
inspired it, but they are also uniquely engaged with each other, not only in the traditional cycle
of production and reception, but also through active and immediate engagement via comments or
other interactions (e.g. kudos). This engagement can be measured and subsequently used to
study the evolution of (cultural) traits of literary works, e.g. the appearance (and
disappearance) of certain character traits, or the change in character roles (villain-to-hero,
enemy-to-lover, secondary-to-main character).
CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
∗ Corresponding author.
Email: f.a.pannach@rug.nl (F. Pannach); skylarcheng585@gmail.com (L. Cheng); f.pianzola@rug.nl (F. Pianzola)
ORCID: 0000-0003-4216-8410 (F. Pannach); 0009-0002-6567-8923 (L. Cheng); 0000-0001-6634-121X (F. Pianzola)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 https://ontology.golemlab.eu/
2 https://archiveofourown.org
In order to make these valuable data more easily accessible to interested researchers and
other stakeholders, we created the GOLEM-UI, an easy-to-use interface based on the SAMPO
framework [7, 6].3 This is an example of good practice also adopted by other Digital Humanities
projects [4].
2. Domain Modelling
In the knowledge graph, a subset of the triple store data has been modelled. Fanfiction
stories and canonical works are represented as instances of lrm:F1_Work [12]. Characters
have two different class representations: gc:G1_Character (a crm:E89_Propositional_Object)
for instances that appear in a specific story, and gc:G0_Character-Stoff [17, 16] (a
crm:E28_Conceptual_Object) that refers to all the possible variations (Stoff) of a character. This
allows modelling the relationship between an instance of a character in a specific version of a
narrative material, e.g. Harry Potter in the novel Harry Potter and the Philosopher’s Stone, and
the general idea of the character that appears in different books and is thus a set of various
physical or biographical features, and personality traits.
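The two-level character model can be illustrated with a minimal Python sketch that stores triples as plain tuples. The class names gc:G0_Character-Stoff and gc:G1_Character come from the text above; the ex: identifiers and the linking property ex:isVersionOf are hypothetical placeholders, since the paper does not name the property that connects a story-specific character to its Stoff.

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples.
# Assumption: the ex: identifiers and ex:isVersionOf are invented for this
# illustration; only the two gc: class names are taken from the paper.
TRIPLES = [
    ("ex:HarryPotter_Stoff", "rdf:type", "gc:G0_Character-Stoff"),
    ("ex:HarryPotter_in_PhilosophersStone", "rdf:type", "gc:G1_Character"),
    ("ex:HarryPotter_in_PhilosophersStone", "ex:isVersionOf", "ex:HarryPotter_Stoff"),
    ("ex:HarryPotter_in_Fanfic42", "rdf:type", "gc:G1_Character"),
    ("ex:HarryPotter_in_Fanfic42", "ex:isVersionOf", "ex:HarryPotter_Stoff"),
]


def versions_by_stoff(triples):
    """Group story-specific characters under the Character-Stoff they realise."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "ex:isVersionOf":
            groups[obj].append(subject)
    return dict(groups)


VERSIONS = versions_by_stoff(TRIPLES)
```

Grouping by the Stoff node is what makes it possible to count in how many stories any version of a character appears, without conflating the individual realisations.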
The second main aspect to be modelled is that of social relations. In fanfiction, romantic or
sexual pairings are expressed through “Character/Character”-relationships (so-called “slash-
ing”). This category is of special interest, because it allows users to investigate the recurrent
use of gender-specific features and character pairings, as well as their influence on the pop-
ularity of a story both within a specific fandom and across fandoms in comparison with the
canonical relationships [13].
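As a rough illustration of how "Character/Character" tags can be turned into countable pairings, the following Python sketch splits each tag on the slash and counts unordered pairs. The sample tags are invented for the example, and the sketch assumes that character names themselves contain no slash; it is not the project's actual extraction pipeline.

```python
from collections import Counter


def parse_pairing(tag):
    """Split a "Character/Character" tag into a sorted tuple of names.

    Returns None for tags that do not name at least two characters.
    """
    names = [name.strip() for name in tag.split("/")]
    if len(names) < 2 or not all(names):
        return None
    return tuple(sorted(names))


def count_pairings(tags):
    """Count how often each unordered pairing occurs in a list of tags."""
    counts = Counter()
    for tag in tags:
        pairing = parse_pairing(tag)
        if pairing is not None:
            counts[pairing] += 1
    return counts


# Invented sample tags, for illustration only.
SAMPLE_TAGS = [
    "Albus Severus Potter/Scorpius Malfoy",
    "Scorpius Malfoy/Albus Severus Potter",
    "Hermione Granger/Ron Weasley",
    "Harry Potter",
]
PAIRING_COUNTS = count_pairings(SAMPLE_TAGS)
```

Sorting the names makes "A/B" and "B/A" count as the same pairing, which is one possible design choice when the order of names carries no meaning.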
Additional metadata are modelled with DCMI Metadata Terms4, e.g. dct:title or dct:creator. At
the moment, domain-specific terms, such as the number of kudos (a form of user interaction
similar to ‘likes’), are modelled using project-specific properties, e.g. golem:numberOfKudos.
This will be updated in the future once the KG makes full use of the GOLEM ontology. For
copyright reasons, the KG (as well as the triple store) does not provide access to the full texts
of the fanfiction stories. This is consistent with the goal of a large-scale study of fiction using
derived features [9, 5, 14].
3 The interface is available at http://search.golemlab.eu:3006/.
4 https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
Figure 1: Distribution of stories per language
3. Search Perspectives
SAMPO offers a framework to create different focused perspectives on the data. The queries
are predefined. Therefore, end-users do not need to be proficient in SPARQL. Table 1 gives
an overview of the current state of the KG, while Figure 1 provides a breakdown of stories by
language.
Table 1: Current statistics of the knowledge graph (2024-10-15)
Class | Number of Instances
Stories | 18,710
Fandoms | 1,698
Languages | 27
Fanfiction Characters (total, distinct) | 6,637
Potterverse Characters (distinct) | 523
Potterverse Character Stoffe (distinct) | 253
The GOLEM-UI search perspectives include four distinct views into the KG:
1. Metadata view: Overview of the stories’ metadata,
2. Fandoms view: Statistics on fandoms and associated stories,
3. Characters view: A closer look into characters and their various romantic pairings, for
now with a special focus on the Harry Potter fandom (Potterverse),
4. Literary quality view: Statistics on readability and literary quality of the stories.
3.1. Metadata View
The metadata view gives an overview of the common properties of the stories, such as the
word count, the language of the work, or the associated fandom (Figure 3). Facets on the
left-hand side of the search perspective allow users to filter data by language and fandom.
3.2. Fandoms View
The fandoms view gives a statistical overview of the representation of specific narrative
universes, e.g. how many stories are associated with the respective fandom (Figure 4). Fandoms
are represented by their skos:prefLabel, which in some cases can be a translation of the title.
Sub-fandoms are connected to their more general fandom (e.g. Harry Potter and the Philoso-
pher’s Stone to the fandom Harry Potter - J.K. Rowling) by skos:broader/skos:narrower relations.
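To show how skos:broader links can be used to aggregate sub-fandoms under their general fandom, here is a small Python sketch. The two fandom labels in the example come from the text; the second sub-fandom and all story counts are invented, and the sketch assumes that each story is counted under exactly one label.

```python
# Child label -> parent label, mirroring skos:broader.
BROADER = {
    "Harry Potter and the Philosopher's Stone": "Harry Potter - J.K. Rowling",
    "Harry Potter and the Half-Blood Prince": "Harry Potter - J.K. Rowling",
}

# Invented story counts per fandom label, for illustration only.
STORY_COUNTS = {
    "Harry Potter - J.K. Rowling": 10,
    "Harry Potter and the Philosopher's Stone": 3,
    "Harry Potter and the Half-Blood Prince": 2,
}


def top_fandom(label, broader):
    """Follow skos:broader links upwards until no broader fandom exists."""
    seen = set()
    while label in broader and label not in seen:
        seen.add(label)  # guard against accidental cycles in the data
        label = broader[label]
    return label


def rolled_up_counts(story_counts, broader):
    """Sum the story counts of every sub-fandom into its most general fandom."""
    totals = {}
    for label, count in story_counts.items():
        root = top_fandom(label, broader)
        totals[root] = totals.get(root, 0) + count
    return totals


TOTALS = rolled_up_counts(STORY_COUNTS, BROADER)
```

If a story is tagged with both a sub-fandom and its general fandom, this simple sum would count it twice, so real data would need deduplication by story identifier.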
3.3. Characters View
The characters view shows a character’s aliases (e.g. Voldemort, Lord Voldemort, Tom Riddle and
so on), keywords that are associated with a certain character (i.e. all the possible variations a
character can have), and the number of stories a specific version of the character appears in. In
order to link all the versions of a character to each other, we introduce the class
G0_Character-Stoff (a crm:E28_Conceptual_Object). This represents the general idea of a character,
e.g. Voldemort. Specific realisations of a character, e.g. Voldemort in Harry Potter and the
Half-Blood Prince, are represented as individuals of G1_Character (a crm:E89_Propositional_Object).
The GUI shows characters and their associated synonyms from the Potterverse fandom, as
well as the number of social relationships in which a character is involved. This statistic gives
us an insight into which characters are popular love interests in a narrative universe or with
whom they interact more frequently. For instance, Figure 5 shows that currently5 the character
Albus Severus Potter has several alternate names used by authors to refer to him, that he
is involved in 30 slash (romantic and/or erotic) relationships, and that his most common partner
(17 times) is the character Scorpius Malfoy.
Figure 2: Landing page of the GOLEM-UI giving access to the four individual data perspectives
Figure 3: Story Metadata Perspective
Figure 4: Fandoms Perspective
5 Date of the submission: 2024-10-15.
Figure 5: Characters Perspective
3.4. Literary Quality View
Lastly, we calculated scores modelling the literary quality (and readability) of the texts
associated with the stories in the knowledge graph.6 These measures include Flesch Reading Ease,
Flesch-Kincaid Grade Level, SMOG Readability Formula, Automated Readability Index, and New
Dale–Chall Readability Formula for readability, and sentence length, type-token ratio, and
compressibility for stylistic complexity (adapted from [2]). Figure 7 shows an overview of the
available data fields in the literary quality view. In Figure 6, we illustrate how these extracted
features can be combined with other information for data exploration, e.g. looking at the
association of individual measures of literary quality with reader response. Kudos is the reader
appreciation measure used on AO3, and it seems to be positively correlated with the average
word entropy of stories [cf. 8].
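For readers unfamiliar with these measures, the following Python sketch shows how a few of them can be computed with the standard library. The two Flesch formulas use their standard published coefficients and take pre-computed counts as input, because syllable counting is itself heuristic. The tokenizer, the zlib-based compression ratio, and the unigram Shannon entropy are simplifying assumptions made for this illustration and may differ from the implementation adapted from [2].

```python
import math
import re
import zlib
from collections import Counter


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def tokenize(text):
    """Very simple lowercase word tokenizer (an assumption of this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def mean_sentence_length(text):
    """Average number of tokens per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def compressibility(text):
    """Compressed size divided by raw size (lower means more repetitive)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def word_entropy(tokens):
    """Shannon entropy (in bits) of the unigram word distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


SAMPLE_TEXT = "The cat sat. The cat ran away!"
SAMPLE_TOKENS = tokenize(SAMPLE_TEXT)
```

All functions operate on derived features only (counts or token lists), which fits a setting in which the full texts themselves cannot be redistributed.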
6 We adopt the term literary quality from [1], recognizing that the proposed measures are not all-encompassing to
describe literary quality.
Figure 6: Average word entropy plotted against the number of kudos (dot size represents the number
of stories for each value on the x-axis; colors are added only for aesthetic purposes).
4. Querying the Data
While the triple store [10] remains available at http://graph.golemlab.eu:8890/sparql, the
knowledge graph can be queried via the same endpoint by selecting the designated graph with a
FROM clause.
For example, the following query will yield all the stories in the knowledge graph with their
respective authors (anonymised) and the romantic category (e.g. F/M-relationships):
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX golem: <http://golemlab.eu/graph/>
SELECT * FROM <http://golemlab.eu/graph/>
WHERE
{
  ?s dcterms:title ?title .
  ?s dcterms:creator ?author .
  ?s golem:romanticCategory ?category .
}
Figure 7: Literary Quality Perspective
The result of this query can be found at this link.
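For programmatic access, a request to the endpoint can be prepared with the Python standard library, as in the sketch below. The endpoint URL and graph IRI are taken from the text above; selecting the graph with a FROM clause, asking for JSON via the Accept header, and the sample response are assumptions of this illustration. The sketch only builds the request and parses a response in the standard SPARQL JSON results format; it does not contact the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://graph.golemlab.eu:8890/sparql"

# Assumption: the graph is selected with a standard FROM clause.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s ?title FROM <http://golemlab.eu/graph/>
WHERE { ?s dcterms:title ?title . }
LIMIT 5
"""


def build_request(endpoint, query):
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


REQUEST = build_request(ENDPOINT, QUERY)
# To execute it, pass REQUEST to urllib.request.urlopen and json.load the reply.

# Invented sample response, used here instead of a live call.
SAMPLE_RESPONSE = json.loads("""
{"head": {"vars": ["s", "title"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/story/1"},
    "title": {"type": "literal", "value": "An Example Story"}}]}}
""")
ROWS = bindings_to_rows(SAMPLE_RESPONSE)
```

Keeping the request construction separate from the parsing makes it easy to test the parsing offline and to add a LIMIT while exploring a graph of this size.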
5. Discussion
The GOLEM-UI is an intuitive interface that offers different insights into the GOLEM knowledge
graph. Four predefined search perspectives (Metadata, Fandoms, Characters, and Literary
Quality) allow users who are not familiar with SPARQL queries to explore the data modelled
from the domain of fanfiction stories. This interface, together with the possibility of exporting
the results of queries in CSV format, will allow researchers to easily create corpora that they
can use for their analyses of the online production and reception of fiction.
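As a hedged illustration of the corpus-building step, the sketch below filters a CSV export by language and a minimum number of kudos. The column names (title, language, kudos) and the sample rows are hypothetical; an actual export from the interface may use different headers.

```python
import csv
import io

# Hypothetical export; real column names may differ.
SAMPLE_EXPORT = """title,language,kudos
Story A,English,120
Story B,Italian,15
Story C,English,3
"""


def select_corpus(csv_text, language, min_kudos=0):
    """Return the exported rows in one language with at least min_kudos."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["language"] == language and int(row["kudos"]) >= min_kudos
    ]


CORPUS = select_corpus(SAMPLE_EXPORT, "English", min_kudos=10)
```

The same pattern extends to any other exported field, e.g. fandom or word count, as long as the column is present in the export.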
The presented user interface is one step towards presenting the full-fledged GOLEM ontol-
ogy, which will include many more features than those presented here. For instance, it will
allow modeling events and event chains according to different narrative theories, e.g. the hylis-
tic approach [17, 16], more character relationships [15], and character traits.
Acknowledgements
The authors would like to thank Heikki Rantala for his helpful comments and support in creat-
ing the GOLEM-UI. This work is part of the Graphs and Ontologies for Literary Evolution Models
(GOLEM) project funded by the European Commission.
References
[1] Y. Bizzoni, I. M. Lassen, T. Peura, M. R. Thomsen, and K. Nielbo. “Predicting Literary
Quality How Perspectivist Should We Be?” In: Proceedings of the 1st Workshop on Per-
spectivist Approaches to NLP LREC2022. Ed. by G. Abercrombie, V. Basile, S. Tonelli, V.
Rieser, and A. Uma. Marseille, France: European Language Resources Association, 2022,
pp. 20–25. url: https://aclanthology.org/2022.nlperspectives-1.3.
[2] P. Feldkamp, Y. Bizzoni, I. M. S. Lassen, M. Rosendahl Thomsen, and K. Nielbo. “Read-
ability and Complexity: Diachronic Evolution of Literary Language Across 9000 Novels”.
In: Proceedings of the Joint 3rd International Conference on Natural Language Processing
for Digital Humanities and 8th International Workshop on Computational Linguistics for
Uralic Languages. Ed. by M. Hämäläinen, E. Öhman, F. Pirinen, K. Alnajjar, S. Miyagawa,
Y. Bizzoni, N. Partanen, and J. Rueter. Tokyo, Japan: Association for Computational Lin-
guistics, 2023, pp. 235–247. url: https://aclanthology.org/2023.nlp4dh-1.27.
[3] C. Fiesler, S. Morrison, and A. S. Bruckman. “An Archive of Their Own: A Case Study
of Feminist HCI and Values in Design”. In: Proceedings of the 2016 CHI Conference on
Human Factors in Computing Systems. CHI ’16. San Jose, California, USA: Association for
Computing Machinery, 2016, pp. 2574–2585. doi: 10.1145/2858036.2858409.
[4] F. Fischer, I. Börner, M. Göbel, A. Hechtl, C. Kittel, C. Milling, and P. Trilcke. “Pro-
grammable Corpora: Introducing DraCor, an Infrastructure for the Research on Euro-
pean Drama”. In: Proceedings of DH2019: ”Complexities”, Utrecht, July 9–12, 2019. Utrecht
University, 2019, pp. 1–6. doi: 10.5281/zenodo.4284002.
[5] HTRC. HTRC Derived Datasets - Documentation - HTRC Docs. 2023. url:
https://wiki.htrc.illinois.edu/display/COM/HTRC+Derived+Datasets.
[6] E. Hyvönen. “Digital Humanities on the Semantic Web: Sampo Model and Portal Series”.
In: Semantic Web – Interoperability, Usability, Applicability 14.4 (2023), pp. 729–744. doi:
10.3233/sw-223034.
[7] E. Ikkala, E. Hyvönen, H. Rantala, and M. Koho. “Sampo-UI: A Full Stack JavaScript
Framework for Developing Semantic Portal User Interfaces”. In: Semantic Web – Interop-
erability, Usability, Applicability 13.1 (2022), pp. 69–84. doi: 10.3233/sw-210428.
[8] M. Jacobsen, Y. Bizzoni, P. Feldkamp, and K. Nielbo. “Patterns of Quality: Comparing
Reader Reception Across Fanfiction and Published Literature”. In: Proceedings of the Com-
putational Humanities Research 2024. 2024, pp. X–x.
[9] OECD. Derived data element. 2005. url: https://stats.oecd.org/glossary/detail.asp?ID=5130.
[10] F. Pannach, X. Yang, N. V. Solissa, Z. Yu, A. Van Cranenburgh, M. Van Der Ree, and
F. Pianzola. “The GOLEM Triple Store: A Graph-based Representation of Narrative and
Fiction”. In: Joint Proceedings of the ESWC 2024 Workshops and Tutorials, ESWC-JP 2024.
CEUR Workshop Proceedings (CEUR-WS.org). 2024, pp. 1–9.
[11] F. Pianzola, A. Acerbi, and S. Rebora. “Cultural accumulation and improvement in online
fan fiction”. In: CHR 2020: Workshop on Computational Humanities Research, November
18–20, 2020, Amsterdam, The Netherlands. Vol. 2723. CEUR Workshop Proceedings, 2020,
pp. 2–11. url: http://ceur-ws.org/Vol-2723/short8.pdf.
[12] P. Riva, M. Žumer, and T. Aalberg. LRMoo, a high-level model in an object-oriented frame-
work. https://repository.ifla.org/handle/20.500.14598/2217. 2022.
[13] K. Schneider. A Study on the Relevance of Gender within the Shipping Phenomenon in the
Worlds of Fanfiction. Analysis of Relationship Patterns in Comparison to Canon Books with
Digital Humanities Methods. Master’s thesis. Mainz, 2024.
[14] C. Schöch, M. Hinzmann, J. Röttgermann, K. Dietz, and A. Klee. “Smart Modelling for
Literary History”. In: International Journal of Humanities and Arts Computing 16.1 (2022),
pp. 78–93. doi: 10.3366/ijhac.2022.0278.
[15] X. Yang and F. Pianzola. “Exploring the Evolution of Gender Power Difference through
the Omegaverse Trope on AO3 Fanfiction”. In: Proceedings of the Computational Human-
ities Research 2024. 2024, pp. X–x.
[16] C. Zgoll. “Myths as Polymorphous and Polystratic Erzählstoffe”. In: Mythische Sphären-
wechsel: Methodisch neue Zugänge zu antiken Mythen in Orient und Okzident. Berlin,
Boston: De Gruyter, 2020, pp. 9–82. doi: 10.1515/9783110652543-002.
[17] C. Zgoll. Tractatus mythologicus: Theorie und Methodik zur Erforschung von Mythen als
Grundlegung einer allgemeinen, transmedialen und komparatistischen Stoffwissenschaft.
Berlin, Boston: De Gruyter, 2019. doi: 10.1515/9783110541588.