The GOLEM-Knowledge Graph and Search Interface: Perspectives into Narrative and Fiction

Franziska Pannach*, Luotong Cheng and Federico Pianzola
Centre for Language and Cognition, University of Groningen, The Netherlands

Abstract
This contribution presents the GOLEM Knowledge Graph and interface, offering different perspectives into content-related data and metadata from the domain of fanfiction narratives. The Knowledge Graph is aligned with common ontologies and vocabularies from the domains of narrative and cultural heritage. In this short paper, we outline how narrative organization and characters’ features are modelled in the GOLEM knowledge graph. We also present the GOLEM UI, a user-friendly access point to the data that allows users to browse the knowledge graph even without knowledge of SPARQL.

Keywords
narrative structure, knowledge graphs, literature and fiction, semantic web technology, fanfiction

1. Introduction

One of the main aims of the GOLEM (Graphs and Ontologies for Literary Evolution Models) project is to build an ontology¹ that can be used to model narratives independent of their domain of application (e.g. fiction or news), their association with a literary tradition, or their geographical and cultural contexts. The formal semantic model is also designed to be language-agnostic and independent of the format of the stories. Such a model should be able to express how narrative elements (e.g. events), characters, and their individual representations, as well as readers’ engagement and literary evolution, are related to each other. The theoretical framework guiding the creation of the model is grounded in literary theory, narratology, and best practices of formal ontology design.

The ontology is used to model the data of the GOLEM triple store [10], which contains over eight million stories, into a knowledge graph (KG). The KG contains a subset of ca.
19,000 stories from the original triple store, and will be extended continuously in the near future. At the moment, the modelled data are fanfiction stories from the popular online platform Archive of Our Own (AO3) [3]². This particular genre of stories holds immense potential, not only as a case study for modeling the literary domain, but also for in-depth study of user-produced narratives, reader response, semantic and narrative modeling approaches, and for the development of natural language processing (NLP) tools.

Within the communities associated with a specific fandom, fanfiction works are polyvocal interpretations of shared stories, internet folklore that is produced at a growing pace [11]. The works are polyvocal in the sense that, through a work, reader and writer are in dialogue with the canonical narrative universe that inspired it, but they are also uniquely engaged with each other, not only in the traditional cycle of production and reception, but also through active and immediate engagement via comments or other interactions (e.g. kudos). This engagement can be measured and subsequently used to study the evolution of (cultural) traits of literary works, e.g. the appearance (and disappearance) of certain character traits, or the change in character roles (villain-to-hero, enemy-to-lover, secondary-to-main character).

CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
* Corresponding author.
Email: f.a.pannach@rug.nl (F. Pannach); skylarcheng585@gmail.com (L. Cheng); f.pianzola@rug.nl (F. Pianzola)
ORCID: 0000-0003-4216-8410 (F. Pannach); 0009-0002-6567-8923 (L. Cheng); 0000-0001-6634-121X (F. Pianzola)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
¹ https://ontology.golemlab.eu/
² https://archiveofourown.org

In order to make these valuable data more easily accessible to interested researchers and other stakeholders, we created the GOLEM-UI, an easy-to-use interface based on the SAMPO framework [7, 6].³ This is an example of good practice also adopted by other Digital Humanities projects [4].

2. Domain Modelling

In the knowledge graph, a subset of the triple store data has been modelled. Fanfiction stories and canonical works are represented as instances of lrm:F1_Work [12]. Characters have two different class representations: gc:G1_Character (a crm:E89_Propositional_Object) for instances that appear in a specific story, and gc:G0_Character-Stoff [17, 16] (a crm:E28_Conceptual_Object) that refers to all the possible variations (Stoff) of a character. This allows modelling the relationship between an instance of a character in a specific version of a narrative material, e.g. Harry Potter in the novel Harry Potter and the Philosopher’s Stone, and the general idea of the character that appears in different books and is thus a set of various physical or biographical features, and personality traits.

The second main aspect to be modelled is that of social relations. In fanfiction, romantic or sexual pairings are expressed through “Character/Character”-relationships (so-called “slashing”). This category is of special interest, because it allows users to investigate the recurrent use of gender-specific features and character pairings, as well as their influence on the popularity of a story both within a specific fandom and across fandoms in comparison with the canonical relationships [13].

Additional metadata are modelled with DCMI Metadata Terms⁴, e.g. dct:title or dct:creator. At the moment, domain-specific terms, such as the number of kudos (a form of user interaction similar to ‘likes’), are modelled using project-specific categories, e.g. golem:numberOfKudos. This will be updated in the future once the KG makes full use of the GOLEM ontology.

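The two-level character model described in this section can be illustrated with a minimal sketch, written here in plain Python rather than with RDF tooling. The class and property names prefixed gc:, lrm:, dct: and golem: are taken from the text. Everything in the ex: namespace, including the linking properties ex:isRealisationOf and ex:appearsIn, as well as the fanfiction story and its kudos value, is a hypothetical placeholder and not part of the GOLEM ontology or data.

```python
# Sketch of the Stoff/character distinction as (subject, predicate, object) triples.
# Names prefixed ex: are hypothetical placeholders, not GOLEM identifiers.

RDF_TYPE = "rdf:type"
REALISES = "ex:isRealisationOf"  # hypothetical link from a G1 character to its G0 Stoff
APPEARS_IN = "ex:appearsIn"      # hypothetical link from a G1 character to a work

triples = [
    # The general idea of the character (Stoff level).
    ("ex:stoff/harry_potter", RDF_TYPE, "gc:G0_Character-Stoff"),
    # A canonical work and the character instance that appears in it.
    ("ex:work/philosophers_stone", RDF_TYPE, "lrm:F1_Work"),
    ("ex:work/philosophers_stone", "dct:title",
     "Harry Potter and the Philosopher's Stone"),
    ("ex:char/harry_in_ps", RDF_TYPE, "gc:G1_Character"),
    ("ex:char/harry_in_ps", REALISES, "ex:stoff/harry_potter"),
    ("ex:char/harry_in_ps", APPEARS_IN, "ex:work/philosophers_stone"),
    # An invented fanfiction story with an invented kudos count.
    ("ex:work/fanfic_1", RDF_TYPE, "lrm:F1_Work"),
    ("ex:work/fanfic_1", "golem:numberOfKudos", 42),
    ("ex:char/harry_in_fanfic_1", RDF_TYPE, "gc:G1_Character"),
    ("ex:char/harry_in_fanfic_1", REALISES, "ex:stoff/harry_potter"),
    ("ex:char/harry_in_fanfic_1", APPEARS_IN, "ex:work/fanfic_1"),
]


def realisations_of(stoff, graph):
    """Return all story-specific characters linked to the given Stoff."""
    return sorted(s for s, p, o in graph if p == REALISES and o == stoff)


def works_featuring(stoff, graph):
    """Return all works in which some realisation of the Stoff appears."""
    characters = set(realisations_of(stoff, graph))
    return sorted(o for s, p, o in graph if p == APPEARS_IN and s in characters)


if __name__ == "__main__":
    print(realisations_of("ex:stoff/harry_potter", triples))
    print(works_featuring("ex:stoff/harry_potter", triples))
```

Keeping the Stoff as a separate node means that every story-specific character can be reached from one shared entry point, which is what makes counts per character across stories possible.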
For copyright reasons, the KG (as well as the triple store) does not provide access to the full texts of the fanfiction stories. This is consistent with the goal of a large-scale study of fiction using derived features [9, 5, 14].

³ The interface is available at http://search.golemlab.eu:3006/.
⁴ https://www.dublincore.org/specifications/dublin-core/dcmi-terms/

Figure 1: Distribution of stories per language

3. Search Perspectives

SAMPO offers a framework to create different focused perspectives on the data. The queries are predefined, so end-users do not need to be proficient in SPARQL. Table 1 gives an overview of the current state of the KG, while Figure 1 provides a breakdown of stories by language.

Table 1: Current statistics of the knowledge graph (2024-10-15)

Class                                       Number of Instances
Stories                                     18,710
Fandoms                                     1,698
Languages                                   27
Fanfiction Characters (total, distinct)     6,637
Potterverse Characters (distinct)           523
Potterverse Character Stoffe (distinct)     253

The GOLEM-UI search perspectives include four distinct views into the KG:

1. Metadata view: Overview of the stories’ metadata,
2. Fandoms view: Statistics on fandoms and associated stories,
3. Characters view: A closer look into characters and their various romantic pairings, for now with a special focus on the Harry Potter fandom (Potterverse),
4. Literary quality view: Statistics on readability and literary quality of the stories.

3.1. Metadata View

The metadata view gives an overview of the common properties of the stories, such as the word count, the language of the work, or the associated fandom (Figure 3). Facets on the left-hand side of the search perspective allow users to filter data by language and fandom.

3.2. Fandoms View

The fandoms view gives a statistical overview of the representation of specific narrative universes, e.g. how many stories are associated with the respective fandom (Figure 4).
Fandoms are represented by their skos:prefLabel, which in some cases can be a translation of the title. Sub-fandoms are connected to their more general fandom (e.g. Harry Potter and the Philosopher’s Stone to the fandom Harry Potter - J.K. Rowling) by skos:broader/skos:narrower relations.

3.3. Characters View

The characters view shows a character’s aliases (e.g. Voldemort, Lord Voldemort, Tom Riddle and so on), keywords that are associated with a certain character (i.e. all the possible variations a character can have), and the number of stories a specific version of the character appears in. In order to link all the versions of a character to each other, we introduce the class G0_Character-Stoff, a crm:E28_Conceptual_Object. This represents the general idea of a character, e.g. Voldemort. Specific realisations of a character, e.g. Voldemort in Harry Potter and the Half-Blood Prince, are represented as individuals of G1_Character, a crm:E89_Propositional_Object.

The GUI shows characters and their associated synonyms from the Potterverse fandom, as well as the number of social relationships in which a character is involved. This statistic gives us an insight into which characters are popular love interests in a narrative universe or with whom they interact more frequently. For instance, Figure 5 shows that currently⁵ the character of Albus Severus Potter has several alternate names used by authors to refer to him and is involved in 30 slash (romantic and/or erotic) relationships, most commonly (17 times) with the character Scorpius Malfoy.

⁵ Date of the submission: 2024-10-15.

Figure 2: Landing page of the GOLEM-UI giving access to the four individual data perspectives
Figure 3: Story Metadata Perspective
Figure 4: Fandoms Perspective
Figure 5: Characters Perspective

3.4. Literary Quality View

Lastly, we calculated scores modelling the literary quality (and readability) of the texts associated with the stories in the knowledge graph.⁶ For readability, these measures include Flesch Reading Ease, Flesch-Kincaid Grade Level, SMOG Readability Formula, Automated Readability Index, and New Dale–Chall Readability Formula; for stylistic complexity, they include sentence length, type-token ratio, and compressibility (adapted from [2]). Figure 7 shows an overview of the available data fields in the literary quality view. In Figure 6, we illustrate how these extracted features can be combined with other information for data exploration, e.g. looking at the association of individual measures of literary quality with reader response. Kudos is the reader’s appreciation measure used on AO3, and it seems to be positively correlated with the average word entropy of stories [cf. 8].

⁶ We adopt the term literary quality from [1], recognizing that the proposed measures are not all-encompassing to describe literary quality.

Figure 6: Average word entropy plotted against the number of kudos (dot size represents the number of stories for each value on the x-axis; colors are added only for aesthetic purposes).

4. Querying the Data

While the triple store [10] remains available at http://graph.golemlab.eu:8890/sparql, the knowledge graph can be queried via the same endpoint by selecting its designated graph with a FROM clause. For example, the following query will yield all the stories in the knowledge graph with their respective authors (anonymised) and the romantic category (e.g. F/M-relationships):

    PREFIX golem: <http://golemlab.eu/graph/>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    # Restrict the query to the knowledge graph's named graph.
    SELECT *
    FROM <http://golemlab.eu/graph/>
    WHERE {
      ?s dcterms:title ?title .               # story title
      ?s dcterms:creator ?author .            # anonymised author
      ?s golem:romanticCategory ?category .   # e.g. F/M
    }

Figure 7: Literary Quality Perspective

The result of this query can be found at this link.

5. Discussion

The GOLEM-UI is an intuitive interface that allows different insights into the GOLEM knowledge graph. Four predefined search perspectives (Metadata, Fandoms, Characters, and Literary Quality) allow users who are not familiar with SPARQL queries to gain insights into the data modelled from the domain of fanfiction stories. This interface, together with the possibility of exporting the results of queries in CSV format, will allow researchers to easily create corpora that they can use for their analyses of the online production and reception of fiction.

The presented user interface is one step towards presenting the full-fledged GOLEM ontology, which will include many more features than those presented here. For instance, it will allow modeling events and event chains according to different narrative theories, e.g. the hylistic approach [17, 16], more character relationships [15], and character traits.

Acknowledgements

The authors would like to thank Heikki Rantala for his helpful comments and support in creating the GOLEM-UI. This work is part of the Graphs and Ontologies for Literary Evolution Models (GOLEM) project funded by the European Commission.

References

[1] Y. Bizzoni, I. M. Lassen, T. Peura, M. R. Thomsen, and K. Nielbo. “Predicting Literary Quality How Perspectivist Should We Be?” In: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP LREC2022. Ed. by G. Abercrombie, V. Basile, S. Tonelli, V. Rieser, and A. Uma. Marseille, France: European Language Resources Association, 2022, pp. 20–25. url: https://aclanthology.org/2022.nlperspectives-1.3.
[2] P. Feldkamp, Y. Bizzoni, I. M. S. Lassen, M. Rosendahl Thomsen, and K. Nielbo. “Readability and Complexity: Diachronic Evolution of Literary Language Across 9000 Novels”.
In: Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages. Ed. by M. Hämäläinen, E. Öhman, F. Pirinen, K. Alnajjar, S. Miyagawa, Y. Bizzoni, N. Partanen, and J. Rueter. Tokyo, Japan: Association for Computational Linguistics, 2023, pp. 235–247. url: https://aclanthology.org/2023.nlp4dh-1.27.
[3] C. Fiesler, S. Morrison, and A. S. Bruckman. “An Archive of Their Own: A Case Study of Feminist HCI and Values in Design”. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI ’16. San Jose, California, USA: Association for Computing Machinery, 2016, pp. 2574–2585. doi: 10.1145/2858036.2858409.
[4] F. Fischer, I. Börner, M. Göbel, A. Hechtl, C. Kittel, C. Milling, and P. Trilcke. “Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama”. In: Proceedings of DH2019: ”Complexities”, Utrecht, July 9–12, 2019. Utrecht University, 2019, pp. 1–6. doi: 10.5281/zenodo.4284002.
[5] HTRC. HTRC Derived Datasets - Documentation - HTRC Docs. 2023. url: https://wiki.htrc.illinois.edu/display/COM/HTRC+Derived+Datasets.
[6] E. Hyvönen. “Digital Humanities on the Semantic Web: Sampo Model and Portal Series”. In: Semantic Web – Interoperability, Usability, Applicability 14.4 (2023), pp. 729–744. doi: 10.3233/sw-223034.
[7] E. Ikkala, E. Hyvönen, H. Rantala, and M. Koho. “Sampo-UI: A Full Stack JavaScript Framework for Developing Semantic Portal User Interfaces”. In: Semantic Web – Interoperability, Usability, Applicability 13.1 (2022), pp. 69–84. doi: 10.3233/sw-210428.
[8] M. Jacobsen, Y. Bizzoni, P. Feldkamp, and K. Nielbo. “Patterns of Quality: Comparing Reader Reception Across Fanfiction and Published Literature”. In: Proceedings of the Computational Humanities Research 2024. 2024, pp. X–x.
[9] OECD. Derived data element. 2005.
url: https://stats.oecd.org/glossary/detail.asp?ID=5130.
[10] F. Pannach, X. Yang, N. V. Solissa, Z. Yu, A. Van Cranenburgh, M. Van Der Ree, and F. Pianzola. “The GOLEM Triple Store: A Graph-based Representation of Narrative and Fiction”. In: Joint Proceedings of the ESWC 2024 Workshops and Tutorials, ESWC-JP 2024. CEUR Workshop Proceedings (CEUR-WS.org). 2024, pp. 1–9.
[11] F. Pianzola, A. Acerbi, and S. Rebora. “Cultural accumulation and improvement in online fan fiction”. In: CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands. Vol. 2723. CEUR Workshop Proceedings, 2020, pp. 2–11. url: http://ceur-ws.org/Vol-2723/short8.pdf.
[12] P. Riva, M. Žumer, and T. Aalberg. LRMoo, a high-level model in an object-oriented framework. https://repository.ifla.org/handle/20.500.14598/2217. 2022.
[13] K. Schneider. A Study on the Relevance of Gender within the Shipping Phenomenon in the Worlds of Fanfiction. Analysis of Relationship Patterns in Comparison to Canon Books with Digital Humanities Methods. Master’s thesis. Mainz, 2024.
[14] C. Schöch, M. Hinzmann, J. Röttgermann, K. Dietz, and A. Klee. “Smart Modelling for Literary History”. In: International Journal of Humanities and Arts Computing 16.1 (2022), pp. 78–93. doi: 10.3366/ijhac.2022.0278.
[15] X. Yang and F. Pianzola. “Exploring the Evolution of Gender Power Difference through the Omegaverse Trope on AO3 Fanfiction”. In: Proceedings of the Computational Humanities Research 2024. 2024, pp. X–x.
[16] C. Zgoll. “Myths as Polymorphous and Polystratic Erzählstoffe”. In: Mythische Sphärenwechsel: Methodisch neue Zugänge zu antiken Mythen in Orient und Okzident. Berlin, Boston: De Gruyter, 2020, pp. 9–82. doi: 10.1515/9783110652543-002.
[17] C. Zgoll. Tractatus mythologicus: Theorie und Methodik zur Erforschung von Mythen als Grundlegung einer allgemeinen, transmedialen und komparatistischen Stoffwissenschaft. Berlin, Boston: De Gruyter, 2019.
doi: 10.1515/9783110541588.