<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Curated Datasets For Literary Tourism: A Case Study In Knowledge Graph Creation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miriam Begliuomini</string-name>
          <email>miriam.begliuomini@unito.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marius Crisan</string-name>
          <email>marius.crisan@e-uvt.ro</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Daga</string-name>
          <email>enrico.daga@open.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rossana Damiano</string-name>
          <email>rossana.damiano@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florin Nechita</string-name>
          <email>florin.nechita@unitbv.ro</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurence Roussillon-Constanty</string-name>
          <email>laurence.roussillon-constanty@univ-pau.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Antonio Stranisci</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Trinchero</string-name>
          <email>cristina.trinchero@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Scholarship for the Humanities (DISH) Centre, Università di Torino</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica, Università di Torino</institution>
          ,
          <addr-line>Turin, corso Svizzera 185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dipartimento di Lingue e Letterature Straniere e Culture Moderne, Università di Torino</institution>
          ,
          <addr-line>Turin, via S. Ottavio 18</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Knowledge Media Institute, The Open University</institution>
          ,
          <addr-line>Walton Hall, Milton Keynes, MK7 6AA</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Teacher Training Department, West University of Timișoara</institution>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Universitatea Transilvania</institution>
          ,
          <addr-line>Brașov</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>European mountains have inspired generations of writers whose works can play a significant role in determining the touristic potential of the area. However, the fragmentation of research and cultural initiatives about European mountains hinders this potential, even in the digital age. In this paper, we describe the use of the World Literature Knowledge Graph (WL-KG) to integrate the curated datasets of writers and works created by a set of research projects about European mountains as part of the CON.NE.C.T W.O.N.D.E.R.S. project, using the SPARQL Anything library for triplification. The goal of the project is twofold. On the one hand, it aims at bridging local repositories of literary data, remodeling them according to a common model when needed, to overcome the fragmentation of the otherwise underrepresented research about mountain areas across Europe. On the other hand, it aims at creating applications that leverage the networked representation of literary, geographical and temporal data for the discovery and exploitation of new paths and connections in the field of literary tourism.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge graphs</kwd>
        <kwd>Triplification</kwd>
        <kwd>SPARQL Anything</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The intertwining of European mountains and their retelling across the centuries is one
of the most significant heritages of our continent. These places have inspired generations of writers
whose works about the European mountain regions play a significant role in determining the touristic
potential of the area. This potential is not fully expressed, though, since the fragmentation of the many
research and cultural initiatives about European mountains hinders its discovery by a wider audience. In
addition, writers who have written about European mountains are often underrepresented in the official
narrative, and their works are little known to the general public and outside academia. The main aim
of the CON.NE.C.T W.O.N.D.E.R.S (CONnecting NEw Cultural Tourism WAys Open to Networking Digital
Experience in Representing Sites) project is to enhance the cultural heritage of European mountains
through a research framework that relies on the networked representation of writers, works and places.</p>
      <p>
        In CON.NE.C.T W.O.N.D.E.R.S the networked representation of relevant resources at the literary,
geographical and historical level is provided by the World Literature Knowledge Graph (WL-KG:
https://literaturegraph.di.unito.it) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The WL-KG currently
includes 194,346 authors and 971,120 works linked to Wikidata, Goodreads, and Open Library. Open to
integration with literary archives of any size and scope, the WL-KG relies on the Under-Represented
Ontology Network (UR-ON) and is available through a visualization platform specifically conceived for
non-expert users, which allows the discovery of writers and their works. In the context of the project, it
has been adopted as a background to align existing digital resources about the project topics and to
integrate the new outputs emerging from the cooperation between the partners of the consortium.
      </p>
      <p>Integrating datasets from ongoing and completed research projects about
European mountains into the WL-KG serves two main objectives. First, it seeks to connect local
literary data repositories, restructuring them into a unified model when necessary, to address the
fragmentation of research on mountain regions across Europe. By doing so, common patterns and
connections between datasets are expected to emerge, with benefits for scholars who preserve, study
and disseminate Europe’s mountain heritage. Second, it aims to develop applications that use the
interconnected representation of literary, geographical and temporal data to discover and explore new
paths and connections that can reach a wider audience, for the benefit of literary tourism.</p>
      <p>The integration of curated datasets collected by different research groups studying local literary
traditions can be a bottleneck for knowledge graph maintenance, since it cannot be directly managed
by researchers from the humanities who are not familiar with RDF and graph-based representations
in general. In turn, this limitation can hinder the exploration of similarities and points of contact
between literary data from different areas. In this paper, we describe the data gathering and integration
process designed and implemented for CON.NE.C.T W.O.N.D.E.R.S., which relies on the use of SPARQL
Anything, an open-source project that supports the triplification of diverse data sources without the
need for a domain vocabulary.</p>
      <p>The paper is structured as follows. After introducing the topic of digital archives in the literary field
in Section 2, we describe the project in Section 3. Section 4 provides the background about the WL-KG;
the data modeling and gathering, and the triplification process are described in Section 5. Section 6
provides an example of data integration. Section 7 concludes the paper and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        Several digital resources that provide information about literary works and writers are available online.
Wikidata [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a general-purpose KG which includes knowledge about writers and their works. Other
archives are domain-specific: Goodreads is a social cataloging website owned by Amazon, where readers
share their impressions about books. Open Library is a project of the Internet Archive<sup>2</sup> where users can
borrow books. Among these three archives, only Wikidata relies on the Linked Open Data paradigm.
Open Library exposes its data through APIs, while Goodreads discontinued its APIs in 2020. This leads to
issues in data gathering and mapping, since there is no unified model to align these resources.
      </p>
      <p>
        Some digital archives are monographic and curated by teams of experts. This is the case of the European
Literary Text Collection<sup>3</sup> [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a multilingual dataset of novels written from 1848 to 1920; DraCor<sup>4</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a
collection of drama corpora in multiple languages; and MiMoText<sup>5</sup>, a parallel corpus of French and German
novels published from 1750 to 1799.
      </p>
      <p>
        Other resources are more oriented towards exploring the intersection between people and society. The
Japanese Visual Media Graph<sup>6</sup> [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] gathers data about Japanese visual media (including manga and visual
novels) from communities of fans. The Orlando Textbase<sup>7</sup> [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a KG developed to explore feminist
literature. WeChangeEd<sup>8</sup> [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a KG of 1,800 female editors born between 1710 and 1920, aligned with
Wikidata.
      </p>
      <p>
<sup>2</sup> https://archive.org
<sup>3</sup> https://www.distant-reading.net/eltec
<sup>4</sup> https://dracor.org
<sup>5</sup> https://mimotext.github.io
<sup>6</sup> https://jvmg.iuk.hdm-stuttgart.de
<sup>7</sup> https://www.artsrn.ualberta.ca/orlando
<sup>8</sup> https://www.wechanged.ugent.be
      </p>
      <p>In CON.NE.C.T W.O.N.D.E.R.S, we build on this line of research by using the World Literature
Knowledge Graph to alleviate two main issues. On the one hand, we aim at bridging local repositories
of literary data, remodeling them according to a common model when needed, to overcome the
fragmentation of the research about the rural, mountain, cross-border areas across Europe. On the other
hand, we aim at creating applications that leverage the networked representation of literary, geographical
and temporal data for the discovery and exploitation of new paths and connections in the field of literary
tourism.</p>
    </sec>
    <sec id="sec-3">
      <title>3. CON.NE.C.T W.O.N.D.E.R.S: Overview</title>
      <p>CON.NE.C.T W.O.N.D.E.R.S<sup>9</sup> gathers four partners from the UNITA consortium<sup>10</sup>, namely the University
of Turin (Italy), the University of Pau (France), and the West University of Timișoara and Transilvania University
(Romania), all located in rural, mountain, and cross-border regions across Southern, Western and
Central-Eastern Europe. Figure 1 gives an overview of the project structure.</p>
      <sec id="sec-3-1">
        <title>3.1. Project objectives and methods</title>
        <p>To experiment with a novel research framework that relies on the networked
representation of writers, works and places, each partner has identified a case study connected with
the research interests of its local team and the characteristics of its territory. The ultimate goal of
the project is to leverage this representation not only to study the local patterns of writers, works
and places and the connections among them at the cross-regional and cross-national level, but also to
design novel proposals for the valorization of the territory through literary tourism initiatives. The
latter may include, for example, itineraries consisting of locations mentioned in a given work, or points
of interest connected with the biographical events of a writer, with a preference for local and less
represented works and writers, so as to inspire the creation of novel itineraries and
experiences that join territory and literature. To support the exploration of the case studies and the
design of tools for the valorization of literary heritage on a geographical basis, the consortium will
prototype a set of applications that leverage the networked representation of writers, works and places
to create itineraries, timelines and other types of interactive visualizations from the knowledge graph
in a semi-automatic fashion.</p>
        <p><sup>9</sup> https://connectwonders.di.unito.it/
<sup>10</sup> https://univ-unita.eu/</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Case studies</title>
        <p>In the initial phase of the project, the case studies explored by each partner were identified in
cooperation with local stakeholders (museums, archives, associations and professionals operating in
the tourism field):</p>
        <p>Case study 1 – Tradition Meets Technology: Innovative Solutions at Casa Mureșenilor Museum.
This case study revolves around Casa Mureșenilor Museum in Brașov, home to one of Romania’s most
significant family archives, comprising over 25,000 documents (letters, official records, publications,
photographs). Casa Mureșenilor Museum is at the forefront of integrating technology to preserve and
promote cultural heritage. The Mureșianu family, notable as the proprietors of Gazeta Transilvaniei,
the first political newspaper of Transylvanian Romanians, takes center stage through
projects that enhance the visitor experience through technology: virtual reality that
offers an immersive encounter with the museum’s archives, an interactive AI-powered avatar delivering
personalized information about the museum, its exhibits and its history, and a gamified virtual tour inviting
audiences to unravel the secrets of the Mureșianu family through an interactive digital narrative.</p>
        <p>
          Case study 2 – Revisiting the Pyrenees in sounds and pictures. This case study draws on the “RESPYR”
project, initiated in 2021, which studies the mountain landscape by combining several approaches and
viewpoints and by proposing a dialogue between specialists of different eras and disciplines [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The
starting point of the project is anchored in the nineteenth century and focuses on the study of accounts,
poems and drawings by British travelers in the Pyrenees and on the representation of the mountains by
major writers of the time, such as John Ruskin. The scientific challenge of this project is twofold: to
carry out research on a very local scale on the British presence in the Pyrenees, and to contribute to
larger-scale projects on the representations of the mountain landscape at the European level, so as to
advance landscape studies within the field of Anglophone studies.
        </p>
        <p>
          Case study 3 – Transylvania and the Banat in British travel writing. Seen through British travellers’
eyes in the nineteenth century, the Carpathians in the Banat region and in Transylvania are sources
of historical, geographic and ethnographic richness [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. English travel accounts have many common
features, ranging from the wilderness of the landscape and the greatness and sublimity of the mountains,
depicted in Major E. C. Johnson’s “On the Track of the Crescent”, to the melancholy feeling stirred in
Charles Boner’s Transylvania: its products and its people. Some narratives are enriched with personal
sketches of animate or inanimate features: military men, local peasants or milk women dressed in
simple traditional costumes coming to Herculesbad or Băile Herculane, or the “magnificent scenery” on
the banks of the Danube, with the bubbling waters and whirlpools of the Kasan Pass. On the other
hand, Transylvanian castles, such as Hunyadi Castle and the fortress of Deva, are depicted as imposing
places falling into ruin and desolation.
        </p>
        <p>
          Case study 4 – Travel and literature: practices and authors in the French-speaking world of yesterday
and today. This case study proposes a reflection on tourism and literature encompassing two main
areas. The first focuses on literary tourism, with particular attention to the practices of trekking and
literary walks, which combine the physical experience of walking with the discovery of places evoked
by literature. The second explores the contribution of the Swiss writer Rodolphe Töpffer (19th century),
known for his “voyages en zigzag” in the Alps and the resulting writings, which interweave narration,
geography and autobiography, giving rise to an interesting reflection on the experience of the traveller
and the tourist [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ].
        </p>
        <p>In the current phase of the project, partners are integrating the data collected
within each case study into the World Literature Knowledge Graph using a set of web forms specifically
designed to ease the task of translating knowledge about literary facts into an RDF format.
Conversion procedures then translate the records from the ingestion format into
the format required by the knowledge graph. In parallel, we are developing and testing APIs that allow
applications to retrieve paths connecting writers, works and places from the knowledge graph.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The World Literature Knowledge Graph</title>
      <p>
        The ecosystem of digital archives of literature is vast but fragmented, and not all resources adopt
the Linked Data paradigm. For instance, there is no systematic mapping of writers’ pages on Wikidata
onto other sources such as OpenLibrary<sup>12</sup> and WorldCat<sup>13</sup>. In addition, the underrepresentation of minorities
is a long-standing problem that affects both digital and traditional media. Silencing practices [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have relegated
ethnic minorities and non-Western people to a marginal role in textbooks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], movies [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and digital
archives [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>4.1. Knowledge Graph construction</title>
        <p>
          Developed to provide a data-driven representation of World Literature while reducing
the gap between mainstream literature and underrepresented writers and works, the World Literature
Knowledge Graph<sup>14</sup> (WL-KG) [
          <xref ref-type="bibr" rid="ref1 ref16 ref17">16, 1, 17</xref>
          ] is a knowledge base of writers and their works gathered from
Wikidata and aligned with three external archives: OpenLibrary, Goodreads, and Google Books.
        </p>
        <p>The creation of the WL-KG relied on two main strategies. The first was based on the semantic
alignment of resources: writers extracted from Wikidata were aligned with two other public archives,
Open Library<sup>15</sup> and Goodreads<sup>16</sup>, and mapped onto the identifiers from VIAF and Open Library; the
resulting graph was further augmented with literary works from Open Library and Goodreads (see
Figure 2). Aligning literary facts from different platforms in a single semantic resource allows for a
richer representation of World Literature, with more balanced knowledge about writers from
different areas, also thanks to the inclusion of the readers’ communities. The second strategy was based
on the automatic extraction of the writers’ biographical triples from English Wikipedia pages. The
pipeline combines the methodology described in [18], which relies on an annotated corpus of writers’
biographies called WikiBio, with an approach based on Lexico-Semantic Patterns [19] to automatically
extract relations belonging to four career-relevant properties for writers on Wikidata: ‘educated at’
(P69), ‘employer’ (P108), ‘award received’ (P166), and ‘nominated for’ (P1411).</p>
        <p><sup>12</sup> https://openlibrary.org
<sup>13</sup> https://www.worldcat.org
<sup>14</sup> Funded by the Next Generation Internet (NGI) Search Programme, “Change the way we use and experience, search and
discover data and resources on the internet and web”, 2022-2023.
<sup>15</sup> https://openlibrary.org/
<sup>16</sup> https://www.goodreads.com/</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The UR-ON ontology network</title>
        <p>
          Reflecting the commitment of the WL-KG to mitigating underrepresentation in the
literary domain, writers and books in the WL-KG are represented according to the Under-Represented
Ontology Network (UR-ON)<sup>17</sup> [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a network of two domain ontologies aimed at encoding life events
of potentially under-represented writers and their works: the Under-Represented Writers Ontology
(URW-O), which allows drawing a link between a writer’s life and their cultural production, and
the Under-Represented Books Ontology (URB-O), which introduces the publication event, a concept
that encodes several pieces of information about a work and its production process.
        </p>
        <p>URW-O implements the biographical patterns needed to represent two main
situations, namely the process of migrating and the status of a person in a given country. Both are
bound to a specific time interval, and this time dependency needs to be formally expressed
for two reasons: on the one hand, it is essential for ordering life events chronologically; on the other
hand, it allows drawing a link between a writer’s life and their cultural production.</p>
        <p>URB-O is mapped onto the Functional Requirements for Bibliographic Records (FRBR) [20], a standard
for modeling the relationship between a work (frbr:Work), its expressions (frbr:Expression), and
manifestations (frbr:Manifestation). Following the FRBR ontology, we defined a work as an instance
of type frbr:Expression, which is described as the ‘intellectual or artistic realization of a work in
the form of alpha-numeric, musical, or choreographic notation’. We then defined the concept of
urb:Edition as a subclass of frbr:Manifestation, namely ‘the physical embodiment of an expression
of a work’. These two concepts are linked through the property frbr:embodiment. Each semantic
relation between an expression and its edition is wrapped in a urb:Publication pattern, which is a
subclass of dul:Event; in DOLCE, an event can be used as a reification to provide rich descriptions
of something that happens or occurs. Finally, the model integrates the pim:Reception pattern with
a number of attributes that are specific to the reception of literary works. Depending on the source
of knowledge from which a work is derived, it may have an average rating (urb:rated), a number of
ratings (urb:numberOfRatings), or a number of readers (urb:numberOfReaders).</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Graph visualization</title>
        <p>Access to the WL-KG is supported by a dedicated graphical interface, designed with the goal
of promoting the exploration of the connections between literary works. The navigation flow
starts with an initial search for a topic of interest. Once a relevant topic is found, the user can drag the
resource onto the central board and explore its relationships with other objects and predicates, creating
a visual representation of the connections.</p>
        <p>By clicking on resources of type “Person” (as visible in Figure 3), the user can access information
about an author, including both direct relationships such as published works and indirect relationships
such as all the topics covered in their works, or a map of all the locations where their works were
published. Clicking on resources of type “Expression” (as visible in Figure 3) displays information
specific to a particular work, such as editions, languages, and readers’ ratings.</p>
        <p>The platform also allows subject-based navigation: users can browse all works linked to a specific
item from the urb:Folksonomy. The graph-based navigation encourages serendipitous discovery,
allowing users to stumble upon unexpected connections and relationships: for example, the Italian
writer Italo Calvino and the American writer Stephen King share the genre termed “speculative science
fiction novel”, to which their respective books “The Nonexistent Knight” and “The Dark Tower” belong.</p>
        <p>
          The visualization platform has been evaluated with the help of domain experts, who used it
to perform search tasks and were subsequently asked to fill in a questionnaire about their
experience (details can be found in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]).
        </p>
        <p>
<sup>17</sup> https://purl.archive.org/urwriters, https://purl.archive.org/urbooks
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Integrating curated datasets in the WL-KG</title>
      <sec id="sec-5-1">
        <title>5.1. Data modeling and gathering</title>
        <p>Each case study in CON.NE.C.T W.O.N.D.E.R.S brings its own variations in the description of the
domain entities, which need to be reconciled with the data model provided by URW-O and URB-O on
a case-by-case basis. For example, the case study about Transylvania and the Banat does not include
websites as sources for data, unlike the other case studies. Likewise, the publication date is not
always known in the case study about the Alps.</p>
        <p>To reconcile these variations, we designed a preliminary set of forms for each of the core domain entities using the
domain ontologies as a reference, asking each partner to provide a set of example entries and map them
onto the basic forms to let discrepancies emerge. Based on this feedback, all requirements were
merged to create a set of forms that include all the required fields for each case study, and partners were asked to
produce a set of guidelines for the ingestion of the various entries (books, writers, places) for each case
study (see Figure 5 for an example). For each form, a mapping with the URW-O and URB-O ontologies
has been defined. The rationale behind the decision to create a single set of forms is not only the need
to make the integration of records into the knowledge graph simpler, but also to create a common
ground for all case studies that facilitates the emergence of connections and similarities between them
from the data modeling and gathering phase onwards.</p>
        <p>Once exported in CSV format, records are imported into the knowledge graph using a set of scripts
that rely on SPARQL Anything to translate the input records into a set of RDF triples that fit the
patterns defined for each entity type, according to the pipeline described in Figure 4.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Data integration</title>
        <p>The W3C standard SPARQL 1.1 [21] is the reference language for interacting with RDF knowledge
graphs. SPARQL provides constructs for selecting, filtering, and aggregating data into tabular form. In addition,
SPARQL can project the result into an RDF template using the CONSTRUCT query type. For this
reason, research has explored the application of SPARQL to a range of use cases broader than querying,
specifically the integration of heterogeneous data sources [22, 23, 24]. In practice, these approaches rely on
extending SPARQL to access data in non-RDF formats.</p>
        <preformat>PREFIX urw: &lt;https://purl.archive.org/urwriters#&gt;
PREFIX urb: &lt;https://purl.archive.org/urbooks#&gt;
PREFIX prov: &lt;http://www.w3.org/ns/prov#&gt;

urb:magyarland a urb:Expression ;
    prov:wasAttributedTo urw:nina-elizabeth-mazuchelli ;
    urb:subject urb:pest ;
    urb:genre urb:travel-writing ;
    urb:setting urw:budapest ;
    prov:wasDerivedFrom &lt;https://archive.org/details/magyarlandbeing01unkngoog&gt; .</preformat>
        <p>Recent research [25] proposes to rely on an intermediate RDF model, named Façade-X, whose
components can be transparently mapped to various file formats. This method allows building software
that provides indirect access to source data as if it were RDF. It relieves knowledge engineers from
re-engineering, i.e. dealing with the variety of formats and related languages that the sources rely upon
(transforming resources by minimising domain considerations and focusing on the syntactical
metamodel), and lets them focus on remodelling, i.e. the semantic lifting that re-frames the original
domain model into a new one [25]. Façade-X enables uniform access to a wide range of data formats
as if they were RDF, including the popular CSV, JSON, HTML, and XML, and it has been successfully
applied to many complex scenarios, from scraping the content of Web sites and joining data from multiple
sources to building knowledge graphs from music scores [26]. Façade-X can, in principle,
represent any format expressed in a BNF grammar, as well as the relational data model [27]. The
Façade-X approach is at the basis of the SPARQL Anything open-source project, which also constitutes
the reference implementation<sup>18</sup>. The software has been applied in many real-world scenarios and is
receiving increasing attention from the KG community in both academia and industry<sup>19</sup>. The method is
accessible as a command-line interface, as a server, and from Java and Python code [28]<sup>20</sup>.</p>
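        <p>As an illustration of the Façade-X idea described above, the following minimal Python sketch approximates how a CSV file with a header row can be viewed as RDF: a root node, one container per row attached through rdf:_1, rdf:_2 and so on, and one property per column. It is not the SPARQL Anything implementation. The function name facade_x_triples, the blank-node labels and the percent-encoding of column names are assumptions made for this example, and the exact encoding applied by the tool should be checked in its documentation.</p>
        <preformat><![CDATA[
```python
import csv
import io
from urllib.parse import quote

# Namespaces used by Facade-X and RDF.
FX = "http://sparql.xyz/facade-x/ns/"
XYZ = "http://sparql.xyz/facade-x/data/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def facade_x_triples(csv_text):
    """Approximate the Facade-X view of a CSV file that has a header row.

    Returns a list of (subject, predicate, object) tuples. Blank-node
    labels such as "_:row1" are illustrative only (hypothetical naming).
    """
    triples = [("_:root", RDF + "type", FX + "root")]
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        row_node = f"_:row{index}"
        # Each row is a slot of the root container: rdf:_1, rdf:_2, ...
        triples.append(("_:root", f"{RDF}_{index}", row_node))
        for header, value in row.items():
            # Assumption: column names are percent-encoded to form a property IRI.
            triples.append((row_node, XYZ + quote(header), value))
    return triples


SAMPLE_CSV = "title,genre\nMagyarland,travel-writing\n"

for triple in facade_x_triples(SAMPLE_CSV):
    print(triple)
```
]]></preformat>
        <p>Because the intermediate model only mirrors the syntactic structure of the file, no domain vocabulary is involved at this stage; the mapping to domain classes and properties is left to the query written on top of it.</p>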
        <p>The data acquisition and integration process is described in Figure 4. Users input information in a
Google form. The data is exported in tabular format. Next, a SPARQL Anything query extracts and
maps the data to the URW ontology, generating the output knowledge graph. Figure 7 shows the query
developed to extract information about books.</p>
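        <p>The pipeline just described (form, tabular export, mapping, triples) can be mimicked end to end in a few lines. The sketch below is a plain-Python stand-in for the mapping step, not the SPARQL Anything query used in the project: it reads one exported record and serializes it with the classes and properties that appear in the Magyarland listing in this section. The column names (title, author, subject, genre, setting, source) and the helper names slug, row_to_turtle and csv_to_turtle are hypothetical.</p>
        <preformat><![CDATA[
```python
import csv
import io
import re

# Prefixes taken from the Magyarland listing in this section.
PREFIXES = (
    "PREFIX urw: <https://purl.archive.org/urwriters#>\n"
    "PREFIX urb: <https://purl.archive.org/urbooks#>\n"
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
)


def slug(text):
    """Lower-case a label and join its ASCII alphanumeric runs with hyphens.

    Simplification: non-ASCII letters are dropped, so real data would
    need a proper transliteration step.
    """
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def row_to_turtle(row):
    """Serialize one exported form record as a Turtle description of a work."""
    lines = [
        f"urb:{slug(row['title'])} a urb:Expression ;",
        f"    prov:wasAttributedTo urw:{slug(row['author'])} ;",
        f"    urb:subject urb:{slug(row['subject'])} ;",
        f"    urb:genre urb:{slug(row['genre'])} ;",
        f"    urb:setting urw:{slug(row['setting'])} ;",
        f"    prov:wasDerivedFrom <{row['source']}> .",
    ]
    return "\n".join(lines)


def csv_to_turtle(csv_text):
    """Convert a CSV export (hypothetical column names) into a Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    body = "\n\n".join(row_to_turtle(row) for row in rows)
    return PREFIXES + "\n" + body + "\n"


SAMPLE = (
    "title,author,subject,genre,setting,source\n"
    "Magyarland,Nina Elizabeth Mazuchelli,Pest,Travel writing,Budapest,"
    "https://archive.org/details/magyarlandbeing01unkngoog\n"
)

print(csv_to_turtle(SAMPLE))
```
]]></preformat>
        <p>In the actual pipeline the same mapping is expressed declaratively as a SPARQL Anything query, so that domain experts only interact with the form and never with the RDF serialization.</p>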
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Example</title>
      <p>In this section, we present an example of KG population based on Case study 3, Transylvania
and the Banat in British travel writing. Through the engagement with domain experts on this topic, we
identified a gap in Wikidata concerning the literary work “Magyarland”: Being the Narrative of Our Travels
Through the Highlands and Lowlands of Hungary [29].</p>
      <p>Although Nina Elizabeth Mazuchelli is present in Wikidata<sup>21</sup>, her work is not mentioned
in this knowledge base. Therefore, partners from the Universitatea de Vest din Timișoara enriched the
World Literature KG with additional information about this literary work. To make their contribution
easier, the project team created a set of templates for data ingestion, which are available on the official website
of the project<sup>22</sup>.</p>
      <p>As can be observed in Figure 5, the web form contains facts about this work: besides the mention
of the work itself, experts added the topic, the genre, the place of setting and a relevant quote from
the book. After the insertion we used SPARQL Anything to convert the knowledge provided by the
experts into triples (Figure 6) that were subsequently added to the KG, thus filling a gap in the existing
knowledge base.</p>
      <p><sup>18</sup> http://sparql-anything.cc</p>
      <p><sup>19</sup> See the activity on the open-source project page on GitHub: http://github.com/sparql-anything/sparql.anything</p>
      <p><sup>20</sup> See also the extensive online documentation: https://sparql-anything.readthedocs.io/</p>
      <p><sup>21</sup> https://www.wikidata.org/wiki/Q56025933</p>
      <p><sup>22</sup> https://connectwonders.di.unito.it/contribute-kg/</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>In this paper, we illustrated the process by which information about writers, books and locations
concerning mountain areas is integrated into an existing knowledge graph of world literature, the World
Literature Knowledge Graph, as part of the CON.NE.C.T W.O.N.D.E.R.S. project, which involves four European
universities from the UNITA alliance located in rural and mountain areas. In particular, we described
the procedure for ingesting information about the different types of domain entities through web-based
forms that ease data gathering for the domain experts, leaving to a set of SPARQL
Anything scripts the task of triplifying the input data according to the model provided by the reference
ontologies. The rationale behind this process is to overcome the fragmentation of the landscape of
mountain-related literature and make it available for the study and development of rural, mountain
and border areas across Europe. Most of the data sources for this endeavor, in fact, are currently either not
represented in digital form or maintained only in local repositories.</p>
      <p>As future work, we envisage two main activities: first, searching for common patterns and connections
between the case studies by leveraging the integrated representation provided by the World Literature
Knowledge Graph; second, using the extracted paths to create novel literary and touristic itineraries in
the areas studied by the project in a semi-automatic fashion.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>and Library science, IRCDL 2023, Bari, Italy, February 23-24, 2023, volume 3365 of CEUR Workshop
Proceedings, CEUR-WS.org, 2023, pp. 38–46. URL: https://ceur-ws.org/Vol-3365/short3.pdf.</p>
      <p>[18] M. A. Stranisci, R. Damiano, E. Mensa, V. Patti, D. Radicioni, T. Caselli, et al., Wikibio: a semantic
resource for the intersectional analysis of biographical events, in: Proceedings of the 61st Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1,
Association for Computational Linguistics, 2023, pp. 12370–12384.</p>
      <p>[19] M. A. Stranisci, V. Basile, R. Damiano, V. Patti, et al., Mapping biographical events to odps
through lexico-semantic patterns, in: Proceedings of the 12th Workshop on Ontology Design
and Patterns (WOP 2021) co-located with the 20th International Semantic Web Conference (ISWC
2021), volume 3011 of CEUR Workshop Proceedings, CEUR-WS, 2021, pp. 1–12. URL:
https://ceur-ws.org/Vol-3011/paper3.pdf.</p>
      <p>[20] B. B. Tillett, Frbr and cataloging for the future, Cataloging &amp; classification quarterly 39 (2005)
197–205.</p>
      <p>[21] S. Harris, A. Seaborne, SPARQL 1.1 Query Language, W3C Recommendation, W3C, 2013.
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.</p>
      <p>[22] R. Cyganiak, Tarql (sparql for tables): Turn csv into rdf using sparql syntax, Technical Report,
2015. Available at: http://tarql.github.io.</p>
      <p>[23] M. Lefrançois, A. Zimmermann, N. Bakerally, A sparql extension for generating rdf from
heterogeneous formats, in: Proc of ESWC, Springer, 2017, pp. 35–50.</p>
      <p>[24] F. Michel, C. Faron-Zucker, F. Gandon, SPARQL micro-services: lightweight integration of web
APIs and linked data, in: LDOW@ WWW, 2018.</p>
      <p>[25] E. Daga, L. Asprino, P. Mulholland, A. Gangemi, Facade-X: an opinionated approach to SPARQL
anything, Studies on the Semantic Web 53 (2021) 58–73.</p>
      <p>[26] M. Ratta, E. Daga, Knowledge graph construction from musicxml: An empirical investigation
with sparql anything, Proceedings of the first workshop on Musical Heritage Knowledge Graphs
(MHKG), co-located with the 21st International Semantic Web Conference (ISWC) (2022).</p>
      <p>[27] L. Asprino, E. Daga, A. Gangemi, P. Mulholland, Knowledge graph construction with a façade: A
unified method to access heterogeneous data sources on the web, ACM Trans. Internet Technol.
(2023). URL: https://doi.org/10.1145/3555312. doi:10.1145/3555312.</p>
      <p>[28] PySPARQL Anything Showcase, Adjunct proceedings of the Extended Semantic Web Conference
(ESWC), Posters and Demos (2024).</p>
      <p>[29] N. E. Mazuchelli, "Magyarland;": Being the Narrative of Our Travels Through the Highlands and
Lowlands of Hungary, volume 1, London: S. Low, Marston, Searle, &amp; Rivington, 1881.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceriani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Damiano</surname>
          </string-name>
          ,
          <article-title>The world literature knowledge graph</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schöch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Odebrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Primorac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tonra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Poniž</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kanellopoulou</surname>
          </string-name>
          ,
          <article-title>Distant reading for european literary history. a cost action</article-title>
          ,
          <source>Proceedings of DH2018</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Börner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Göbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hechtl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kittel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Milling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Trilcke</surname>
          </string-name>
          ,
          <article-title>Programmable corpora: Introducing dracor, an infrastructure for the research on european drama</article-title>
          ,
          <source>Digital Humanities 2019</source>
          (
          <year>2019</year>
          )
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pfeffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Japanese visual media graph: Providing researchers with data from enthusiast communities</article-title>
          ,
          <source>in: International Conference on Dublin Core and Metadata Applications</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Simpson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>From xml to rdf in the orlando project</article-title>
          ,
          <source>in: 2013 International Conference on Culture and Computing</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Remoortel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Birkholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alesina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bezari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>D'Eer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Forestier</surname>
          </string-name>
          ,
          <article-title>Women editors in europe</article-title>
          ,
          <source>Journal of European Periodical Studies</source>
          <volume>6</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Roussillon-Constanty</surname>
          </string-name>
          ,
          <article-title>Revoir les pyrénées: Le voyage aux eaux sous la plume des voyageuses britanniques</article-title>
          ,
          <source>Oltre la crisi. Il patrimonio ambientale e culturale transfrontaliero: sfide, potenziale, prospettive</source>
          (
          <year>2024</year>
          )
          <fpage>51</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Crişan</surname>
          </string-name>
          ,
          <article-title>19th century oradea: The reflections of a multiethnic city in british travel literature</article-title>
          ,
          <source>Romanian Review on Political Geography/Revista Română de Geografie Politică</source>
          <volume>13</volume>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Trinchero</surname>
          </string-name>
          , et al.,
          <article-title>«je ne suis pas un touriste»: i patrimoni culturali del viaggio in italia di jean giono</article-title>
          , in:
          <source>Valorizzazione della macroarea italo-francese per un turismo sostenibile. Riflessi culturali, sociali ed economici</source>
          , volume
          <volume>8</volume>
          , Edizioni della Associazione Culturale Antonella Salvatico Centro . . . ,
          <year>2023</year>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Begliuomini</surname>
          </string-name>
          , et al.,
          <article-title>Zig-zag fra le alpi di rodolphe töpffer</article-title>
          , in:
          <source>Valorizzazione della macroarea italo-francese per un turismo sostenibile. Riflessi culturali, sociali ed economici</source>
          , Edizioni della Associazione Culturale Antonella Salvatico,
          <year>2023</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Spivak</surname>
          </string-name>
          ,
          <article-title>Can the subaltern speak?</article-title>
          , in:
          <source>Colonial discourse and post-colonial theory</source>
          , Routledge,
          <year>2015</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <article-title>Minorities in us history textbooks, 1945-1985</article-title>
          ,
          <source>The Clearing House</source>
          <volume>65</volume>
          (
          <year>1992</year>
          )
          <fpage>291</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Erigha</surname>
          </string-name>
          ,
          <article-title>Race, gender, hollywood: Representation in cultural production and digital media's potential for change</article-title>
          ,
          <source>Sociology compass</source>
          <volume>9</volume>
          (
          <year>2015</year>
          )
          <fpage>78</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Brückner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Naslund</surname>
          </string-name>
          ,
          <article-title>Who counts as a notable sociologist on Wikipedia? Gender, race, and the “professor test”</article-title>
          ,
          <source>Socius</source>
          <volume>5</volume>
          (
          <year>2019</year>
          )
          <fpage>2378023118823946</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Damiano</surname>
          </string-name>
          , et al.,
          <article-title>Representing the under-represented: A dataset of post-colonial, and migrant writers</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Gromann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sérasset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bosque-Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bobillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Heinisch</surname>
          </string-name>
          (Eds.),
          <source>3rd Conference on Language, Data and Knowledge, LDK 2021, September 1-3, 2021, Zaragoza, Spain</source>
          , volume
          <volume>93</volume>
          of OASIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
          <year>2021</year>
          , pp.
          <fpage>7:1</fpage>
          -
          <lpage>7:14</lpage>
          . URL: https://doi.org/10.4230/OASIcs.LDK.2021.7.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Damiano</surname>
          </string-name>
          ,
          <article-title>User-generated world literatures: a comparison between two social networks of readers</article-title>
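          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Falcon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Redavid</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 19th Conference on Information and Research science Connecting to Digital and Library science, IRCDL 2023, Bari, Italy, February 23-24, 2023</source>
          , volume
          <volume>3365</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>46</lpage>
          . URL: https://ceur-ws.org/Vol-3365/short3.pdf.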
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>