<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Conference On Museum Big Data, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Combining LLMs and Hundreds of Knowledge Graphs for Data Enrichment, Validation and Integration Case Study: Cultural Heritage Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michalis Mountantonakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manos Koumakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Tzitzikas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Crete</institution>
          ,
          <addr-line>Heraklion</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science</institution>
          ,
          <addr-line>FORTH, Heraklion</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Recently, there has been a strong trend toward exploiting Large Language Models (LLMs) and Knowledge Graphs (KGs) for providing better access services and analytics, and for aiding the user experience in the Cultural Heritage (CH) domain. In this direction, we discuss the corresponding challenges and potential use cases for both simple users and data owners of the CH domain. Afterwards, we present the research prototype GPToLODS+ (and its services), which offers large-scale knowledge services that exploit the capabilities of hundreds of KGs and LLMs for several domains (including the CH one). In particular, it combines ChatGPT, the LODsyndesis KG (which aggregates hundreds of RDF KGs) and Entity Recognition tools, for offering the following functionality: a) Question Answering using ChatGPT and hundreds of RDF KGs, b) Entity Recognition, Linking and Enrichment over the ChatGPT responses (or any given plain text) using LODsyndesis, c) Fact Validation of ChatGPT responses (or web texts) with provenance using LODsyndesis, d) Connectivity Analytics and Integration with existing RDF KGs in real time through the LODChain service, e.g., for aiding data publishing and discoverability, and others. Finally, we provide dedicated examples over the CH domain for the use cases by exploiting GPToLODS+ and its services using real entities and CH KGs, mainly focusing on CIDOC-CRM based KGs.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF Knowledge Graphs</kwd>
        <kwd>LLMs</kwd>
        <kwd>Cultural Heritage</kwd>
        <kwd>Fact Checking</kwd>
        <kwd>Discoverability</kwd>
        <kwd>Reusability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        There is a high proliferation of Large Language Models (LLMs) being used to aid several tasks in any
domain, including Cultural Heritage (CH), such as Question Answering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Digital Storytelling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
Entity Recognition and Enrichment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and many others [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For the CH domain, the ultimate target
is to enhance the user experience through access services such as interactive Artificial Intelligence (AI)
chatbots and web applications [
        <xref ref-type="bibr" rid="ref2 ref5">5, 2</xref>
        ], i.e., for aiding users in finding resources on websites, for offering
digital storytelling during museum visits, and others. LLMs can be of primary importance for the CH
domain, since they have been trained on web texts that include CH data (see the left side of Fig. 1),
they are very creative, they offer human-like expressiveness, and they are suitable for the mentioned tasks.
However, LLMs do not provide justifications for their responses, and they are vulnerable to hallucinations,
including erroneous and outdated facts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        On the contrary, thousands of Knowledge Graphs (KGs) are available, having structured data
of high correctness and information about their provenance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], including numerous KGs for the CH
domain (right side of Fig. 1). For instance, see a list of such KGs in the Linked Open Data (LOD) Cloud [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and in a portal [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for KGs using the ISO standard CIDOC-CRM model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, a disadvantage is
that it is not trivial to ask questions over such KGs, i.e., one is required either to be familiar with Linked
Data technologies and the SPARQL query language [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], or to provide user-friendly interfaces and access
services for aiding browsing by simple users. As shown in Fig. 1, the ultimate challenge is how
to combine the advantages of both LLMs and KGs, for providing better access services and analytics
and for improving the user experience (lower side of Fig. 1).
      </p>
      <p>
        Below, we discuss the challenges concerning the CH domain for LLMs and KGs (see the right side
of Fig. 2). First of all, we distinguish two types of users, a) simple users, i.e., visitors of museums,
web users of digital libraries, etc., and b) data owners (of the CH domain), i.e., the responsible persons of a
museum, a digital library, etc. Regarding the challenges, for any type of user, i) it is not trivial to find
more information (with provenance) about the entities of interest of an LLM response (e.g., of the CH
domain), since LLMs are not directly connected with the several existing KGs (including KGs of
the CH domain), and ii) it is difficult to validate (CH) facts from ChatGPT responses or from web texts,
since evidence and provenance are not always provided. Moreover, iii) it is time-consuming for a data
publisher to generate a KG (from text), and it is difficult to discover relevant KGs (or datasets), given the
high number of available KGs and the several data integration problems that should be faced [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], for
providing a KG that will be connected with existing high-quality KGs of the LOD Cloud. This problem
exists even iv) with KGs of a specific domain that use the same model, such as CIDOC-CRM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Towards this objective, we present the research prototype GPToLODS+ (and its underlying services),
which provides both a user-friendly web application and a REST API, for enabling several services
(which are related to the presented challenges): a) Entity Recognition, Linking and Enrichment for the
entities found in the ChatGPT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] responses through a dialogue-based user interface, b) Fact Validation
over the ChatGPT responses with provenance from the 400 RDF KGs of the LODsyndesis aggregated KG
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], c) KG generation from text (either from a ChatGPT response or plain text), and d) Connectivity
Analytics and Integration with real RDF datasets for the generated KG. Concerning our contribution,
we focus on providing use cases for the CH domain for both simple users and data owners, we present
the services of GPToLODS+, and we provide real examples for the use cases by using GPToLODS+.
Finally, we present services for CIDOC-CRM based KGs, by presenting a real example with connectivity
analytics. As regards the novelty, to the best of our knowledge, there is no other suite of services
offering a dialogue-based user interface and a REST API that exploits hundreds of RDF KGs for enriching
and validating ChatGPT responses.
      </p>
      <p>The rest of this paper is organized as follows: §2 presents the desired use cases for the CH domain
for both users and publishers. §3 discusses the related work, whereas §4 presents the steps and services
of GPToLODS+. Finally, §5 shows how to perform the desired use cases using the presented services,
and §6 concludes the paper and discusses future directions.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Use Cases over Cultural Heritage (for LLMs and KGs)</title>
      <p>Based on the presented challenges, we provide four use cases, two for each type of user, which are
covered by GPToLODS+.</p>
      <p>UC1. Browsing more Information about the Entities of Interest from KGs. As shown in
the upper side of Fig. 2, the user asks an LLM about the Kritios Boy sculpture1 and retrieves a response
from the LLM. However, the user desires to find more facts, links (URIs) and KGs about the sculpture and the
museum to which it belongs (see UC1 in the upper right side of Fig. 2).</p>
      <p>
        UC2. Fact Validation (over the LLM response or a web text). Here, the same user (see UC2 in
Fig. 2) desires to check the validity of the fact(s) returned by the LLM (since provenance is usually
not given [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), while the same user may also desire to check the validity of facts (e.g., related to the CH
domain) from the texts of web pages.
      </p>
      <p>
        UC3. KG Generation, Connectivity Analytics and Data Integration. Here, the data owner (e.g.,
of a museum or a digital library) first desires to create an RDF KG. In many cases, the publisher needs
to combine multiple files in various formats to create the KG, e.g., CSV files, text files, existing RDF
triples and others, and then to integrate all of them by using a given standard such as CIDOC-CRM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Afterwards, the data publisher desires to discover more KGs (or datasets) containing complementary
information about the same entities (see UC3 in Fig. 2), i.e., to create an enriched version of his/her KG
connected with other high-quality KGs. This will enable the execution of more complex queries and
will make the dataset more discoverable and reusable for other users and publishers.
      </p>
      <p>UC4. Data Publishing to CH portals. Having created the RDF KG, the data owner desires to
publish the KG to dedicated CH portals, for improving its discoverability across the CH community
and for obtaining further analytics with relevant KGs (see UC4 in Fig. 2).</p>
    </sec>
    <sec id="sec-2">
      <title>3. Related Work</title>
      <p>Here, we focus on approaches using CH data with KGs (see §3.1) and LLMs (see §3.2), and we
provide a comparison with related approaches (see §3.3).
1https://www.theacropolismuseum.gr/en/youth-statue-kritios-boy</p>
      <sec id="sec-2-1">
        <title>3.1. Cultural Heritage and KGs</title>
        <p>
          CH is one of the most successful application domains of the Semantic Web technologies [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], i.e., there
are numerous KGs about CH, and many of them are listed in online catalogs and portals, such as
the LOD Cloud [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and the CIDOC-CRM portal [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] (containing KGs that have been modeled through
the ISO standard CIDOC-CRM). The KGs of the CH domain include museums [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], digital libraries
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], historical archives [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], archaeological excavations [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and others. Indicatively, the CIDOC-CRM
portal [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] includes 30 KGs from the CH domain, having more than 500 million RDF triples. The ultimate
target of creating such KGs is to provide users with better access services and analytics, including
Browsing systems [19], Question Answering [20, 21], Virtual Exhibitions for museums [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and Digital
Storytelling [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], whereas the constructed KGs can be very useful for Digital Humanities research [22].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Exploiting LLMs for the CH domain</title>
        <p>
          Similarly to the case of KGs, one of the ultimate challenges of using LLMs in the CH domain is to aid
the user experience [23], e.g., museum visitors, users that browse digital libraries, and others. First, [24]
surveys approaches that apply Machine Learning and Artificial Intelligence techniques for reducing
the costs related to the compliance and interoperability of Cultural Heritage KGs, by focusing on
CIDOC-CRM based KGs. Moreover, the authors in [25] studied the problem of connecting ArtGraph
KG and Wikidata through the aid of LLMs (and specifically LLaMa), for improving Entity Alignment to
enhance entity enrichment for ArtGraph KG. Furthermore, a context-aware visual QA system based on
multi-modal LLMs is presented in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], and its target is to offer accurate answers to questions of the CH
domain by also providing as input to the LLM the associated KG and corresponding images. In [26], the
authors presented an LLM-based virtual art guide, for enabling users to express inquiries about the
displayed artwork. Finally, [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] presents the MAGICAL system, a digital tour guide for museums
that exploits the capabilities of GPT-4 for composing texts and dialogues with the visitor.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Comparison with Related Work</title>
        <p>
          Concerning applications exploiting LLMs, KGs, or both, we mainly focus on combining
hundreds of KGs and LLMs for offering Entity Enrichment, Fact Checking and Connectivity Analytics
over LLM responses and web texts, rather than on providing applications such as virtual exhibitions and
QA systems [
          <xref ref-type="bibr" rid="ref1 ref14">14, 1, 20, 21</xref>
          ]. However, it is worth noting that the research prototype that we present can
be used by such systems, since it provides a REST API that enables its services to be reused
programmatically.
        </p>
        <p>
          Compared to our previous work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], here we place emphasis on how to combine LLMs and KGs for the
CH domain, by discussing challenges and by providing use cases and dedicated examples for different
types of users. Also, we provide a solution for these challenges by presenting the capabilities of the
up-to-date version of the GPToLODS+ prototype, which i) offers a new dialogue-based user interface
where all the services are accessible on the same page for both LLM responses and web texts, ii) can be
connected to LODChain [27] for having access to connectivity analytics over the LOD Cloud, and iii)
offers a REST API for aiding users in reusing the services in their (CH) applications. To the best of our
knowledge, this is the first suite of tools offering such services by exploiting hundreds of RDF KGs over
LLM responses.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. The Services of GPToLODS+</title>
      <p>We present the steps and services of GPToLODS+, which are shown in Fig. 3. The user can provide
as input either a question for ChatGPT or a plain text. In the first case, the user can send a question
through a dialogue interface, and then GPToLODS+ sends a request to the ChatGPT API and retrieves the
textual response. In the second case, the user can just add any text (e.g., found on the web). Afterwards,
the user is able to exploit the following services over the desired text: either A) Entity Recognition
and Linking through LODsyndesisIE, for enabling the enrichment of all the entities of the text, or
B) Triples Generation, for retrieving the facts of the text in RDF format. The latter enables the
service to check the validity of the facts found in the text, and the connectivity analytics and integration
service through LODChain.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>The services of GPToLODS+, along with their input, output, tools used, data provenance and data volume.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Service</th>
              <th>Input</th>
              <th>Output</th>
              <th>Tools Used</th>
              <th>Data Provenance</th>
              <th>Data Volume</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>A. Question Answering (Dialogue)</td>
              <td>User questions in natural language</td>
              <td>ChatGPT response</td>
              <td>ChatGPT</td>
              <td>Web pages, Wikipedia, etc.</td>
              <td>45 TB raw data</td>
            </tr>
            <tr>
              <td>B. Entity Recognition &amp; Linking</td>
              <td>ChatGPT response or plain text</td>
              <td>Entities URIs from KGs, enriched version of text</td>
              <td>LODsyndesisIE</td>
              <td>DBpedia, LODsyndesis</td>
              <td>Billions of RDF triples</td>
            </tr>
            <tr>
              <td>B1. Entity Enrichment</td>
              <td>Entities URIs from KGs (entities from ChatGPT response or plain text)</td>
              <td>More URIs, datasets and facts (with provenance)</td>
              <td>LODsyndesis</td>
              <td>LODsyndesis KG (based on semantics-aware indexes)</td>
              <td>2 billion RDF triples</td>
            </tr>
            <tr>
              <td>C. Triples (KG) Generation from text</td>
              <td>ChatGPT response or plain text (+optionally enhanced with URIs)</td>
              <td>RDF triples (from ChatGPT)</td>
              <td>ChatGPT (+optionally LODsyndesisIE)</td>
              <td>ChatGPT</td>
              <td>45 TB raw data</td>
            </tr>
            <tr>
              <td>C1. Fact Checking</td>
              <td>RDF triples (e.g., generated from ChatGPT)</td>
              <td>Corresponding RDF triples with provenance from RDF KGs</td>
              <td>GPToLODS Fact Checking Service</td>
              <td>DBpedia (current version), LODsyndesis KG</td>
              <td>&gt;2 billion RDF triples</td>
            </tr>
            <tr>
              <td>C2. Connectivity Analytics and Integration</td>
              <td>RDF KG</td>
              <td>Visualizations, statistics, enriched version of the input KG</td>
              <td>LODChain</td>
              <td>LODsyndesis KG</td>
              <td>2 billion RDF triples</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-3-1">
        <title>4.1. The Core Services</title>
        <p>Below, we provide more details for each of the services and the underlying tools, which are also listed
in Table 1 and are depicted in Fig. 3. More detailed examples are given in §5.</p>
        <p>A. Question Answering (through Dialogue). This functionality is offered through a dialogue
interface. In particular, the user can type a question in natural language, which is then sent to ChatGPT
(the current version used is v3.5) to retrieve the response. The key notion is that the resulting ChatGPT
response can be used by the services described below.</p>
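        <p>For illustration, the request sent to ChatGPT in this step can be sketched as follows. This is a minimal, hypothetical sketch (not the authors' code), assuming the OpenAI Chat Completions API and the gpt-3.5-turbo model, since the paper states that ChatGPT v3.5 is used:</p>

```python
import json

# Official OpenAI Chat Completions endpoint; the constant and helper below
# are illustrative, not taken from the GPToLODS+ code base.
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(question: str, model: str = "gpt-3.5-turbo") -> str:
    """Serialize a single user question into a Chat Completions request body."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return json.dumps(body)

payload = build_chat_request("Where is the Kritios Boy sculpture located?")
```

        <p>The textual response returned for this request is then passed on to services B and C below.</p>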
        <p>
          B. Entity Recognition and Linking. Here, the input can be any plain text (in English), e.g., the ChatGPT
response or any text from the web (see Fig. 3). Afterwards, the user can exploit the
Entity Recognition and Linking service of LODsyndesisIE [28]. In particular, the mentioned tool can
use any combination of three popular Entity Recognition tools, DBpedia Spotlight [29], WAT [30] and
Stanford CoreNLP [31], for identifying the entities of a given text and for providing a unique URI (or
link) for each entity to DBpedia [32] and LODsyndesis KG [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Afterwards, the entities of the text (e.g.,
of the ChatGPT response) are marked, and the user can browse more information about each of them,
as explained below.
        </p>
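        <p>The marking step can be sketched with a small example; the entity-to-URI mapping below is hand-made for illustration, whereas in GPToLODS+ the links are produced by LODsyndesisIE:</p>

```python
# Hand-made example links; in GPToLODS+ these come from LODsyndesisIE.
LINKS = {
    "Kritios Boy": "http://dbpedia.org/resource/Kritios_Boy",
    "Acropolis Museum": "http://dbpedia.org/resource/Acropolis_Museum",
}

def mark_entities(text: str, links: dict) -> str:
    """Annotate each recognized entity mention with its URI, e.g. 'X [uri]'."""
    for mention, uri in links.items():
        text = text.replace(mention, f"{mention} [{uri}]")
    return text

marked = mark_entities(
    "The Kritios Boy is exhibited in the Acropolis Museum.", LINKS
)
```

        <p>Each marked entity then serves as an anchor for the enrichment step (B1) described next.</p>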
        <p>
          B1. Entity Enrichment. The next step is to perform Entity Enrichment for the entities of the given
text. In particular, the user can click on any recognized entity to retrieve more information about it
from LODsyndesis [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which is an aggregated KG in which the contents of 400 LOD datasets have been
combined, after having computed the transitive and symmetric closure of 45 million equivalence
relationships. For providing fast access to all the URIs, KGs and facts (with provenance) for each entity,
it offers semantics-aware indexes and services for over 412 million entities and 2 billion triples. In this
way, the user can have access to (and export) all this information for the selected entities of interest.
        </p>
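        <p>The closure computation mentioned above can be sketched with a union-find structure: equivalence (e.g., owl:sameAs) links are symmetric and transitive, so all URIs of the same real-world entity collapse into one equivalence class. The pairs below are toy, illustrative URIs; LODsyndesis performs this over 45 million relationships:</p>

```python
def equivalence_classes(pairs):
    """Group URIs into classes under the symmetric-transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

# Toy owl:sameAs links between three KGs (illustrative URIs).
same_as = [
    ("kg1:Kritios_Boy", "kg2:kritiosBoy"),
    ("kg2:kritiosBoy", "kg3:kritios-boy"),
    ("kg1:Athens", "kg2:athens"),
]
groups = equivalence_classes(same_as)
```

        <p>Here the three Kritios Boy URIs end up in one equivalence class and the two Athens URIs in another, so all facts about either entity can be served together regardless of which KG contributed them.</p>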
        <p>C. Triples (KG) Generation from text. The user can further process the text to create an RDF
representation of the facts of the text, i.e., by generating RDF triples from the text. This is an important
step for further enabling the combination of an LLM response and KGs, i.e., for having the information
of both of them in the same data format. In GPToLODS+, this can be done by sending a request to
ChatGPT to convert the text into RDF triples, e.g., “give me the RDF N-triples using DBpedia format
for the text T". To aid ChatGPT in providing valid URIs for the entities of the desired text, we can
optionally provide the list of URIs that were recognized through the Entity Recognition and Linking
process. The result is a set of RDF triples (or facts). The ultimate target is for the RDF facts to be exploited
for performing real-time fact validation with provenance and for offering advanced connectivity
analytics for the generated KG, i.e., services C1 and C2.</p>
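        <p>A minimal sketch of consuming the generated triples is given below, assuming the response is well-formed N-Triples with URI subjects and predicates; a production system would rather use a full RDF parser such as rdflib:</p>

```python
import re

# Matches a simple N-Triples line: <s> <p> <o> .  or  <s> <p> "literal" .
NT_LINE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"[^"]*")\s*\.$')

def parse_ntriples(text: str):
    """Extract (subject, predicate, object) tuples from N-Triples text."""
    triples = []
    for line in text.splitlines():
        m = NT_LINE.match(line.strip())
        if m:
            s, p, o = m.groups()
            triples.append((s, p, o.strip('<>')))
    return triples

# Example of a line that the "convert to N-Triples" prompt could return.
generated = (
    '<http://dbpedia.org/resource/Kritios_Boy> '
    '<http://dbpedia.org/ontology/location> '
    '<http://dbpedia.org/resource/Acropolis_Museum> .'
)
triples = parse_ntriples(generated)
```
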
        <p>
          C1. Fact Validation Service. The user can select any of the generated facts for validation by
exploiting RDF KGs, either DBpedia [32] or LODsyndesis KG [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Afterwards, a specialized algorithm
(proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]) that uses both SPARQL queries and word embeddings from the
“all-MiniLM-L6-v2" model2, is exploited for detecting the K most similar facts to the desired fact. Finally, the top-K
corresponding facts and their provenance are returned to the user along with a similarity score, for
enabling the validation of the given fact.
        </p>
        <p>The whole pipeline is depicted in Fig. 4. In particular, we can see that after generating the triples for
the facts of the ChatGPT response (middle part of Fig. 4), we have three different facts for validation. The
algorithm uses three different rules. First, Rule A checks whether exactly the same triple exists in the KG,
e.g., for fact #1, we found the same triple in DBpedia. Second, if Rule A fails, then Rule B checks for
either the same subject-object or the same subject-predicate pair, e.g., for fact #2, we have the
same subject-object pair (Acropolis Museum and Athens) with a different predicate (location instead of
locationCity). Finally, if both rules fail, Rule C checks for the most similar RDF triples, e.g., see the
example for the fact with ID #3, where the algorithm found the most similar triples by computing the
cosine similarity of the embeddings.</p>
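        <p>The logic of the three rules can be sketched as follows. This is a toy illustration only: the actual service retrieves candidate triples from DBpedia/LODsyndesis via SPARQL and embeds facts with the all-MiniLM-L6-v2 model, whereas here the KG is a small in-memory list and the "embedding" is a plain bag-of-words vector:</p>

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def embed(triple):
    """Toy bag-of-words stand-in for a sentence embedding."""
    return Counter(" ".join(triple).replace("_", " ").lower().split())

def validate(fact, kg, k=1):
    s, p, o = fact
    if fact in kg:                                    # Rule A: exact triple
        return ("A", [fact])
    partial = [t for t in kg
               if (t[0] == s and t[2] == o) or (t[0] == s and t[1] == p)]
    if partial:                                       # Rule B: same s-o or s-p
        return ("B", partial[:k])
    ranked = sorted(kg, key=lambda t: cosine(embed(fact), embed(t)),
                    reverse=True)                     # Rule C: most similar
    return ("C", ranked[:k])

# Tiny in-memory "KG" with two facts (illustrative).
kg = [
    ("Kritios_Boy", "location", "Acropolis_Museum"),
    ("Acropolis_Museum", "location", "Athens"),
]
rule, matches = validate(("Acropolis_Museum", "locationCity", "Athens"), kg)
```

        <p>For the fact above, Rule B fires: the same subject-object pair exists with a different predicate, mirroring the fact #2 example in the text.</p>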
        <p>
          C2. Connectivity Analytics and Integration. By having generated the RDF KG for a given text,
the user can connect to LODChain [27]. This is a research prototype that enables the connection of
a new or an existing RDF KG to the 400 RDF KGs of LODsyndesis, for ensuring its connectivity, for
fixing possible connectivity errors, and for enriching its contents by discovering related datasets. In
particular, it computes at real time the transitive and symmetric closure of equivalence relationships
between the given KG and the 400 RDF KGs of LODsyndesis, for discovering new connections for the
KG. This functionality is offered through a user interface with connectivity analytics, visualizations and
options to download the enriched data. As a result, the user can export an enriched version of the KG,
with URIs to existing RDF KGs and/or complementary facts. The enriched version can be published to
the LOD Cloud, for aiding its discoverability and reusability by other publishers, and for offering
advanced query services.
        </p>
      </sec>
      <sec id="sec-3-1b">
        <title>4.2. External CH Publishing Services (for a generated CIDOC-CRM based KG)</title>
        <p>
          If the generated KG (or any KG) has been created using the CIDOC-CRM model, one option is to
also publish it to the CIDOC-CRM portal [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which offers several statistics and measurements for
CIDOC-CRM based KGs. This can be of primary importance for several tasks, including i) Dataset
Discovery and Selection, i.e., for enabling the discovery of the KG by interested users of the CH
community, ii) Data Integration, i.e., for integrating the KG with existing datasets that use common
CIDOC-CRM properties and classes for enriching their information, and iii) Ontology Evaluation, i.e.,
for detecting possible problems more easily (e.g., using the ontology in a wrong way).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>4.3. How GPToLODS+ can be accessed</title>
        <p>GPToLODS+ is available online3 and offers the mentioned real-time interactive services (see also a
tutorial video4). It runs on a server with 8 GB main memory, 8 cores and 64 GB disk space. Moreover, a
REST API is offered for most of the services (except for C2), making it feasible to exploit the services
programmatically, e.g., for integrating the offered services into external applications. The REST API is
available online5, where one can find guidelines on how to use each of the offered services. Finally, the
code is available on GitHub6.
2https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
3https://demos.isl.ics.forth.gr/GPToLODS
4https://youtu.be/cE57RqHbDt8
5https://demos.isl.ics.forth.gr/GPToLODS/GPToLODSplusREST
6https://github.com/mountanton/GPT-LODs_plus</p>
      </sec>
    </sec>
    <sec id="sec-3b">
      <title>5. The Use Cases using GPToLODS+ &amp; Connectivity Analytics</title>
      <p>Here, we present how to perform the use cases of §2 through the services of GPToLODS+ (i.e., §4).
We provide scenarios with real examples over the Kerameikos KG [33], which is a KG about the ceramics of
Ancient Greece, including connectivity analytics and statistics for the mentioned KG.</p>
      <sec id="sec-3b-1">
        <title>5.1. UC1: Browsing more Information about the Entities of Interest from KGs</title>
        <p>As we can see in Fig. 5, the user asks GPToLODS+ for the location of the “Kritios Boy" sculpture. The
answer is retrieved from ChatGPT (i.e., the Acropolis Museum in Athens), and then the entities of the text
are recognized and marked by the LODsyndesisIE Entity Recognition and Linking service. Then,
the user selects to discover more information about the entities of the text. In that example (real data
from LODsyndesis are presented), the user discovered all the URIs for “Kritios Boy" in LODsyndesis (5
URIs in total), all the facts for the Acropolis Museum (321 facts from 10 KGs in total), and all the KGs
containing information about Athens (19 KGs in total).</p>
      </sec>
      <sec id="sec-3b-2">
        <title>5.2. UC2: Fact Validation (over the LLM response or a web text)</title>
        <p>Fig. 6 shows a dialogue between a user and GPToLODS+. After the Entity Recognition process, the
facts of the text can be converted to triples and then validated by using DBpedia or the LODsyndesis
KG. As we can see, for the first question (the location of the sculpture), we managed to confirm that it is
located in the Acropolis Museum. Although we found a slightly different RDF triple in the DBpedia KG
(than the one produced by ChatGPT), they had a high similarity score. Afterwards, the user continued
the dialogue and asked about the number of objects exhibited in the museum, and ChatGPT returned
“approximately 4,000 objects". By performing the same steps, we found a more accurate answer in the
LODsyndesis KG, i.e., "4,250+ objects". The dialogue can be continued with as many questions as the
user wants, and the user can return to any previous question for using all the mentioned services.</p>
      </sec>
      <sec id="sec-3b-3">
        <title>5.3. UC3: KG Generation, Connectivity Analytics and Data Integration</title>
        <p>We suppose that the data owner has generated an RDF KG by connecting different pieces of information,
e.g., a part (or the whole KG) can be generated by converting texts to an RDF KG through GPToLODS+.
For presenting real measurements, here we use a real KG from the CH domain, called Kerameikos KG
[33], which contains 289,596 triples about ceramic data of Ancient Greece. The data owner has created
mappings (i.e., links) with 6 external LOD KGs (see the upper side of Fig. 7), whereas the KG has been
modelled using the CIDOC-CRM standard.</p>
        <p>Afterwards, the data owner uses the LODChain service for connecting and integrating the KG with
more RDF KGs of the LOD Cloud, and as Fig. 7 shows, 43 more connections with RDF KGs were
discovered, i.e., the nodes with green color represent the new connections (KGs) for Kerameikos KG.
The data owner can export equivalent URIs, triples, complementary facts and others for publishing
a more enriched KG in the LOD Cloud (lower left side of Fig. 7). In this way, more advanced queries
could be expressed, such as “Give me the artist of a specific Greek pottery item, other museums including
artworks of that artist, and a description of the artist in the Greek language" (lower side of Fig. 7). The first
part (the artist of the pottery) can be answered by Kerameikos KG, the second part (museums including
artworks of the artist) by Wikidata KG [34] and the third one (a description in the Greek language) by
DBpedia KG [32].</p>
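        <p>After the enrichment, the example query above could be expressed as a federated SPARQL query, sketched below as a Python string; the CIDOC-CRM property path for the artist, the owl:sameAs bindings and the variable names are illustrative placeholders rather than the exact terms of the three KGs:</p>

```python
# Illustrative federated SPARQL query: SERVICE clauses delegate sub-patterns
# to the public Wikidata and DBpedia endpoints. Prefixes are omitted and the
# CIDOC-CRM path is a plausible placeholder, not verified against Kerameikos.
FEDERATED_QUERY = """
SELECT ?artist ?museum ?description WHERE {
  ?pottery ecrm:P108i_was_produced_by/ecrm:P14_carried_out_by ?artist .
  ?artist owl:sameAs ?wdArtist , ?dbpArtist .
  SERVICE <https://query.wikidata.org/sparql> {
    ?artwork wdt:P170 ?wdArtist ;   # creator
             wdt:P276 ?museum .     # location (museum)
  }
  SERVICE <https://dbpedia.org/sparql> {
    ?dbpArtist dbo:abstract ?description .
    FILTER(lang(?description) = "el")  # description in Greek
  }
}
"""

def endpoints_used(query: str):
    """List the remote endpoints referenced via SERVICE clauses."""
    return [part.split(">")[0] for part in query.split("SERVICE <")[1:]]
```

        <p>Each SERVICE clause answers one part of the query, matching the decomposition described above: the local KG for the artist, Wikidata for the museums, and DBpedia for the Greek description.</p>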
        <p>Table 2 reports connectivity measurements for Kerameikos KG through LODChain: the number of unique and common entities, the KG with the most common entities, the number of connections before and after LODChain (overall and with CH KGs), the number of inferred connections, the number of complementary facts for Kerameikos entities, and the KG with the most complementary facts.</p>
      </sec>
      <sec id="sec-3-3">
        <title>5.4. UC4: Data Publishing to CH portals</title>
        <p>
          By having generated the RDF KG and (optionally) published it to the LOD Cloud, one has also the
option to connect it to the CIDOC-CRM portal [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], where ontology statistics and visualizations are offered
based on VoID vocabulary. For instance, see the lower right side of Fig. 7, and also some statistics about
Kerameikos KG derived from that portal in Table 3. Indicatively, we can see some dedicated statistics
about the desired KG and CIDOC-CRM model, such as the number of CIDOC-CRM properties and
classes, in how many RDF triples CIDOC-CRM properties and entities are used and which KGs share
the most CIDOC-CRM properties and classes with Kerameikos KG. More analytics (for the Kerameikos
and many other CH KGs) can be browsed in the website of the portal [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Concluding Remarks</title>
      <p>Since there is a high need to exploit and combine LLMs and KGs for aiding the user experience over
Cultural Heritage (CH) information, i.e., by providing advanced access services and analytics, we
presented a research prototype, called GPToLODS+, that tries to combine the advantages of both LLMs
and KGs for achieving that target. We presented the challenges and several use cases over the CH
domain and all the services of GPToLODS+, including Entity Recognition, Linking and Enrichment,
Fact Validation, Connectivity Analytics and Data Integration, and others. Afterwards, we provided real
examples for each of the use cases by using the mentioned services, including connectivity analytics for
a real KG from the CH domain, by also focusing on the CIDOC-CRM standard. As regards future work,
the plan is to extend the services to cover more applications that can combine KGs and LLMs, such
as a) converting natural language questions to SPARQL queries over RDF KGs by exploiting LLMs, b) providing
and evaluating (e.g., through a task-based evaluation with users) the offered services by using more LLMs,
such as LLaMA or different versions of ChatGPT, and c) creating digital storytelling applications of CH
data from RDF KGs by proposing and evaluating different LLM prompts.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has received funding from ECHOES, a project funded by the European Commission under
Grant Agreement n.101157364. Views and opinions expressed in this paper are those of the authors
only and do not necessarily reflect those of the European Union or the European Research Executive
Agency. Neither the European Union nor the granting authority can be held responsible for them.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>R. Opitz, E. Uleberg, Semantic modelling of archaeological excavation data: a review of the current
state of the art and a roadmap of activities, Internet Archaeology (2023).
[19] E. Ikkala, E. Hyvönen, H. Rantala, M. Koho, Sampo-UI: A full stack JavaScript framework for
developing semantic portal user interfaces, Semantic Web 13 (2022) 69–84.
[20] N. Gounakis, M. Mountantonakis, Y. Tzitzikas, Evaluating a radius-based pipeline for question
answering over cultural (CIDOC-CRM based) knowledge graphs, in: Proceedings of the 34th ACM
Conference on Hypertext and Social Media, 2023, pp. 1–10.
[21] O. Suissa, M. Zhitomirsky-Geffet, A. Elmalech, Question answering with deep neural networks for
semi-structured heterogeneous genealogical knowledge graphs, Semantic Web 14 (2023) 209–237.
[22] A. Ahola, E. Hyvönen, A. Kauppala, Publishing and studying historical opera and music theatre
performances on the semantic web: case OperaSampo 1830–1960, in: International Workshop on
Semantic Web and Ontology Design for Cultural Heritage, CEUR-WS.org, 2023, p. 12.
[23] G. Trichopoulos, G. Alexandridis, G. Caridakis, A survey on computational and emergent digital
storytelling, Heritage 6 (2023) 1227–1263.
[24] Y. Tzitzikas, M. Mountantonakis, P. Fafalios, Y. Marketakis, CIDOC-CRM and machine learning: a
survey and future research, Heritage 5 (2022) 1612–1636.
[25] A. S. Lippolis, A. Klironomos, D. F. Milon-Flores, H. Zheng, A. Jouglar, E. Norouzi, A. Hogan, et al.,
Enhancing entity alignment between Wikidata and ArtGraph using LLMs, in: SWODCH, 2023.
[26] N. Constantinides, A. Constantinides, D. Koukopoulos, C. Fidas, M. Belk, CulturAI: Exploring
mixed reality art exhibitions with large language models for personalized immersive
experiences, in: Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and
Personalization, 2024, pp. 102–105.
[27] M. Mountantonakis, Y. Tzitzikas, LODChain: Strengthen the connectivity of your RDF dataset to
the rest LOD Cloud, in: ISWC, Springer, 2022, pp. 537–555.
[28] M. Mountantonakis, Y. Tzitzikas, LODsyndesisIE: Entity extraction from text and enrichment using
hundreds of linked datasets, in: European Semantic Web Conference, Springer, 2020, pp. 168–174.
[29] P. N. Mendes, M. Jakob, A. García-Silva, C. Bizer, DBpedia spotlight: shedding light on the web of
documents, in: Proceedings of the 7th International Conference on Semantic Systems, 2011, pp. 1–8.
[30] F. Piccinno, P. Ferragina, From TagME to WAT: a new entity annotator, in: Proceedings of the
First International Workshop on Entity Recognition &amp; Disambiguation, 2014, pp. 55–62.
[31] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP
natural language processing toolkit, in: Proceedings of the 52nd Annual Meeting of the Association for
Computational Linguistics: System Demonstrations, 2014, pp. 55–60.
[32] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey,
P. Van Kleef, S. Auer, et al., DBpedia: a large-scale, multilingual knowledge base extracted from
Wikipedia, Semantic Web 6 (2015) 167–195.
[33] Kerameikos.org, http://kerameikos.org/, 2021. Accessed: March 2024.
[34] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications of the
ACM 57 (2014) 78–85.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Rachabatuni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Principi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mazzanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bertini</surname>
          </string-name>
          ,
          <article-title>Context-aware chatbot using MLLMs for cultural heritage</article-title>
          ,
          <source>in: Proceedings of the 15th ACM Multimedia Systems Conference</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>459</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Trichopoulos</surname>
          </string-name>
          ,
          <article-title>Large language models for cultural heritage</article-title>
          ,
          <source>in: Proceedings of the 2nd International Conference of the ACM Greek SIGCHI Chapter</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>Real-time validation of ChatGPT facts using RDF knowledge graphs</article-title>
          , in: ISWC (Posters/Demos/Industry),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Harnessing the power of LLMs in practice: A survey on chatgpt and beyond</article-title>
          ,
          <source>ACM Transactions on Knowledge Discovery from Data</source>
          <volume>18</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Casillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Colace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lorusso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santaniello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Valentino</surname>
          </string-name>
          ,
          <article-title>The role of AI in improving interaction with cultural heritage: An overview</article-title>
          ,
          <source>Handbook of Research on AI and ML for Intelligent Machines and Systems</source>
          (
          <year>2024</year>
          )
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <article-title>LOD cloud</article-title>
          , https://lod-cloud.net/,
          <year>2024</year>
          (accessed July 7, 2024).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          , I. Theocharakis,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>Why we need ontology-specific data portals: A case study for CIDOC-CRM</article-title>
          , in: SWODCH,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Doerr</surname>
          </string-name>
          ,
          <article-title>The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata</article-title>
          ,
          <source>AI Magazine</source>
          <volume>24</volume>
          (
          <year>2003</year>
          )
          <fpage>75</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W. W. W.</given-names>
            <surname>Consortium</surname>
          </string-name>
          , et al.,
          <article-title>SPARQL 1.1 overview</article-title>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>Large-scale semantic integration of linked data: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>52</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] OpenAI, ChatGPT, https://openai.com/,
          <year>2021</year>
          . Accessed: March
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          , Y. Tzitzikas,
          <article-title>LODsyndesis: global scale knowledge services</article-title>
          ,
          <source>Heritage</source>
          <volume>1</volume>
          (
          <year>2018</year>
          )
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Monaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Pellegrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Scarano</surname>
          </string-name>
          , L. Vicidomini,
          <article-title>Linked open data in authoring virtual exhibitions</article-title>
          ,
          <source>Journal of Cultural Heritage</source>
          <volume>53</volume>
          (
          <year>2022</year>
          )
          <fpage>127</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goodlander</surname>
          </string-name>
          ,
          <article-title>Publishing the data of the smithsonian american art museum to the linked data cloud</article-title>
          ,
          <source>International Journal of Humanities and Arts Computing</source>
          <volume>8</volume>
          (
          <year>2014</year>
          )
          <fpage>152</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Metilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bartalesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meghini</surname>
          </string-name>
          , et al.,
          <article-title>Steps towards a system to extract formal narratives from text</article-title>
          ,
          <source>in: Text2Story@ ECIR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Teixeira Lopes</surname>
          </string-name>
          ,
          <article-title>Archonto, a CIDOC-CRM-based linked data model for the portuguese archives</article-title>
          ,
          <source>in: International Conference on Theory and Practice of Digital Libraries</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Katsianis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bruseker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Marlet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hivert</surname>
          </string-name>
          , G. Hiebel,
          <string-name>
            <given-names>C.-E. S.</given-names>
            <surname>Ore</surname>
          </string-name>
          , P. Derudas,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>