<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Facilitating Search of the Virtual Record Treasury of Ireland Knowledge Graph using ChatGPT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alex Randles</string-name>
          <email>alex.randles@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucy McKenna</string-name>
          <email>lucy.mckenna@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lynn Kilgallon</string-name>
          <email>kilgall@tcd.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beyza Yaman</string-name>
          <email>beyza.yaman@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Crooks</string-name>
          <email>pcrooks@tcd.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Declan O'Sullivan</string-name>
          <email>declan.osullivan@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre for Digital Content, Trinity College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of History, Trinity College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Virtual Record Treasury of Ireland (VRTI) is an initiative to digitally recreate the contents of the Irish central archive, which was destroyed during the Civil War. The project has created a Knowledge Graph (KG) to facilitate information discovery and reasoning over the recovered items. However, retrieving data from the KG requires complex queries, and writing them demands a high level of technical expertise. In this paper, we explore the application of Large Language Models (LLMs) to facilitate searching of the VRTI-KG by users who lack this technical expertise and to decrease the workload for those who have it. We propose the VRTI-ChatGPT framework, which uses ChatGPT to interpret requests from users and to facilitate the creation of queries that can be executed on the KG.</p>
      </abstract>
      <kwd-group>
        <kwd>KG Search</kwd>
        <kwd>User Interface</kwd>
        <kwd>ChatGPT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Virtual Record Treasury of Ireland (VRTI) [1–3] is a state-funded programme hosted at
Trinity College Dublin. The VRTI began with the objective of digitally reconstructing
archival records destroyed during the 1922 Irish Civil War [2]. A fire during the war
destroyed the Public Record Office of Ireland, damaging records dating back more
than 700 years. Staff at the time spent months recovering documents, which were
recreated a century later. The initial VRTI Knowledge Graph (KG) was created as a result of
the lead
        <xref ref-type="bibr" rid="ref3">project named Beyond 2022</xref>.
The VRTI-KG contains notable information about
people, places, roles, organisations and their interconnections from Irish history.
Representing this information using a KG allows the integration of heterogeneous
source data formats and supports reasoning and inference over the data. KGs have already
been applied to a range of scenarios, from event-based networks [4] to, more recently,
climate-action-related applications [5]. The KG was implemented using RDF, which means data must be retrieved
using SPARQL [6] queries. Creating these queries is time-consuming and requires an
understanding of the SPARQL query language and structure of relevant schemas. Many of
the historians who would interact with the VRTI-KG do not possess the technical expertise
to create these complex queries. Large Language Models (LLMs) such as ChatGPT [7]
provide functionality which could allow the data in the VRTI-KG to be easily searched and
the results presented using natural language. ChatGPT was chosen for the proposed
approach because it gave the best results in early experimentation. With the emergence of
generative AI, we are interested in exploring what benefits it can have for the VRTI-KG
system [1]. However, it is important to ensure that any application of generative
AI to the VRTI-KG is constrained to information contained in the VRTI and does not
pollute responses with external information on the requested topic. To explore how
generative AI could be applied, we propose the VRTI-ChatGPT framework, which is
designed to facilitate searching of the VRTI-KG through natural language questions and
answers. The framework uses strict prompt templates to interact with ChatGPT in order to
process the user's input and form sentences from KG query results. A recent survey [8] has
highlighted the importance of providing straightforward interaction between semantic
interfaces and respective domain experts. The survey compared 28 interfaces based on
interaction paradigm, information being displayed, and strategies used to improve the
understanding of information. The survey concluded that many of these approaches still
require some level of technical expertise to be used effectively, which some domain experts
may lack. It is hoped that natural language interaction can bridge the gap between domain
experts (historians) and the diverse data held in the VRTI-KG. An existing tool for natural
language querying of KGs by Ontotext<sup>1</sup> was trialled before deciding to create a
bespoke solution. The tool uses LLMs to create SPARQL queries from a provided ontology
and natural language question. The VRTI-KG endpoint and ontology were provided
to the tool; however, it struggled to create syntactically correct queries
for most test cases. The incorrect queries could be a result of the complex structure of the
VRTI ontology. Using query templates ensures that the query created
from the natural language question is syntactically correct and retrieves all of the
information required to provide a sufficient response. The query templates used in the framework
are configurable, which should allow the approach to be customised for other KGs in
future. Early observations from the historians in the VRTI have been positive when the
prompts are strictly constrained so that ChatGPT does not make inferences from the
provided information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. VRTI-ChatGPT framework</title>
      <p>This section discusses the design and implementation of the VRTI-ChatGPT framework. The
implementation of the framework is available online<sup>2</sup>. The framework is configurable to
allow changes in the VRTI-KG to be easily synchronised with the prompts and
queries involved. Figure 1 presents an overview of the activities of the framework.
Several Python libraries<sup>3</sup> were used to implement the framework. Flask is a customizable
web framework which was used to create the web application. SPARQLWrapper is used to
execute queries on the endpoint of the VRTI-KG. The OpenAI library is used to
communicate with ChatGPT 4.0<sup>4</sup>. Python string formatting is used to create prompts and
queries from the templates. Figure 2 presents search results displayed in the
implementation.</p>
      <p><sup>1</sup> https://www.ontotext.com/blog/natural-language-querying-of-graphdb-in-langchain/</p>
      <p><sup>2</sup> https://vrti-graph.adaptcentre.ie/gpt-search</p>
      <p><sup>3</sup> https://github.com/alex-randles/VRTI-ChatGPT/blob/main/libraries.pdf</p>
      <p><sup>4</sup> https://openai.com/index/gpt-4/</p>
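      <p>The activities of the framework (entity extraction, query execution and response generation) can be sketched as three stages chained together. This is a minimal illustration and not the framework's actual code: the function name answer_question, the three injected callables and the fallback message for empty results are all assumptions made for this example.</p>
      <preformat><![CDATA[
```python
from typing import Callable


def answer_question(
    question: str,
    extract_entities: Callable[[str], dict],
    run_query: Callable[[dict], list],
    phrase_answer: Callable[[str, list], str],
) -> dict:
    """Run the three stages in order and return what the interface would show."""
    # Stage 1: an LLM call that pulls names of people and places from the question.
    entities = extract_entities(question)
    # Stage 2: a templated SPARQL query executed on the KG endpoint.
    rows = run_query(entities)
    if not rows:
        # Hypothetical fallback: skip the second LLM call when nothing matched.
        return {"answer": "No matching records were found.", "resources": []}
    # Stage 3: an LLM call that phrases an answer from the query results only.
    answer = phrase_answer(question, rows)
    return {"answer": answer, "resources": rows}
```
]]></preformat>
      <p>Passing the stages in as callables keeps the sketch independent of any particular LLM client or SPARQL endpoint, so each stage can be stubbed out during testing.</p>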
      <p>Initial Processing of User's Input. First, the user enters a question into the search bar
(A – Figure 2) or selects a suggested question. For instance, a question could ask about a
specific person (&lt;person&gt;), such as “Tell me about &lt;person&gt;”, “Where and when was &lt;person&gt;
born?”, “Was &lt;person&gt; in the army?” or “What job did &lt;person&gt; have?”. For the running
example, the user inputs “Tell me about Michael Collins”. Michael Collins<sup>5</sup> is a notable Irish
person who was involved in the Irish Civil War. The question is inserted into a prompt
template designed to extract the names of people and places from the user’s input. The generated
prompt is “Extract the names of people and places in this text ‘Tell me about Michael Collins’
and output the result into a JSON dictionary”. The prompt is then sent to ChatGPT 4.0 using
a request made through the OpenAI library. The result is a dictionary of key-value
pairs containing the names of people and places, which is stored in memory.</p>
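      <p>A minimal sketch of this step is given below. The template wording is taken from the prompt quoted above, but the placeholder name question, the helper names and the defensive JSON parsing are assumptions for illustration. The call to ChatGPT itself is omitted, since its exact form in the framework is not described here.</p>
      <preformat><![CDATA[
```python
import json

# Wording follows the prompt quoted in the text; the placeholder name
# "question" is an assumption made for this sketch.
EXTRACTION_TEMPLATE = (
    "Extract the names of people and places in this text '{question}' "
    "and output the result into a JSON dictionary"
)


def build_extraction_prompt(question: str) -> str:
    """Insert the user's question into the extraction prompt template."""
    return EXTRACTION_TEMPLATE.format(question=question.strip())


def parse_entities(reply: str) -> dict:
    """Parse the model's reply into a dictionary of extracted entities.

    A reply may wrap the JSON in extra text, so only the span from the
    first "{" to the last "}" is parsed. Anything unparseable yields {}.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1 or end < start:
        return {}
    try:
        parsed = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```
]]></preformat>
      <p>For the running example, build_extraction_prompt("Tell me about Michael Collins") reproduces the prompt quoted above. The keys of the returned dictionary depend on the model's reply, so downstream code should not rely on fixed key names unless the prompt specifies them.</p>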
      <p>Creation of SPARQL query. The extracted entity (“Michael Collins”) is inserted into a
SPARQL [6] query template<sup>6</sup> defined in the configuration file. The insertion involves
translating the key-value pairs from the JSON dictionary into FILTER conditions
(FILTER CONTAINS(?Name, "Michael Collins")) using string formatting methods. The query
retrieves resources with a matching name along with their related properties, such as birth
date and place. The query is then executed on the VRTI-KG using the SPARQLWrapper
library to retrieve matching resources. The query results are represented in dictionary
format.</p>
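      <p>The translation from extracted names to FILTER conditions might look like the sketch below. The query template shown is hypothetical: the framework's real template is defined in its configuration file, and the rdfs:label property, the variable names other than ?Name, the LIMIT clause and the choice to combine several names with a logical OR are assumptions. The sketch also assumes that each dictionary value is a list of names, and it escapes quotes so that a name cannot break out of the string literal.</p>
      <preformat><![CDATA[
```python
# Hypothetical template; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?Resource ?Name ?Property ?Value
WHERE {{
  ?Resource rdfs:label ?Name ;
            ?Property ?Value .
  {filters}
}}
LIMIT {limit}"""


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_filters(entities: dict) -> str:
    """Turn extracted names into one FILTER of CONTAINS conditions on ?Name."""
    names = [name for values in entities.values() for name in values]
    conditions = [
        'CONTAINS(?Name, "{}")'.format(escape_literal(name)) for name in names
    ]
    if not conditions:
        return ""
    # Assumption: several names are alternatives, so they are joined with OR.
    return "FILTER (" + " || ".join(conditions) + ")"


def build_query(entities: dict, limit: int = 50) -> str:
    """Fill the template with the FILTER conditions and a result limit."""
    return QUERY_TEMPLATE.format(filters=build_filters(entities), limit=int(limit))
```
]]></preformat>
      <p>The resulting string could then be passed to SPARQLWrapper through setQuery, with the return format set to JSON, before calling query().convert() to obtain the results as a dictionary.</p>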
      <p>Creation of Natural Language Response. The dictionary containing the query result is
inserted into a prompt template to generate the natural language answer. For this example,
the generated prompt is “Answer this question ‘Tell me about Michael Collins’ using only
the information in this dictionary ‘{Person: &lt;….&gt;, Occupation: &lt;…&gt;, BirthDate: “…”}’. Do not
include any external information in the answer.” The prompt template is designed to
constrain ChatGPT to use only the information in the query results from the VRTI-KG
rather than external information it has on the topic. The response (B – Figure 2) from
ChatGPT is then displayed on the interface. In addition, the URIs of the resources returned
by the query are presented in a table (C – Figure 2), which allows further
exploration within the application.</p>
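      <p>As an illustration of this step, the sketch below flattens query results in the standard SPARQL 1.1 JSON results layout into plain dictionaries and inserts one of them into the answer prompt. The template wording follows the prompt quoted above, while the function names, the placeholder names and the use of JSON to serialise the dictionary are assumptions.</p>
      <preformat><![CDATA[
```python
import json

# Wording follows the prompt quoted in the text; placeholder names are assumed.
ANSWER_TEMPLATE = (
    "Answer this question '{question}' using only the information in this "
    "dictionary '{facts}'. Do not include any external information in the answer."
)


def flatten_bindings(results: dict) -> list:
    """Reduce SPARQL JSON results to a list of {variable: value} rows."""
    rows = results.get("results", {}).get("bindings", [])
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]


def build_answer_prompt(question: str, facts: dict) -> str:
    """Insert the question and the query result dictionary into the template."""
    serialised = json.dumps(facts, ensure_ascii=False)
    return ANSWER_TEMPLATE.format(question=question, facts=serialised)
```
]]></preformat>
      <p>Keeping the flattened rows separate from the prompt means the same rows can also populate the table of resource URIs shown on the interface.</p>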
    </sec>
    <sec id="sec-3">
      <title>3. Future Work and Conclusion</title>
      <p>Future work includes usability testing of the framework with a cohort of historians. The
testing will allow the user requirements to be refined and validated. The testing will involve
the participants interacting with the framework to complete several tasks which mimic the
expected user interaction. In addition, it is hoped to configure the framework to answer
questions from information stored in other KGs.</p>
      <p>The VRTI-ChatGPT framework proposed in this paper provides a possible direction for the
integration of generative AI, such as LLMs, into the VRTI-KG system [1]. It is hoped the
proposed approach can facilitate searching by users who lack the relevant technical expertise,
thus reducing workload and improving the uptake of information by domain experts.
Finally, it is hoped that the prompts used by the framework will provide guidance for researchers
who propose similar approaches.</p>
      <p><sup>5</sup> https://kb.virtualtreasury.ie/person/Collins_Michael_c20_dib_a1860</p>
      <p><sup>6</sup> https://github.com/alex-randles/VRTI-ChatGPT/blob/main/sample-query.rq</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The Virtual Record Treasury of Ireland (VRTI) is funded by the Government of Ireland, through
the Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media, under the Project
Ireland 2040 framework. The project is also partially supported by the ADAPT Centre for
Digital Content Technology under the SFI Research Centres Programme (Grant
13/RC/2106_P2).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Yaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>McKenna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Randles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kilgallon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Crooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <article-title>Digital Prosopography Information in Virtual Record Treasury of Ireland Knowledge Graph</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop of Semantic Digital Humanities (SemDH) Co-Located with the 21st Extended Semantic Web Conference</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          https://ceur-ws.org/Vol-3724/paper2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Crooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <article-title>The Virtual Record Treasury of Ireland: A century of Recovery from the 1922 Four Courts Blaze - and Beyond</article-title>
          ,
          <source>Hist Irel</source>
          <volume>30</volume>
          (
          <year>2022</year>
          )
          <fpage>38</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Munnelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kilgallon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Crooks</surname>
          </string-name>
          ,
          <article-title>Creating a Knowledge Graph for Ireland's Lost History: Knowledge Engineering and Curation in the Beyond 2022 Project</article-title>
          ,
          <source>J. Comput. Cult. Herit.</source>
          <volume>15</volume>
          (
          <year>2022</year>
          ). https://doi.org/10.1145/3474829.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Keeney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roblek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <article-title>Extending Siena to support more expressive and flexible subscriptions</article-title>
          ,
          <source>in: Proceedings of the Second International Conference on Distributed Event-Based Systems</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2008</year>
          : pp.
          <fpage>35</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>https://doi.org/10.1145/1385989.1385995.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Orlandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dev</surname>
          </string-name>
          ,
          <article-title>An ontology model for climatic data analysis</article-title>
          ,
          <source>in: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS</source>
          ,
          <year>2021</year>
          : pp.
          <fpage>5739</fpage>
          -
          <lpage>5742</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <article-title>SPARQL 1.1 Query Language</article-title>
          ,
          <source>World Wide Web Consortium (W3C) Recommendation</source>
          <volume>21</volume>
          (
          <year>2013</year>
          )
          <fpage>778</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          https://www.w3.org/TR/sparql11-query/ (accessed April 1,
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.-L.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>A brief overview of ChatGPT: The history, status quo and potential future development</article-title>
          ,
          <source>IEEE/CAA Journal of Automatica Sinica</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>1122</fpage>
          -
          <lpage>1136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Miguel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mecella</surname>
          </string-name>
          ,
          <article-title>Linked Data interfaces: a survey</article-title>
          ,
          <source>in: 19th Conference on Information and Research Science Connecting to Digital and Library Science</source>
          ,
          <year>2023</year>
          : pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>