<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Rail Vehicle Information in Europe</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammed H. Rasheed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marina Aguado</string-name>
          <email>marina.aguado@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of the Basque Country</institution>, <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>KGQA</kwd>
        <kwd>SPARQL</kwd>
        <kwd>RDF</kwd>
        <kwd>Prompt Engineering</kwd>
      </kwd-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>The Interoperable Europe Act came into effect in April this year. Semantic interoperability, as a key element, can ensure cross-border interoperability between public services such as rail and transport services. This demo introduces a chatbot engine that employs a Large Language Model (LLM) to facilitate human interaction with domain-specific Knowledge Graphs (KGs) governed by the European Union Agency for Railways. The chatbot engine generates domain-specific SPARQL queries from natural language questions, thereby providing an intuitive interface for non-expert users to retrieve in-domain knowledge. In addition, the chatbot's automated query generation allows domain experts to query triple stores directly and without intermediaries, contributing to improving the quality of the hosted information. The chatbot engine uses a zero-shot prompting approach: it takes the user's natural language question as input and translates it into a SPARQL query to retrieve factual knowledge from the target KG. To improve the quality of the generated SPARQL query, and thus the relevancy of the results corresponding to the user's question, the LLM is supplied with in-domain knowledge by injecting extracted KG vocabulary information alongside the user's natural language question. Experiments conducted on twenty in-domain competency questions revealed that leveraging LLMs is a promising approach that can be efficiently applied to domain-specific KGs to increase productivity, reduce query construction time, and improve usability by allowing non-technical users to obtain knowledge intuitively.
Furthermore, the chatbot offers a valuable feature inherited from LLMs: the ability to answer multilingual queries, allowing its use by a wide range of users regardless of language boundaries, which ultimately contributes to the cross-border interoperability of public services in countries using different languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CEUR Workshop Proceedings</title>
      <p>ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) are experiencing notable advances in their performance and
functionalities; as a result, LLMs have been integrated and applied in various fields for
both mainstream and downstream tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The proven ability of LLMs to process natural
language has opened the door for their integration into many applications, particularly text-based
applications, including Knowledge Graph Question Answering (KGQA) through the translation
of natural language (NL) questions into SPARQL queries [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][5]. From a general perspective, this demo paper is part of
ongoing experiments to examine and evaluate the capability of LLMs to translate NL questions
into SPARQL queries over a domain-specific KG, specifically the European Union Agency for
Railways (ERA) Knowledge Graph (KG), motivated by the fact that there are limited attempts
to target such domain-specific KGs.
      </p>
      <p>
        Constructing and composing SPARQL queries that can interrogate and retrieve responses
from domain-specific KGs is a challenging task on both the technical and temporal levels [6][7]. It
requires extensive experience and proficiency in KG query languages such as SPARQL, as
well as a deep understanding of the architecture and vocabulary of the target KG [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][8][9]. To
bridge this gap in human-KG interaction, an LLM can be utilized as an interface. Such an
interface can, first, simplify the process of querying general and domain-specific KGs, allowing
users without technical proficiency in KG query languages such as SPARQL to interact
and retrieve information using NL as input. Furthermore, composing SPARQL queries from
NL can increase productivity, reduce complexity, and reduce the construction time of manual
queries [8][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Lastly, LLMs can provide an intuitive and user-friendly environment that allows
interaction with KGs of varying complexity.
      </p>
      <p>Based on the above, our chatbot engine provides an LLM-based interface that aims to
facilitate information retrieval from domain-specific KGs. Our approach performs zero-shot SPARQL
query generation in a deliberately plain fashion, augmenting the LLM with previously extracted
KG vocabulary information. Although such techniques can strongly affect the quality of
responses, the chatbot does not apply any natural language processing (NLP) techniques to
the input text; it therefore isolates the capability of the LLM to generate a SPARQL
query based on NL user input augmented with extracted KG vocabulary information.</p>
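        <p>The vocabulary-augmented zero-shot prompt described above can be sketched as follows. This is a minimal illustration, not the authors' actual prompt: the instruction wording, the function name, and the example era: terms are assumptions.</p>

```python
def build_zero_shot_prompt(question: str, vocabulary: dict[str, list[str]]) -> str:
    """Compose a single zero-shot prompt embedding KG vocabulary and the NL question."""
    # Serialize the extracted vocabulary, one line per kind (classes, properties, ...).
    vocab_block = "\n".join(
        f"{kind}: {', '.join(terms)}" for kind, terms in vocabulary.items()
    )
    return (
        "You translate natural-language questions into SPARQL queries.\n"
        "Use only the following vocabulary from the target knowledge graph:\n"
        f"{vocab_block}\n\n"
        f"Question: {question}\n"
        "Return only the SPARQL query, with no explanation."
    )

# Example call; the vocabulary terms below are illustrative, not an exhaustive
# extraction from the ERA ontology.
prompt = build_zero_shot_prompt(
    "What is the longest section of line?",
    {
        "classes": ["era:SectionOfLine", "era:OperationalPoint"],
        "properties": ["era:length", "rdfs:label"],
    },
)
```

        <p>Because the vocabulary is injected as plain text, the same construction works unchanged for any KG whose classes and properties have been extracted in advance, which keeps the prompt's token footprint small.</p>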
      <p>The rest of the paper is structured as follows: In Section 2 we present related work. Section
3 introduces the design and architecture of the chatbot engine. Section 4 demonstrates the
chatbot interface and its operations. Performance and evaluation of our proposed engine are
presented in Section 5. Finally, Section 6 summarizes the contribution of this demo and suggests
insights for future work.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Literature Review</title>
      <p>A considerable number of studies have explored the translation of NL questions into SPARQL
queries, both with and without LLMs. However, the expressiveness and ambiguity of natural language
have led to the emergence of various non-deterministic approaches and methods for performing the
translation task. With recent advances in LLMs, many studies have leveraged LLMs
to support the translation at various points within the translation pipeline.
[10] designed a one-shot prompt-based template to instruct an LLM to generate SPARQL from NL
based on similar labeled examples extracted from three publicly available datasets. Another
promising approach with high leaderboard scores is proposed by [11], which focuses on
fine-tuning LLMs on ground truth datasets to generate SPARQL from the NL question. [12][13]
utilized semantic and syntactic analysis of the NL question to identify entities and relations
in the input text, then performed a URI lookup against the KG to retrieve the corresponding KG
entities and relations and generate a SPARQL representation of the NL question. Another LLM-based
approach is proposed by [14], which uses few-shot LLM prompts
with top-n labeled examples whose context is similar to the NL question to retrieve scholarly
information from the SciQA benchmark; their approach is only effective when the top-n labeled
examples match the input question. [5] explored the potential of using ChatGPT to support
related KG tasks, among them the generation of SPARQL from NL; the experiment was
based on a small self-created KG and was therefore very limited. [7] proposed using controlled
natural language as an intermediate step to answer NL questions over KGs using semantic parsing
via LLMs, exploiting the unambiguous translation of controlled natural languages into SPARQL
queries. In conclusion, leveraging LLMs to translate natural language into SPARQL queries has
shown promising results in various domains, including semantic analysis, KGQA, and syntactic
formulation. However, most approaches target public KGs supported by already
available ground truth competency questions or ground truth datasets used to train or fine-tune
LLMs in a multi-shot setting, whereas our approach uses only domain-specific vocabulary and KG
information in a zero-shot prompt to support the translation of NL to SPARQL with minimal
token sizes.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Design and Architecture</title>
      <p>Our proposed test bed is used to explore the ability of an LLM to generate SPARQL queries and
retrieve information from a domain-specific KG using NL questions. As illustrated in Figure 1,
our method involves generating an enhanced prompt augmented with in-domain KG ontology
vocabulary information extracted from the Register of Infrastructure (RINF) ontology KG. The
generated prompt is then fed to the LLM as a zero-shot prompt [15][8] to translate
the NL question into a SPARQL query, which is used to obtain the answer from the ERA SPARQL
endpoint. Our test bed engine uses the Gemini API (gemini-pro) from Google. To keep the
randomness and diversity of the responses consistent across runs, our test bed kept the temperature
constant (value: 4) throughout all tests. The SPARQL query generated by the LLM is then
executed against the ERA SPARQL endpoint, and the responses are returned to the user.</p>
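        <p>The pipeline of Figure 1 (prompt, LLM, SPARQL, endpoint) can be sketched as below. The LLM is injected as a plain callable so the translation step can be exercised without network access; the endpoint URL is a placeholder, and the fence-stripping step reflects a common property of LLM output rather than documented behaviour of the demo.</p>

```python
import re
import urllib.parse

# Placeholder only; not the real ERA SPARQL endpoint URL.
SPARQL_ENDPOINT = "https://example.org/era/sparql"

def extract_sparql(llm_output: str) -> str:
    """Strip the Markdown code fences that LLMs often wrap around queries."""
    match = re.search(r"```(?:sparql)?\s*(.*?)```", llm_output, re.DOTALL)
    return (match.group(1) if match else llm_output).strip()

def translate(question: str, prompt_template: str, llm) -> str:
    """Fill the zero-shot prompt, call the injected LLM, and clean its output."""
    return extract_sparql(llm(prompt_template.format(question=question)))

def endpoint_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """URL that a GET request would use to run the query (execution not shown)."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    return endpoint + "?" + params
```

        <p>Keeping the LLM behind a callable also makes it straightforward to swap gemini-pro for another model, which matters for the automated benchmarking of different LLMs envisaged in the conclusion.</p>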
      <p>The chatbot engine demo is implemented using the Streamlit framework, allowing users to easily
interact with and explore the RINF KG information through the Gemini API.</p>
      <sec id="sec-4-1">
        <title>Resources</title>
        <p>ERA vocabulary: https://data-interop.era.europa.eu/era-vocabulary/; Gemini: https://gemini.google.com/app; Streamlit: https://streamlit.io/</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Chatbot Engine Demonstration</title>
      <p>The chatbot engine provides an intuitive and user-friendly interface and can be accessed at
https://llm-chatbot-translator.streamlit.app/. The chatbot allows users to input their own natural
language question targeting the ERA RINF ontology KG, as demonstrated in Figure 2. Alternatively, if the user is
unfamiliar with the domain in question, the chatbot provides a drop-down list of suggested
sample questions of varying complexity that can be used to test the tool.</p>
      <p>After entering the question into the input area and selecting Generate SPARQL
Query, the tool will, behind the scenes, generate a zero-shot prompt and feed it to the Gemini API to
translate the input text from unstructured NL into a structured SPARQL query,
ultimately querying the ERA SPARQL endpoint to retrieve responses, as shown in Figure 2. It is
important to note that an LLM inherently may not produce a consistent answer on every run; if
the query fails initially, trying again may yield an answer.</p>
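        <p>The observation above, that a failed first attempt may succeed on a later run, can be made systematic with a small retry loop. This is a generic sketch, not the demo's actual code; `generate_and_run` stands in for the whole translate-and-query step and is a hypothetical callable.</p>

```python
def query_with_retries(generate_and_run, question: str, max_attempts: int = 3):
    """Re-run the non-deterministic LLM pipeline until it yields results."""
    last_error = None
    for _ in range(max_attempts):
        try:
            results = generate_and_run(question)
            if results:  # a non-empty result set counts as success
                return results
        except Exception as exc:  # e.g. malformed SPARQL, endpoint error
            last_error = exc
    raise RuntimeError(f"no answer after {max_attempts} attempts") from last_error
```

        <p>With a constant non-zero temperature, each attempt can produce a different query, so retrying genuinely re-samples the translation rather than repeating the same failure.</p>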
      <p>Finally, the tool keeps track of and saves all questions together with the corresponding responses
generated by the chatbot. The recorded information will be used to improve and fine-tune the
chatbot engine in future iterations.</p>
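        <p>For a complex question such as "What is the longest section of line?", the generated query might take a shape like the one below, held here as a string for illustration. The era: class and property names are assumptions based on the public ERA vocabulary namespace (http://data.europa.eu/949/), not output captured from the demo.</p>

```python
# Illustrative only: a plausible query shape for the complex question,
# combining traversal, per-entity values, ordering, and a LIMIT to pick
# the maximum.  Property names are assumed, not verified against the KG.
LONGEST_SOL_QUERY = """
PREFIX era:  <http://data.europa.eu/949/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?sol ?label ?length WHERE {
  ?sol a era:SectionOfLine ;
       rdfs:label ?label ;
       era:length ?length .
}
ORDER BY DESC(?length)
LIMIT 1
"""
```

        <p>Writing such a query by hand presupposes knowing how lengths are attached to sections of line; the chatbot's purpose is precisely to spare users that schema knowledge.</p>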
    </sec>
    <sec id="sec-6">
      <title>5. Chatbot Engine Performance</title>
      <p>Our chatbot engine demo demonstrated promising performance in translating competency
questions of varying complexity. Although some questions are straightforward and have their
answers available explicitly, others are complex: their answers are only implicitly available and
require visiting multiple triple paths to aggregate answers from the KG. For example, the question
What is the longest section of line? assumes that the data in the target KG contain multiple
instances of entities labeled "section of line" with associated lengths. This type of question
is classified as complex, since its corresponding answer is not explicitly available
in the KG: answering it requires visiting more than one triple path, performing comparisons
across multiple data paths, and aggregating the results to determine the correct
answer, as shown in Figure 3. Consequently, answering such complex questions
manually requires a clear and thorough understanding of the KG schema, the structure of entities
and associated data, the representation of length values, and the vocabulary used to express the
related information. Leveraging an LLM to perform all these operations can therefore simplify
interaction with KG-based systems and increase user productivity and usability.</p>
      <sec id="sec-6-1">
        <title>Resources</title>
        <p>Chatbot demo: https://llm-chatbot-translator.streamlit.app/</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion, Envisaged Benefit and Future Development</title>
      <p>In this chatbot engine demonstration, we have shown that utilizing LLMs to perform KGQA,
by translating NL into SPARQL queries to retrieve information from a domain-specific KG, is promising
and effective. Undoubtedly, leveraging LLMs in KGQA systems can increase productivity, reduce
query construction time, and improve usability by allowing non-technical users to obtain answers from
any KG. As a next step, this chatbot can be transformed into a reusable component to facilitate
querying domain-specific triple stores. It will also allow automatic benchmarking of
different LLMs, since tests can easily be automated and run against different LLM models.
Another feature is to capture domain-specific competency questions for further analysis. We
also expect to identify key performance indicators that define the requirements an LLM must meet
to serve adequately as an end-user interface. In addition to the aforementioned functionality
and features, the chatbot offers a valuable feature inherited from LLMs: the ability to
answer multilingual queries, allowing a wide range of users to use it regardless of language
boundaries. In other words, the core ontology does not need to be multilingual in order to allow
multilingual queries, as the LLM takes on the translation burden.</p>
      <sec id="sec-8">
        <title>References (continued)</title>
        <p>[4] (cont.) D. Roth, Recent advances in natural language processing via large pre-trained language models: A survey, ACM Computing Surveys 56 (2023) 1–40.
[5] L. Meyer, C. Stadler, J. Frey, N. Radtke, K. Junghanns, R. Meissner, G. Dziwis, K. Bulert, M. Martin, LLM-assisted knowledge graph engineering: Experiments with ChatGPT, in: Conference Proceedings of AI-Tomorrow-23, volume 29, 2023.
[6] J. Liu, B. Mozafari, Query rewriting via large language models, arXiv preprint arXiv:2403.09060 (2024).
[7] J. Lehmann, P. Gattogi, D. Bhandiwad, S. Ferré, S. Vahdati, Language models as controlled natural language semantic parsers for knowledge graph question answering, in: European Conference on Artificial Intelligence (ECAI), volume 372, IOS Press, 2023, pp. 1348–1356.
[8] T. A. Tafa, R. Usbeck, Leveraging LLMs in scholarly knowledge graph question answering, arXiv e-prints (2023) arXiv:2311.09841.
[9] M. Yani, A. A. Krisnadhi, Challenges, techniques, and trends of simple knowledge graph question answering: A survey, Information 12 (2021). URL: https://www.mdpi.com/2078-2489/12/7/271. doi:10.3390/info12070271.
[10] L. Kovriguina, R. Teucher, D. Radyush, D. Mouromtsev, SPARQLGEN: One-shot prompt-based approach for SPARQL query generation (2023).
[11] M. R. A. H. Rony, U. Kumar, R. Teucher, L. Kovriguina, J. Lehmann, SGPT: A generative approach for SPARQL query generation from natural language questions, IEEE Access 10 (2022) 70712–70723.
[12] J.-D. Kim, K. B. Cohen, Natural language query processing for SPARQL generation: A prototype system for SNOMED CT, in: Proceedings of BioLINK, volume 32, 2013, p. 38.
[13] N. Mihindukulasooriya, G. Rossiello, P. Kapanipathi, I. Abdelaziz, S. Ravishankar, M. Yu, A. Gliozzo, S. Roukos, A. Gray, Leveraging semantic parsing for relation linking over knowledge bases, in: International Semantic Web Conference, Springer, 2020, pp. 402–419.
[14] T. A. Tafa, R. Usbeck, Leveraging LLMs in scholarly knowledge graph question answering, arXiv preprint arXiv:2311.09841 (2023).
[15] S. Schulhoff, M. Ilie, N. Balepur, K. Kahadze, A. Liu, C. Si, Y. Li, A. Gupta, H. Han, S. Schulhoff, et al., The prompt report: A systematic survey of prompting techniques, arXiv preprint arXiv:2406.06608 (2024).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <article-title>Explainability for large language models: A survey</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>A survey on evaluation of large language models</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mohawesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Chellaboina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Sathvik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Venkatesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Henshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alhawawreh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Berdik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jararweh</surname>
          </string-name>
          ,
          <article-title>Foundation and large language models: fundamentals, challenges, opportunities, and social impacts</article-title>
          ,
          <source>Cluster Computing</source>
          <volume>27</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sainz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          , I. Heintz,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>