<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward Exploring Knowledge Graphs with LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guangyuan Piao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mike Mountantonakis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Panagiotis Papadakos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pournima Sonawane</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan O'Mahony</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dell Technologies</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>W3C</institution>
          ,
          <addr-line>ERCIM</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Interacting with knowledge graphs (KGs) is challenging for non-technical users with information needs who are unfamiliar with KG-specific query languages such as SPARQL and with the underlying KG schema. Previous KG question answering systems require ground-truth pairs of questions and queries, or fine-tuning (Large) Language Models (LLMs) for a specific KG, which is time-consuming and demands deep expertise. In this poster, we present a framework for exploring KGs for question answering using LLMs in a zero-shot setting for non-technical end users, without the need for ground-truth pairs of questions and queries or for fine-tuning LLMs. Additionally, we evaluate an example implementation based on the framework that uses LLMs exclusively, in a simple yet challenging setting, without the extra effort of maintaining embeddings or indexes of KG entities for retrieving those relevant to a given question. We share preliminary experimental results indicating that exploring a KG using LLM-generated SPARQL queries of reasonable complexity is possible in such a challenging setting.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Framework of Question Answering with a KG using LLMs</title>
      <p>
        Figure 1 illustrates our framework for question answering with a knowledge graph using
LLMs. The top three components depict a straightforward pipeline. Specifically, the top-left
component indicates the input to an LLM. In addition to a question, a user can provide any
user-input context, such as a description of the KG. The extracted context refers to any context
automatically retrieved by the system. This can include, but is not limited to, the KG schema
or a subset of triples in the KG that might be relevant to the question. The LLM component
refers to any LLM, such as Code Llama [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], used for KG-specific query generation for answering
the question. The output component executes the provided query to obtain the answer. As
a special case, the LLM can also generate the output or answer directly based on the given
question and context derived from the KG without generating any queries as in KG-RAG [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
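      <p>To make the basic pipeline concrete, the following is a minimal Python sketch (not the authors' released implementation) of the three top components. It assumes a hypothetical generate() helper wrapping an arbitrary LLM such as Code Llama, and an rdflib Graph already loaded with the KG:</p>
      <preformat>
from rdflib import Graph

def generate(prompt):
    # Hypothetical LLM call; replace with a real client for, e.g., Code Llama.
    raise NotImplementedError

def answer(question, user_context, extracted_context, kg: Graph):
    # Input component: assemble the question with user-input and extracted context.
    prompt = (
        f"{user_context}\n{extracted_context}\n"
        f"Write a SPARQL query answering: {question}\nSPARQL:"
    )
    sparql = generate(prompt)      # LLM component: KG-specific query generation
    return list(kg.query(sparql))  # Output component: execute the query on the KG
      </preformat>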
      <p>
        In addition to these three basic components, the framework includes a set of optional
components indicated by dotted boxes. The context extractor aims to automatically extract any
useful context for answering the question. For example, it can extract a set of predicates or class
types that are relevant to the question. The context parser and enhancer process the output
of the extractor and enhance it by validating, updating or pruning as necessary. For example,
they can check whether the extracted predicates actually exist in the KG schema or revisit the
context extractor if necessary. KG-RAG [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] implements a context extractor based on embedding
similarities between the question sentence or a set of extracted entities and precomputed entity
embeddings with a small language model to retrieve relevant entities from the KG. A set of
triples associated with each entity is retrieved, and then parsed and pruned based on their
relevance to the given question. Auto-KGQA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] implements the retrieval of relevant entities
by building and maintaining KG resource indexes for text- or embedding-based approaches.
For each entity, all its triples are retrieved along with their neighbors up to a predefined depth.
These triples are then parsed and pruned to construct a sub-graph containing the most relevant
triples to the question. The query parser and enhancer parse the LLM output, extract the
query, and refine the query if necessary. For instance, they can regenerate the query in cases
where it is not executable or returns no results. Auto-KGQA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] prompts an LLM to generate
several SPARQL queries, parses the results, and then lets the LLM choose the best one.
      </p>
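      <p>As an illustration of the context parser and enhancer, the following is a minimal sketch (a hypothetical helper of our own, not the KG-RAG or Auto-KGQA code) that validates extracted predicates against the KG with rdflib and signals when the context extractor should be revisited:</p>
      <preformat>
def validate_predicates(kg, candidates, min_valid=1):
    # candidates: full predicate IRIs (as strings) proposed by the context extractor.
    existing = {str(p) for p in kg.predicates()}  # predicates actually used in the KG
    valid = [c for c in candidates if c in existing]
    rerun_extractor = len(valid) &lt; min_valid    # too few survived: revisit the extractor
    return valid, rerun_extractor
      </preformat>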
      <p>In the framework, edges highlighted in blue and orange indicate repeatable loops. For example,
the loop of steps 5–8 can be repeated multiple times, with each iteration providing a query
as extracted context. The query extracted from the previous loop can be used in the next query
generation process to enhance it. The framework also allows for the extraction of different
types of context. That is, one can have several sets of context extractors, parsers, and enhancers
(shaded boxes in Figure 1). For instance, one set can be used for extracting the KG schema,
while another can be used for extracting a set of triples relevant to the question.</p>
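      <p>A minimal sketch of such a repeatable loop, in which the query extracted in one iteration is fed back as extracted context for the next (reusing the hypothetical generate() helper from the earlier sketch):</p>
      <preformat>
def refine_query(question, context, n_rounds=3):
    query = ""
    for _ in range(n_rounds):
        prompt = (
            f"{context}\nPrevious query (may be empty or imperfect):\n{query}\n"
            f"Improve or write a SPARQL query for: {question}\nSPARQL:"
        )
        query = generate(prompt)  # hypothetical LLM call, as in the earlier sketch
    return query
      </preformat>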
    </sec>
    <sec id="sec-3">
      <title>3. Example Implementation and Experimental Results</title>
      <p>
        Here, we present an example implementation built upon the proposed framework with all
components, using Code Llama Instruct 7B [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as our LLM. Specifically, we are interested in
exploring an implementation that relies solely on LLMs, i.e., without maintaining indexes or
embeddings of entities in the KG for retrieving relevant entities, which would otherwise require
extra effort and expertise [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. To this end, we consider questions with extracted context as
input to the LLM. We use two types of context extractors. The first extractor extracts class
types and predicates from the KG schema as context. The second one prompts the same LLM
to infer the top-k class types and predicates relevant to the question. Next, the context parser
and enhancer check the extracted class types and predicates, and rerun the context extractor if
non-existent class types or predicates are detected. Based on these extracted class types and
predicates, we can retrieve p triple(s) for each class type and predicate, which are provided as
context triples (k = 5 and p = 1 in our experiments). Afterwards, the LLM generates the output for
the given question and extracted context. Subsequently, the query parser and enhancer parse
the output to obtain the query and enhance it if necessary. Again, we prompt the same LLM to
check the generated query and add, remove, or modify it if necessary. Steps 5–8 can be
repeated multiple times before finalizing the SPARQL query as our output. Figure 2 illustrates
an example workflow of generating the final SPARQL query for Q14 in Listing 1.
      </p>
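      <p>The following is a minimal sketch of the query parser and enhancer step described above (again hypothetical, reusing generate() and an rdflib graph): the query is executed, and if it is not executable or returns no results, the LLM is re-prompted to add, remove, or modify it:</p>
      <preformat>
def enhance_query(kg, question, query):
    try:
        rows = list(kg.query(query))
    except Exception:   # not executable, e.g., a syntax error
        rows = []
    if rows:
        return query    # executable and non-empty: keep the query as-is
    prompt = (
        f"The SPARQL query below is not executable or returns no results:\n{query}\n"
        f"Add, remove, or modify it so that it answers: {question}\nSPARQL:"
    )
    return generate(prompt)
      </preformat>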
      <p>
        Experimental settings. We use a custom Bestiary KG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which contains diverse
information about over 4,000 creatures from a fantasy role-playing game, comprising 98,070 triples.
However, upon careful investigation of each question in the dataset, we noticed that the majority
of 100 questions from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] require high-complexity SPARQL queries. In our challenging setting,
using those questions for evaluation may not be feasible (as the majority cannot be answered),
and starting from such queries could potentially hinder the exploration of new directions.
After this investigation, we empirically chose eight questions that are
possible to answer in our setting but also have varying complexities. Listing 1 shows the set
of questions and the corresponding generated SPARQL queries for our discussion. The source
code, prompt templates, and examples are available at https://github.com/parklize/LLM4SPARQL.
      </p>
      <p>Listing 1: Eight example questions and their corresponding generated SPARQL queries.</p>
      <preformat>
Q0: which creatures not speaking draconic language do have chaotic good alignment?
SELECT ?creature WHERE {?creature a :Beast. ?creature :hasAlignment :chaoticGood.
  FILTER NOT EXISTS {?creature :hasLanguages :DraconicL.}}

Q9: what creatures do have cold resist?
SELECT ?creature WHERE {?creature :hasResists :cold.}

Q13: what creatures do speak both common and undercommon languages?
SELECT ?creature WHERE {?creature :hasLanguages :CommonL. ?creature :hasLanguages :UndercommonL.}

Q14: how many creatures do speak all three languages: abyssal, sylvan and elven?
SELECT (COUNT(?creature) AS ?count) WHERE {?creature :hasLanguages :AbyssalL.
  ?creature :hasLanguages :SylvanL. ?creature :hasLanguages :ElvenL.}

Q58: what creatures speaking dwarven language do have armor class greater than 12?
SELECT ?creature WHERE {?creature a :Beast. ?creature :hasLanguages :DwarvenL.
  ?creature :hasACValue ?ac. FILTER(?ac &gt; 12)}

Q64: what is the average number of health points for creatures speaking gnome language?
SELECT (AVG(?hp) AS ?hp_AVG) WHERE {?creature rdf:type :Beast.
  ?creature :hasLanguages :GnomeL. ?creature :hasHPvalue ?hp.}

Q83: which creatures speaking necril and abyssal languages do have wisdom attribute more than 4?
SELECT ?creature WHERE {?creature rdf:type :Beast. ?creature :hasLanguages :NecrilL.
  ?creature :wis ?wis. FILTER(?wis &gt; 4)}

Q94: what is the average dexterity attribute for Phoenix and Sleipnir?
SELECT (AVG(?dex) AS ?dex_AVG) WHERE {?beast rdf:type :Beast. ?beast :dex ?dex.
  FILTER(?beast = :Phoenix || ?beast = :Sleipnir)}
      </preformat>
      <p>Figure 3: Acc@10 per question (QID) for GraphSparqlQAChain and for our implementation without and with the query enhancer.</p>
      <p>QID
that even exclusively using LLMs, we can still answer questions of reasonable complexity in a
zero-shot setting. This includes questions with negation (Q0), aggregation (Q64), or even those
with multiple ?   patterns with filtering (Q83), where ? indicates a variable.</p>
      <p>As LLMs can produce different answers each time, we use Acc@10, which measures the
percentage of correct answers obtained out of 10 runs for a given question, to evaluate the
performance. The most relevant work to ours is GraphSparqlQAChain
(https://python.langchain.com/docs/use_cases/graph/graph_sparql_qa), which uses only the KG
schema as extracted context in the prompt to generate the query for a given question. We use
this as our baseline. In addition, we include two variants of our implementation – one without
the query enhancer and the other with the enhancer component in our framework.</p>
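      <p>For clarity, Acc@10 for a single question can be computed as in this small hypothetical helper, assuming a boolean correctness judgment per run:</p>
      <preformat>
def acc_at_10(correct_per_run):
    # Fraction of the 10 independent runs whose answer was judged correct.
    assert len(correct_per_run) == 10
    return sum(correct_per_run) / 10
      </preformat>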
      <p>Results. Figure 3 shows the results for the eight questions. As illustrated by dashed lines, the
average Acc@10 scores over these questions using GraphSparqlQAChain and the two variants –
one without and one with the enhancer – are 0.125, 0.326, and 0.831, respectively. As we can see
from the figure, GraphSparqlQAChain, which uses only the KG schema as the context, could not
generate correct queries for the majority of questions because it is not aware of the IRI patterns
of entities in the KG. The results with and without the query enhancer component clearly indicate
that the enhancer consistently improves the quality of the generated queries. For example, the
Acc@10 is zero for both Q14 and Q94 without the query enhancer, while with the enhancer, it
increases to 0.9 and 1.0, respectively. Q14 in Listing 1 shows an example where the initial query
(without the blue part) has been enhanced (with the blue part). For quantitative evaluation, we
manually extended the initial eight questions by adding 22 similar ones, resulting in a total of
30 questions. The average Acc@10 scores using GraphSparqlQAChain, without the query enhancer,
and with the query enhancer are 0.14, 0.22, and 0.57, respectively (p &lt; .05). Although it is clear
that the performance improves with the query enhancer, it is worth noting that this improvement
comes with the extra cost of prompting LLMs n more times, where n is a predefined parameter
indicating how many times we want to repeat the enhancing process.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Future Work</title>
      <p>
        In this work, we presented a general framework for exploring KGs with LLMs. In addition, we
investigated an example implementation using all components defined in the framework in a
challenging setting, which exclusively uses LLMs in a zero-shot setting without ground-truth
data and fine-tuning. While the example implementation eliminates the need for maintaining
entity embeddings for embedding-based entity retrieval, it may result in hallucinations, where
non-existent entities are used as subjects or objects in the generated queries. In addition,
answering questions that require complex SPARQL queries is challenging due to the current
limitations of LLMs in generating such queries. Further investigation with other LLMs, including
specialized open-source LLMs trained on open question-query datasets, is required. Additionally,
using all 100 questions from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for evaluation in our setting – without ground-truth data and without
fine-tuning – is challenging. Those questions contain many complex queries, such as those
requiring regex patterns, and might preclude the interesting possibility of exploring this research
direction. Hence, a benchmark dataset with varying query complexities would be beneficial.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was funded by the GLACIATION Horizon Europe project (No. 101070141).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bo</surname>
          </string-name>
          ,
          <article-title>Llm-based sparql generation with selected schema from large scale knowledge base</article-title>
          ,
          <source>in: CCKS</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>304</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radyush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <article-title>Sparqlgen: One-shot prompt-based approach for sparql query generation</article-title>
          ,
          <source>in: SEMANTiCS</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Roziere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gloeckle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sootla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. E.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Adi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Remez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rapin</surname>
          </string-name>
          , et al.,
          <article-title>Code llama: Open foundation models for code</article-title>
          ,
          <source>arXiv:2308.12950</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Akbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peetoom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Villouta-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cerono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rizk-Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Israni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Baranzini</surname>
          </string-name>
          ,
          <article-title>Biomedical knowledge graph-optimized prompt generation for large language models</article-title>
          ,
          <source>arXiv:2311.17330</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. V. S.</given-names>
            <surname>Avila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>A framework for question answering on knowledge graphs using large language models</article-title>
          ,
          <source>in: ESWC</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>