<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Toward Exploring Knowledge Graphs with LLMs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Guangyuan</forename><surname>Piao</surname></persName>
							<email>guangyuan.piao@dell.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Dell Technologies</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mike</forename><surname>Mountantonakis</surname></persName>
							<email>mike.mountantonakis@ercim.eu</email>
							<affiliation key="aff1">
								<orgName type="department">W3C</orgName>
								<orgName type="institution">ERCIM</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Panagiotis</forename><surname>Papadakos</surname></persName>
							<email>panagiotis.papadakos@ercim.eu</email>
							<affiliation key="aff1">
								<orgName type="department">W3C</orgName>
								<orgName type="institution">ERCIM</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pournima</forename><surname>Sonawane</surname></persName>
							<email>pournima.sonawane@dell.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Dell Technologies</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Aidan</forename><surname>Omahony</surname></persName>
							<email>aidan.omahony@dell.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Dell Technologies</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Toward Exploring Knowledge Graphs with LLMs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">755E770F1E9DB30E659B8D91AB54A2AC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Interacting with knowledge graphs (KGs) is challenging for non-technical users with information needs who are unfamiliar with KG-specific query languages such as SPARQL and the underlying KG schema. Previous KG question answering systems require ground-truth pairs of questions and queries or fine-tuning (Large) Language Models (LLMs) for a specific KG, which is time-consuming and demands deep expertise. In this poster, we present a framework for exploring KGs for question answering using LLMs in a zero-shot setting for non-technical end users, without the need for ground-truth pairs of questions and queries or fine-tuning LLMs. Additionally, we evaluate an example implementation in a simple yet challenging setting using LLMs exclusively based on the framework, without the extra effort of maintaining embeddings or indexes of entities from the KG for retrieving those relevant to a given question. We share preliminary experimental results indicating that exploring a KG using LLM-generated SPARQL queries of reasonable complexity is possible in such a challenging setting.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Exploring knowledge graphs (KGs) for question answering poses challenges, particularly for non-technical users lacking sufficient knowledge of KG-specific query languages such as SPARQL and Cypher. Previous KG question answering systems <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> either require ground-truth pairs of questions and queries or fine-tuning of (Large) Language Models (LLMs) for each KG of interest. However, creating such ground-truth data is time-consuming and demands significant effort and expertise. The fine-tuning requirement calls for machine learning expertise and rules out many proprietary LLMs, such as ChatGPT, which is accessible only through APIs despite its outstanding performance. The goal of this poster is to present a general framework for exploring KGs with LLMs for question answering, without these requirements (Section 2). In addition, we present an example implementation with all components defined in the framework, along with preliminary experimental results (Section 3). In contrast to building and maintaining indexes or embeddings of entities for retrieving relevant ones from a KG, we focus on a simpler yet more challenging setting: using LLMs exclusively, prompting the chosen LLM to automatically infer entity IRIs (Internationalized Resource Identifiers). Finally, we discuss some challenges and future work in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Framework of Question Answering with a KG using LLMs</head><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates our framework for question answering with a knowledge graph using LLMs. The top three components depict a straightforward pipeline. Specifically, the top-left component indicates the input to an LLM. In addition to a question, a user can provide any user-input context, such as a description of the KG. The extracted context refers to any context automatically retrieved by the system. This can include, but is not limited to, the KG schema or a subset of triples in the KG that might be relevant to the question. The LLM component refers to any LLM, such as Code Llama <ref type="bibr" target="#b2">[3]</ref>, used to generate a KG-specific query for answering the question. The output component executes the generated query to obtain the answer. As a special case, the LLM can also generate the output or answer directly from the given question and context derived from the KG, without generating any queries, as in KG-RAG <ref type="bibr" target="#b3">[4]</ref>.</p><p>In addition to these three basic components, the framework includes a set of optional components indicated by dotted boxes. The context extractor aims to automatically extract any context useful for answering the question. For example, it can extract a set of predicates or class types relevant to the question. The context parser and enhancer process the output of the extractor and enhance it by validating, updating, or pruning it as necessary. For example, they can check whether the extracted predicates actually exist in the KG schema, or revisit the context extractor if necessary. 
KG-RAG <ref type="bibr" target="#b3">[4]</ref> implements a context extractor that retrieves relevant entities from the KG based on embedding similarities, computed with a small language model, between the question sentence (or a set of extracted entities) and precomputed entity embeddings. A set of triples associated with each entity is retrieved, and then parsed and pruned based on their relevance to the given question. Auto-KGQA <ref type="bibr" target="#b4">[5]</ref> implements the retrieval of relevant entities by building and maintaining KG resource indexes for text- or embedding-based approaches. For each entity, all its triples are retrieved along with their neighbors up to a predefined depth. These triples are then parsed and pruned to construct a sub-graph containing the triples most relevant to the question. The query parser and enhancer parse the LLM output, extract the query, and refine the query if necessary. For instance, they can regenerate the query in cases where it is not executable or returns no results. Auto-KGQA <ref type="bibr" target="#b4">[5]</ref> prompts an LLM to generate several SPARQL queries, parses the results, and then lets the LLM choose the best one.</p><p>In the framework, edges highlighted in blue and orange indicate repeatable loops. For example, the loop of steps 5-8 can be repeated multiple times, with each iteration providing a query as extracted context. The query extracted in the previous iteration can be used in the next query generation step to enhance it. The framework also allows for the extraction of different types of context. That is, one can have several sets of context extractors, parsers, and enhancers (shaded boxes in Figure <ref type="figure" target="#fig_0">1</ref>). For instance, one set can be used for extracting the KG schema, while another can be used for extracting a set of triples relevant to the question.</p></div>
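The repeatable loop described above can be sketched in a few lines of Python. This is a self-contained illustration only: the component names (extract_context, parse_query, answer), the toy keyword-based extractor, and the retry policy are our own assumptions, not the authors' implementation.

```python
# Minimal sketch of the framework's extract -> generate -> parse -> execute loop.
# All names and the toy keyword matcher are hypothetical stand-ins.

def extract_context(question, schema_terms):
    # Context extractor: keep schema terms mentioned in the question (toy match).
    words = set(question.lower().replace("?", " ").split())
    return [t for t in schema_terms if t.strip(":").lower() in words]

def parse_query(llm_output):
    # Query parser: pull the SPARQL query out of the raw LLM output.
    start = llm_output.find("SELECT")
    return llm_output[start:] if start >= 0 else None

def answer(question, schema_terms, llm, execute, max_rounds=3):
    """Extract context, generate a query, execute it, and re-prompt on failure."""
    context, query = extract_context(question, schema_terms), None
    for _ in range(max_rounds):
        # The previous (failed) query is fed back to the LLM as extracted context.
        query = parse_query(llm(question, context, previous=query))
        result = execute(query) if query else None
        if result:  # executable and non-empty: stop looping
            return query, result
    return query, None
```

A real implementation would replace the `llm` callable with a prompt to a model such as Code Llama and `execute` with a call to a SPARQL endpoint.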
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Example Implementation and Experimental Results</head><p>Here, we present an example implementation<ref type="foot" target="#foot_0">1</ref> built upon the proposed framework with all components, using Code Llama Instruct 7B <ref type="bibr" target="#b2">[3]</ref> as our LLM. Specifically, we are interested in exploring an implementation that relies solely on LLMs, i.e., without maintaining indexes or embeddings of entities in the KG for retrieving relevant entities, which would otherwise require extra effort and expertise <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. To this end, we consider questions with extracted context as input to the LLM. We use two types of context extractors. The first extracts class types and predicates from the KG schema as context. The second prompts the same LLM to infer the top-k class types and predicates relevant to the question. Next, the context parser and enhancer check the extracted class types and predicates, and rerun the context extractor if non-existent class types or predicates are detected. Based on these extracted class types and predicates, we retrieve p triple(s) for each class type and predicate, which are provided as context triples (𝑘=5 and 𝑝=1 in our experiments). Afterwards, the LLM generates the output for the given question and extracted context. Subsequently, the query parser and enhancer parse the output to obtain the query and enhance it if necessary. Again, we prompt the same LLM to check the generated query and add to, remove from, or modify it as needed. Steps 5-8 can be repeated multiple times before finalizing the SPARQL query as our output. Figure <ref type="figure" target="#fig_1">2</ref> illustrates an example workflow for generating the final SPARQL query for Q14 in Listing 1.</p><p>Experimental settings. 
We use a custom Bestiary KG <ref type="bibr" target="#b1">[2]</ref>, which contains diverse information about over 4,000 creatures from a fantasy role-playing game, comprising 98,070 triples. Upon careful investigation of each question in the dataset, we noticed that the majority of the 100 questions from <ref type="bibr" target="#b1">[2]</ref> require high-complexity SPARQL queries. In our challenging setting, using those questions for evaluation may not be feasible (as the majority cannot be answered), and starting from those queries could hinder the exploration of new directions. We therefore empirically chose eight questions that can be answered in our setting and that have varying complexities. Listing 1 shows these questions and the corresponding generated SPARQL queries for our discussion. It shows that even when using LLMs exclusively, we can still answer questions of reasonable complexity in a zero-shot setting. This includes questions with negation (Q0), aggregation (Q64), or even multiple ?𝑠 𝑝 𝑜 patterns with filtering (Q83), where ?𝑠 indicates a variable. As LLMs can produce different answers each time, we evaluate performance with 𝐴𝑐𝑐@10, which measures the percentage of correct answers obtained over 10 runs for a given question. The most relevant work to ours is GraphSparqlQAChain<ref type="foot" target="#foot_1">2</ref>, which uses only the KG schema as extracted context in the prompt to generate the query for a given question. We use it as our baseline. In addition, we include two variants of our implementation: one without the query enhancer and one with the enhancer component of our framework.</p><p>Results. Figure <ref type="figure" target="#fig_2">3</ref> shows the results for the eight questions. 
As illustrated by the dashed lines, the average 𝐴𝑐𝑐@10 scores over these questions using GraphSparqlQAChain and the two variants, one without and one with the enhancer, are 0.125, 0.326, and 0.831, respectively. As the figure shows, GraphSparqlQAChain, which uses only the KG schema as context, could not generate valid queries for the majority of questions because it is not aware of the IRI patterns of entities in the KG. The results with and without the query enhancer component clearly indicate that the enhancer consistently improves the quality of the generated queries. For example, 𝐴𝑐𝑐@10 is zero for both Q14 and Q94 without the query enhancer, while with the enhancer it increases to 0.9 and 1.0, respectively. Q14 in Listing 1 shows an example where the initial query (without the blue part) has been enhanced (with the blue part). For quantitative evaluation, we manually extended the initial eight questions by adding 22 similar ones, resulting in a total of 30 questions. The average 𝐴𝑐𝑐@10 scores using GraphSparqlQAChain, without the query enhancer, and with the query enhancer are 0.14, 0.22, and 0.57, respectively (𝛼 &lt; .05). Although the performance clearly improves with the query enhancer, this improvement comes at the extra cost of prompting the LLM n more times, where n is a predefined parameter indicating how many times we repeat the enhancing process.</p></div>
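The 𝐴𝑐𝑐@10 metric above is straightforward to compute; the sketch below is our own illustration, assuming exact-match comparison between a run's answer and the ground truth (the paper does not specify the matching criterion).

```python
# Sketch of Acc@10: the fraction of k independent runs whose answer
# matches the ground truth, averaged over the question set.

def acc_at_k(run_answers, gold, k=10):
    """Fraction of the first k runs that return the gold answer (exact match)."""
    runs = run_answers[:k]
    return sum(1 for a in runs if a == gold) / len(runs)

def average_acc(per_question_scores):
    # The per-system averages reported as dashed lines in Figure 3.
    return sum(per_question_scores) / len(per_question_scores)
```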
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion and Future Work</head><p>In this work, we presented a general framework for exploring KGs with LLMs. In addition, we investigated an example implementation using all components defined in the framework in a challenging setting, which exclusively uses LLMs in a zero-shot setting without ground-truth data or fine-tuning. While the example implementation eliminates the need to maintain entity embeddings for embedding-based entity retrieval, it may result in hallucinations, where non-existent entities are used as subjects or objects in the generated queries. In addition, answering questions that require complex SPARQL queries is challenging due to the current limitations of LLMs in generating such queries. Further investigation with other LLMs, including specialized open-source LLMs trained on open question-query datasets, is required. Additionally, using all 100 questions from <ref type="bibr" target="#b1">[2]</ref> for evaluation in our setting, without ground truth and without fine-tuning, is challenging. Those questions contain many complex queries, such as those requiring regex patterns, and might preclude exploring this interesting research direction. Hence, a benchmark dataset with varying query complexities would be beneficial.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The framework of question answering with a KG using LLMs. It can include several sets of shaded components to extract different contexts. The colored loops can be repeated multiple times.</figDesc></figure>
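One cheap mitigation for the hallucination issue discussed above is to check, before execution, that every prefixed name in a generated query actually exists in the KG vocabulary, and to send the query back to the enhancer otherwise. The sketch below is our own illustration; the regular expression is a simplification of the SPARQL grammar for prefixed names, not the authors' method.

```python
import re

def unknown_terms(query, known_terms):
    """Return prefixed names used in the query that are absent from the KG vocabulary.

    A hallucinated entity such as :SleipnirX would be flagged here so the query
    can be regenerated. The regex only handles names with an empty prefix
    (e.g. :Beast) and skips qualified ones like rdf:type via the lookbehind.
    """
    used = set(re.findall(r"(?<![\w?]):[A-Za-z_]\w*", query))
    return used - set(known_terms)
```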
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example workflow for generating the SPARQL query of a question with the framework.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Acc@10 for each of the eight questions, with the average indicated by a dashed line.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Listing 1 :</head><label>1</label><figDesc>Eight example questions and their corresponding generated SPARQL queries.
Q0: which creatures not speaking draconic language do have chaotic good alignment?
SELECT ?creature WHERE {?creature a :Beast. ?creature :hasAlignment :chaoticGood. FILTER NOT EXISTS {?creature :hasLanguages :DraconicL.}}
Q9: what creatures do have cold resist?
SELECT ?creature WHERE {?creature :hasResists :cold.}
Q13: what creatures do speak both common and undercommon languages?
SELECT ?creature WHERE {?creature :hasLanguages :CommonL. ?creature :hasLanguages :UndercommonL.}
Q14: how many creatures do speak all three languages: abyssal, sylvan and elven?
SELECT (COUNT(?creature) AS ?count) WHERE {?creature :hasLanguages :AbyssalL. ?creature :hasLanguages :SylvanL. ?creature :hasLanguages :ElvenL.}
Q58: what creatures speaking dwarven language do have armor class greater than 12?
SELECT ?creature WHERE {?creature a :Beast. ?creature :hasLanguages :DwarvenL. ?creature :hasACValue ?ac. FILTER(?ac &gt; 12)}
Q64: what is the average number of health points for creatures speaking gnome language?
SELECT (AVG(?hp) AS ?hp_AVG) WHERE {?creature rdf:type :Beast. ?creature :hasLanguages :GnomeL. ?creature :hasHPvalue ?hp.}
Q83: which creatures speaking necril and abyssal languages do have wisdom attribute more than 4?
SELECT ?creature WHERE {?creature rdf:type :Beast. ?creature :hasLanguages :NecrilL. ?creature :wis ?wis. FILTER(?wis &gt; 4)}
Q94: what is the average dexterity attribute for Phoenix and Sleipnir?
SELECT (AVG(?dex) AS ?dex_AVG) WHERE {?beast rdf:type :Beast. ?beast :dex ?dex. FILTER(?beast = :Phoenix || ?beast = :Sleipnir)}</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The source code, prompt templates, and examples are available at https://github.com/parklize/LLM4SPARQL.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://python.langchain.com/docs/use_cases/graph/graph_sparql_qa</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was funded by the GLACIATION Horizon Europe project (No. 101070141).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Llm-based sparql generation with selected schema from large scale knowledge base</title>
		<author>
			<persName><forename type="first">S</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Teng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>Springer</publisher>
			<biblScope unit="page" from="304" to="316" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Sparqlgen: One-shot prompt-based approach for sparql query generation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Kovriguina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Teucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Radyush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mouromtsev</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>SEMANTiCS</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Roziere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gloeckle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sootla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">E</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Adi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Remez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rapin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.12950</idno>
		<title level="m">Code llama: Open foundation models for code</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Soman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Akbas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Peetoom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Villouta-Reyes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cerono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rizk-Jackson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Israni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Nelson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Baranzini</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.17330</idno>
		<title level="m">Biomedical knowledge graph-optimized prompt generation for large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A framework for question answering on knowledge graphs using large language models</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V S</forename><surname>Avila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Casanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">M</forename><surname>Vidal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESWC</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
