<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>GRASP: Generic Reasoning And SPARQL Generation across Knowledge Graphs - Demo System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Walter</string-name>
          <email>swalter@cs.uni-freiburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hannah Bast</string-name>
          <email>bast@cs.uni-freiburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
      </contrib-group>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Knowledge Graphs</kwd>
      </kwd-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Freiburg</institution>
          ,
          <addr-line>Georges-Köhler-Allee 51, 79110 Freiburg im Breisgau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>769</volume>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>GRASP is the first approach for SPARQL-based question answering that, in principle, works for arbitrary given RDF knowledge graphs zero-shot, that is, without prior training or information on the graph. In this work, we present and describe a prototypical demo system that implements the GRASP approach. The system also supports general question answering and follow-up questions. We extend the evaluation of the associated research paper by experiments on the IMDb knowledge graph and the TEXT2SPARQL challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Contributions: we present the architecture and the core components of the demo system (see Section 3).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The SPARQL QA problem fits within the broader domain of knowledge graph question answering,
which can be divided into three categories. The first category includes methods that are fine-tuned for
a specific benchmark and knowledge graph [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. These methods often achieve strong results by
being able to adapt to patterns in the benchmark and knowledge graph. The second category includes
methods that do not require fine-tuning but instead use in-context learning with information that is
specific for the benchmark or knowledge graph, such as exemplary question-SPARQL pairs [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ].
The third category covers methods which function without any of the above, but rather explore the
knowledge graph on the go [
        <xref ref-type="bibr" rid="ref1 ref13 ref14 ref15">1, 13, 14, 15</xref>
        ]. For a more detailed account of related work, we refer to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>https://ad.informatik.uni-freiburg.de/staff/walter (S. Walter); https://ad.informatik.uni-freiburg.de/staff/bast (H. Bast). Published via CEUR (ceur-ws.org).</p>
      <p>[Figure 1: Overview of the client-server setup. The web application sends a query message to the GRASP server (e.g. with fields "question": "Who published the most papers …", "knowledge_graphs": ["dblp"], "task": "sparql-qa"). The GRASP agent sends back update messages, both model messages (e.g. "type": "model", "content": "To answer the question, I need to follow these steps: …") and function-call messages (e.g. "type": "function", "content": "search_entity" with a query string and a knowledge graph such as "kg": "dblp", plus a "result" such as "Top 10 entity …"). The agent executes SPARQL queries against the knowledge graph SPARQL endpoints, searches in the knowledge graph indices, and queries subgraphs. Note: building knowledge graph indices is currently done offline and uses pre-defined SPARQL queries to extract the index data from the knowledge graphs via the SPARQL endpoints. However, these queries could in theory also be found by the GRASP agent itself, and the index building could happen online if the knowledge graph is reasonably small.]</p>
      <p>
        Only one of the latter methods, SPINACH [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], provides a publicly available demo of their approach
at spinach.genie.stanford.edu, which uses a chat-like interface similar to ours.1 But like its underlying
approach, this system is limited to the Wikidata knowledge graph.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. System</title>
      <p>In the following, we describe the main components of our system: an LLM-based agent that controls the
function calls (Section 3.1), the indices used by the agent to search the knowledge graphs (Section 3.2),
and the configuration allowing the user to adapt the system to their needs (Section 3.3).</p>
      <p>The GRASP CLI is the main tool for working with the GRASP system. It is used to prepare knowledge
graphs, build indices, start the GRASP server, run the GRASP agent in a headless fashion, and more.
See Fig. 2 for an overview. Most users will use GRASP in a client-server setup. For that, we also provide
a compatible web application. See Fig. 1 for an overview.</p>
      <sec id="sec-3-1">
        <title>3.1. GRASP Agent</title>
        <p>In a nutshell, the GRASP agent works as follows: starting from the user’s question augmented by a
generic instruction prompt, it enters a loop, querying the knowledge graph exploratively until it finds a
SPARQL query that produces the desired answer. It reasons about previous query results to determine
the next query or whether the final answer has been found. The queries are realized via function calling.</p>
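The loop described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual GRASP implementation; the `llm.next_function_call` client interface and the message format are assumptions.

```python
# Rough sketch of the GRASP agent loop described above (not the actual
# implementation): the model repeatedly picks a function call, we execute
# it, feed the result back, and stop on ANS (answer) or CAN (cancel).
def run_agent(llm, question, functions, max_steps=30):
    # The user's question is augmented by a generic instruction prompt.
    messages = [{"role": "system", "content": "generic GRASP instruction"},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        call = llm.next_function_call(messages, functions)  # assumed client API
        if call.name in ("ANS", "CAN"):  # final answer found, or give up
            return call.arguments
        result = functions[call.name](**call.arguments)  # e.g. EXE, LST, SEN
        messages.append({"role": "assistant", "content": f"call {call.name}"})
        messages.append({"role": "tool", "content": result})
    return None  # no answer within the step budget
```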
        <p>[Figure 2: Overview of the GRASP CLI.]</p>
        <preformat># Run GRASP agent on a question
echo "Name 10 german race car drivers" |
grasp run &lt;config&gt;
# Run GRASP agent on multiple questions
cat questions.jsonl | grasp file &lt;config&gt;
# Start the GRASP server
grasp serve &lt;config&gt;
# Get search index data for a knowledge graph
grasp data &lt;kg&gt; [--endpoint &lt;endpoint&gt;]
# Build search indices for a knowledge graph
grasp index &lt;kg&gt;
# Build an example index from question-sparql-pairs
grasp examples examples.jsonl examples.index
# Evaluate GRASP's predictions on a benchmark
grasp evaluate benchmark.jsonl pred.jsonl &lt;endpoint&gt;</preformat>
        <p>1 See also www.wikidata.org/wiki/User:SpinachBot</p>
        <p>
          Specifically, the agent is provided with a fixed set of functions, each with a unique name and a
description in natural language of its purpose and its parameters.2 The functions are: EXE (execute an
arbitrary SPARQL query), LST (list triples with given constraints), SEN (search for entities matching a
given query string), SPR (search for properties matching a given query string), SPE (search for properties
of a given entity), SOP (search for objects of a given property), ANS (answer and stop), CAN (cancel and
stop). If few-shot examples are available, one of the functions FSE (find similar examples) or FEX (find
random examples) is provided as well. See [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for a more detailed description of each of these functions.
        </p>
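A minimal sketch of how such a function set might be exposed to a function-calling LLM. The function names are from the paper; the one-line descriptions are paraphrased, and the OpenAI-style tool wrapper with empty parameter schemas is an illustrative assumption, not GRASP's actual schemas.

```python
# Names from the paper; descriptions paraphrased. Each function is exposed
# to the LLM with a name, a natural-language description, and parameters
# (parameter schemas omitted here for brevity).
GRASP_FUNCTIONS = {
    "EXE": "execute an arbitrary SPARQL query",
    "LST": "list triples with given constraints",
    "SEN": "search for entities matching a given query string",
    "SPR": "search for properties matching a given query string",
    "SPE": "search for properties of a given entity",
    "SOP": "search for objects of a given property",
    "ANS": "answer and stop",
    "CAN": "cancel and stop",
}

def to_tool_spec(name, description):
    # Minimal OpenAI-style tool entry; real parameter schemas omitted.
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": {"type": "object", "properties": {}}}}

tools = [to_tool_spec(n, d) for n, d in GRASP_FUNCTIONS.items()]
```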
        <p>
          For this paradigm to work properly, we rely on the underlying model being trained to support
zero-shot function calling, which is true for nearly all recent closed-source models like GPT-4.1 by
OpenAI [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or Gemini 2.5 by Google [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], as well as for many recent open-source models like Qwen2.5
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] or Qwen3 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The latter can be easily self-hosted via vLLM [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Wherever supported by the
model provider, we use constrained decoding to force the model to output valid function calls that are
guaranteed to follow the available function signatures for increased reliability.
        </p>
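Where constrained decoding is not supported by the provider, the same guarantee must be approximated by validating generated calls after the fact. The following is a hand-rolled validation sketch (the signature table is hypothetical), shown only to illustrate what the constraint enforces; with constrained decoding, invalid calls cannot be emitted in the first place.

```python
# Post-hoc validation sketch: checks that a generated function call uses a
# known function name, supplies all required parameters, and uses no
# parameters outside the declared signature.
def valid_call(call, signatures):
    sig = signatures.get(call["name"])
    if sig is None:
        return False  # unknown function name
    required, allowed = sig
    args = set(call.get("args", {}))
    # all required parameters present, none outside the signature
    return set(required).issubset(args) and args.issubset(allowed)
```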
        <p>Table 1 provides F1-scores and statistics on six benchmarks. For CWQ, WQSP, and SPINACH, GRASP
achieves a comparatively low F1-score and, on average, uses more steps, function calls, and time. We
suspect that this is due to the harder questions, in particular on SPINACH. For CWQ and WQSP, we
observe more LST and fewer EXE calls than on the other benchmarks. We assume that this is due to the
older Freebase knowledge graph being less familiar to the underlying LLM, which thus requires more
exploration and verification steps (which GRASP typically realizes via LST). We also find that the SOP
function is almost never used; it could therefore probably be removed without loss of performance.</p>
        <sec id="sec-3-1-1">
          <title>Tasks</title>
          <p>Not all questions are answerable by or via a SPARQL query. We have therefore implemented an
extension that allows the user to dynamically switch to the task of general question answering over
knowledge graphs by adapting the GRASP instruction and the ANS and CAN functions, respectively. For this
task, the final output is not a SPARQL query, but arbitrary natural language text. For example, this is
useful for a question like “Write a Python script to download all Wikipedia articles about dog breeds”,
which can be answered by first finding a SPARQL query to retrieve the article URLs, and then using
this SPARQL query in a Python program.</p>
          <p>2 The implementation of the functions is fixed and part of the GRASP system. When calling a function, the model provides the function name and parameter values, and receives back the results in text format from our implementation.</p>
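A hypothetical end-to-end sketch of that example (not actual GRASP output): a SPARQL query retrieving English Wikipedia article URLs for dog breeds from Wikidata, used from a small Python program. The Wikidata class wd:Q39367 ("dog breed") and the public query endpoint are assumptions of this sketch.

```python
# Sketch: SPARQL query for Wikipedia article URLs about dog breeds, then a
# Python function that runs it against the public Wikidata endpoint. The
# actual downloading of the articles would follow from the returned URLs.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata endpoint

def dog_breed_article_query():
    # Articles about instances of dog breed (assumed class wd:Q39367);
    # the FILTER keeps only English Wikipedia article URLs.
    return (
        "SELECT ?breed ?article WHERE { "
        "?breed wdt:P31 wd:Q39367 . "
        "?article schema:about ?breed . "
        'FILTER(STRSTARTS(STR(?article), "https://en.wikipedia.org/")) }'
    )

def fetch_article_urls():
    params = urlencode({"query": dog_breed_article_query(), "format": "json"})
    req = Request(ENDPOINT + "?" + params,
                  headers={"User-Agent": "grasp-demo-sketch/0.1"})
    with urlopen(req) as resp:  # network call
        data = json.load(resp)
    return [b["article"]["value"] for b in data["results"]["bindings"]]
```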
        </sec>
        <sec id="sec-3-1-2">
          <title>Follow-up question answering</title>
          <p>
            In [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], the GRASP agent is used to answer questions by finding a corresponding SPARQL query in a
single uninterrupted interactive process between model and knowledge graphs. In practice, one would
like to have multi-turn conversations and ask follow-up questions, potentially switching tasks and the
underlying knowledge graphs at each subsequent question. We implement this by first determining
the GRASP instruction for the current task and knowledge graph selection, then adding all previous
questions and reasoning or function call steps unchanged, and finally asking the follow-up question.
The web application supports this use case by sending an additional past field containing the full
interaction history on follow-up questions.
          </p>
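The construction described above can be sketched as follows; the message roles and field names are illustrative stand-ins for the actual GRASP message format.

```python
# Sketch of follow-up question answering: the instruction for the current
# task and knowledge graph selection comes first, then the previous turns
# unchanged (the "past" field sent by the web application), then the new
# follow-up question.
def build_followup_messages(instruction, past, followup_question):
    messages = [{"role": "system", "content": instruction}]
    messages.extend(past)  # previous questions, reasoning and function calls
    messages.append({"role": "user", "content": followup_question})
    return messages
```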
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Search indices</title>
        <p>
          Besides the agent, the search indices are the second integral part of the overall GRASP approach.
Currently, the GRASP system supports two types of search indices: prefix-keyword indices (PFX) for
prefix-sensitive keyword search, and similarity indices (SIM) for vector-based similarity search. We
refer to [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for a more detailed explanation of both index types.
        </p>
        <p>
          The indices enable search queries that are either inefficient in SPARQL (in the case of prefix-keyword
search) or not supported by SPARQL (in the case of vector-based similarity search). We make the search
indices accessible to the GRASP agent via easy-to-use search functions (see Section 3.1), which are
purpose-built for the most common types of searches a human expert performs while writing SPARQL
queries. It is shown in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] that these search functions (implemented using the mentioned indices)
significantly boost the overall system performance (compared to when only EXE is provided).
        </p>
        <p>For entities, we build a PFX by default, because it requires less disk space and RAM, and is faster
to query than a SIM. Besides, SIM does not give significantly better results because keyword search
works well on entities. Given a PFX query, we calculate a score between each entity and the query as
the weighted sum of the number of exact and prefix keyword matches minus a weighted sum of the
number of unmatched query and entity keywords. If there is neither an exact nor a prefix keyword
match, the entity is excluded entirely. The entities with the k highest scores are then returned as search
results, where k is a GRASP configuration option (see search_top_k: in Fig. 3).</p>
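The scoring rule just described can be sketched as follows; the weights are made up for illustration and are not GRASP's actual values.

```python
# Sketch of the PFX scoring rule: weighted sum of exact and prefix keyword
# matches, minus a weighted sum of unmatched query and entity keywords.
# An entity with neither an exact nor a prefix match is excluded entirely.
def pfx_score(query_kw, entity_kw,
              w_exact=2.0, w_prefix=1.0, w_unmatched_q=0.5, w_unmatched_e=0.1):
    exact = sum(1 for q in query_kw if q in entity_kw)
    prefix = sum(1 for q in query_kw
                 if q not in entity_kw
                 and any(e.startswith(q) for e in entity_kw))
    if exact + prefix == 0:
        return None  # no match at all: exclude this entity
    matched_e = {e for e in entity_kw
                 if e in query_kw or any(e.startswith(q) for q in query_kw)}
    unmatched_q = len(query_kw) - exact - prefix
    unmatched_e = len(entity_kw) - len(matched_e)
    return (w_exact * exact + w_prefix * prefix
            - w_unmatched_q * unmatched_q - w_unmatched_e * unmatched_e)

def pfx_search(query_kw, entities, k=10):
    # entities: list of (name, keyword-list) pairs; returns top-k names.
    scored = [(pfx_score(query_kw, kw), name) for name, kw in entities]
    scored = [(s, name) for s, name in scored if s is not None]
    return [name for s, name in sorted(scored, key=lambda x: -x[0])[:k]]
```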
        <p>For properties, keyword queries often miss relevant search results. For example, searching for “born
in” should also match “place of birth”, but does not when using a PFX. We therefore build a SIM for
properties by default. Since knowledge graphs typically have only a few properties, the higher disk space
and RAM consumption of SIM compared to PFX does not matter. Given a search query, we compute its
vector embedding (using Qwen/Qwen3-Embedding-0.6B [26] by default), compute the cosine-similarity
to all pre-computed property embeddings, and return the top k properties with the highest similarity as
search results. Note that a GPU is required to run a SIM efficiently in practice.</p>
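The SIM search step can be sketched as follows; the embedding model itself (Qwen/Qwen3-Embedding-0.6B by default) is stubbed out, and only the cosine top-k retrieval is shown.

```python
# Sketch of SIM search: normalize the query and property embeddings,
# compute cosine similarity as a dot product, and return the top-k
# properties with the highest similarity.
import numpy as np

def sim_search(query_vec, property_vecs, property_names, k=10):
    q = query_vec / np.linalg.norm(query_vec)
    p = property_vecs / np.linalg.norm(property_vecs, axis=1, keepdims=True)
    sims = p @ q                 # cosine similarity to every property
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [(property_names[i], float(sims[i])) for i in top]
```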
        <p>Both PFX and SIM support specifying a subset of items to restrict the search to. Together with a
SPARQL endpoint we use this functionality to implement the advanced search functions of GRASP.
For example, SPE allows searching for properties of a given entity. For that, we first send a SPARQL
query to the endpoint to retrieve all potential properties for the given entity, restrict the corresponding
property index to these properties, and execute the search query over that restricted index.</p>
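The SPE mechanism described above can be sketched as follows; the endpoint client, the index's `restrict`/`search` methods, and the SPARQL query are illustrative stand-ins, not GRASP's actual interfaces.

```python
# Sketch of SPE: first a SPARQL query retrieves the candidate properties of
# the given entity, then the property index is restricted to these
# candidates before executing the search query over it.
def spe(entity, query, endpoint, property_index, k=10):
    # Hypothetical SPARQL; the real GRASP query may differ.
    sparql = f"SELECT DISTINCT ?p WHERE {{ {entity} ?p ?o }}"
    candidates = set(endpoint.select(sparql))         # assumed client method
    restricted = property_index.restrict(candidates)  # assumed index method
    return restricted.search(query, k)
```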
        <p>Table 2 provides statistics for the search indices of eight knowledge graphs. Starting the GRASP
server with all these indices takes less than 20 s, and uses ≈ 20 GB of RAM and ≈ 3 GB of GPU memory,
measured on a machine with an AMD Ryzen 9 7950X CPU, an NVIDIA GeForce RTX 4090 GPU, and 4
× 4 TB NVMe SSD.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Configuration</title>
        <p>GRASP can be easily configured via a single configuration file in YAML format. See Fig. 3 for the
file structure and configuration options. We provide sensible defaults for all configuration options,
so the user often only needs to configure the particular knowledge graphs they want to use with
GRASP. For example, a YAML config to run GRASP with Wikidata and Freebase can be as simple as
knowledge_graphs: [{kg: "wikidata"}, {kg: "freebase"}]. With the client-server setup, all
configured knowledge graphs are automatically available in the web application, which itself requires no
configuration; only the address and port of the GRASP server need to be set.</p>
        <p>
          We briefly discuss the three most important configuration options for GRASP besides the model and
knowledge graphs themselves:
1. Setting feedback: true corresponds to GRASP-F from [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and allows the GRASP agent to
reflect on and improve its own answers, which increases quality at the cost of longer runtimes. The
max_feedbacks: option sets the upper bound for the number of feedback loops per generation.
2. Setting force_examples: to a knowledge graph that specifies an example index via example_index:
triggers a call of either the FEX or FSE function (depending on whether random_examples: is set to
true or false) at the beginning of a generation. This enables few-shot learning in the style of the
few-shot evaluations from [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
3. Setting know_before_use: true tells GRASP to verify knowledge graph items before using them in
EXE function calls. This is enforced by returning an error message rather than the query result if an
EXE call uses items that were not present in any previous function call result. This mechanism avoids
hallucinations of knowledge graph items, which we found to be a frequent problem with GRASP, in
particular on knowledge graphs that are less familiar to the underlying LLM. For example, for the
DBLP-QuAD [31] benchmark, without this setting, GRASP often uses incorrect properties without
verification, like the seemingly more canonical dblp:author instead of the correct dblp:authoredBy.
Consequently, this setting improves the F1-score on this benchmark from 51.0 to 66.8.
        </p>
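A configuration sketch combining the options discussed above. The option names come from the text; the values and the model: key are illustrative assumptions, and the actual defaults and file structure are those of Fig. 3.

```yaml
# Illustrative GRASP config sketch; option names from the text, values made up.
model: "gpt-4.1"                 # any model trained for function calling
knowledge_graphs:
  - kg: "wikidata"
  - kg: "dblp"
    example_index: "dblp.index"  # enables the FEX/FSE example functions
force_examples: "dblp"           # call FEX or FSE at generation start
random_examples: false           # false selects FSE (similar examples)
feedback: true                   # GRASP-F: reflect on and improve answers
max_feedbacks: 2                 # upper bound on feedback loops
know_before_use: true            # verify KG items before use in EXE calls
search_top_k: 10                 # number of returned search results
```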
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Additional Evaluations</title>
      <p>
        To further validate GRASP’s zero-shot question answering capabilities, we extend the set of evaluated
knowledge graphs and benchmarks from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. First, we build our own small benchmark for IMDb [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
the popular movie and series database, consisting of 15 questions. Second, we evaluate GRASP on
the small, domain-specific knowledge graph representing a corporate setting from the TEXT2SPARQL
challenge [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The corresponding challenge benchmark contains 50 questions and is “designed to test a
model’s ability to adapt to restricted and domain-focused data environments”. On both benchmarks,
GRASP achieves good results and even surpasses the best (INFAI) and second best (IIS-Q) entries from
the TEXT2SPARQL challenge by a large margin. See Table 3 for full results.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
        We have built a complete system based on the GRASP approach from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We have combined the core
SPARQL question-answering capability with general question answering and multi-turn follow-up
questions. We have extended the evaluation from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] by new experiments on the IMDb knowledge
graph and the TEXT2SPARQL challenge, with strong results. This provides further support for GRASP’s
zero-shot capabilities across knowledge graphs.
      </p>
      <p>For future work, we consider supporting the automatic building of search indices from nothing but a
SPARQL endpoint, as well as integrating search indices and their functionality directly into SPARQL.
This would be a step towards both zero-shot and zero-configuration question answering on arbitrary
given knowledge graphs. To prevent GRASP from repeatedly making the same mistakes, which can
occur in the zero-shot setting, we also consider adding memory or other forms of online learning as
potential future work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID
499552394 – SFB 1597.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Walter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          , GRASP:
          <article-title>Generic reasoning and SPARQL generation across knowledge graphs</article-title>
          ,
          <source>in: ISWC</source>
          ,
          <year>2025</year>
          .
          <article-title>Accepted for publication</article-title>
          . Preprint available at https://arxiv.org/abs/2507.08107.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandecic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Freebase:
          <article-title>a collaboratively created graph database for structuring human knowledge</article-title>
          ,
          <source>in: SIGMOD Conference</source>
          , ACM,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Ackermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Beckermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalmbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Neises</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ollinger</surname>
          </string-name>
          ,
          <article-title>The dblp knowledge graph and SPARQL endpoint</article-title>
          ,
          <source>TGDK</source>
          <volume>2</volume>
          (
          <year>2024</year>
          ) 3:1-3:23. URL: https://doi.org/10.4230/TGDK.2.2.3. doi:10.4230/TGDK.2.2.3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] IMDb, IMDb: Ratings, Reviews, and
          <article-title>Where to Watch the Best Movies</article-title>
          &amp; TV Shows,
          <year>2025</year>
          . URL: https://www.imdb.com/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>AKSW</surname>
          </string-name>
          ,
          <source>TEXT2SPARQL'25</source>
          ,
          <year>2025</year>
          . URL: https://text2sparql.aksw.org/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Ng,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>DecAF: Joint decoding of answers and logical forms for question answering over knowledge bases, in: ICLR, OpenReview</article-title>
          .net,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Haffari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Reasoning on graphs: Faithful and interpretable large language model reasoning</article-title>
          , in: ICLR, OpenReview.net,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Luo</surname>
          </string-name>
          , H. E,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. Ma, G. Dong,
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , A. T. Luu,
          <article-title>ChatKBQA: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models</article-title>
          , in: ACL (Findings),
          <source>Association for Computational Linguistics</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2039</fpage>
          -
          <lpage>2056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patidar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sawhney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          , Mausam,
          <string-name>
            <surname>I. Bhattacharya</surname>
          </string-name>
          ,
          <article-title>Few-shot transfer learning for knowledge base question answering: Fusing supervised models with in-context learning</article-title>
          ,
          <source>in: ACL (1)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>9147</fpage>
          -
          <lpage>9165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>Don't generate, discriminate: A proposal for grounding language models to real-world environments</article-title>
          ,
          <source>in: ACL (1)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>4928</fpage>
          -
          <lpage>4949</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <article-title>Debate on graph: A flexible and reliable reasoning framework for large language models</article-title>
          , in: AAAI, AAAI Press,
          <year>2025</year>
          , pp.
          <fpage>24768</fpage>
          -
          <lpage>24776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Semnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Triedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <article-title>SPINACH: SPARQL-based information navigation for challenging real-world questions</article-title>
          , in: EMNLP (Findings),
          <source>Association for Computational Linguistics</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>15977</fpage>
          -
          <lpage>16001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Think-on-Graph: Deep and responsible reasoning of large language model on knowledge graph</article-title>
          , in: ICLR, OpenReview.net,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>StructGPT: A general framework for large language model to reason over structured data</article-title>
          , in: EMNLP, Association for Computational Linguistics,
          <year>2023</year>
          , pp.
          <fpage>9237</fpage>
          -
          <lpage>9251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <source>Introducing GPT-4.1 in the API</source>
          ,
          <year>2025</year>
          . URL: https://openai.com/index/gpt-4-1/, accessed: 2025-05-11.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Google DeepMind</surname>
          </string-name>
          ,
          <source>Introducing Gemini 2.0: our new AI model for the agentic era</source>
          ,
          <year>2024</year>
          . URL: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/, accessed: 2025-05-11.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          , et al.,
          <article-title>Qwen2.5 technical report</article-title>
          , arXiv preprint arXiv:2412.15115 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Qwen</surname>
            <given-names>Team</given-names>
          </string-name>
          ,
          <source>Qwen3 technical report</source>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.09388. arXiv:2505.09388.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Efficient memory management for large language model serving with PagedAttention</article-title>
          , in: SOSP, ACM,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>