<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Shapes in Large Language Model Contexts for Question Answering on Public and Private Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan G. Wardenga</string-name>
          <email>jan.wardenga@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Käfer</string-name>
          <email>tobias.kaefer@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology (KIT)</institution>
          ,
          <addr-line>Kaiserstraße 12, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>First International TEXT2SPARQL Challenge, Co-Located with Text2KG at ESWC25</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Knowledge Graph Question Answering aims to make structured semantic data accessible through natural language interfaces. While recent Large Language Models can generate SPARQL queries from natural language questions, their effectiveness is limited by their reliance on prior exposure to vocabularies and datasets during pretraining. This work explores how Knowledge Graph specific schema information, so-called shape constraints, can be used to provide Large Language Models with context about the dataset to be queried, enabling more accurate and generalizable query generation. We show that ShEx-augmented prompting achieves an F1-score of 0.28 on unseen knowledge graphs, compared to a baseline score of 0.00, demonstrating its ability to generalize beyond the training distribution. A pipeline is developed that integrates knowledge graph data shape extraction, prompt construction, and automatic SPARQL validation. Experiments were conducted on benchmarks with public and proprietary data sets using different state-of-the-art Large Language Models and demonstrate that shape-informed prompting improves the execution accuracy of generated queries. These results suggest that augmentation with data shapes can reduce the dependency on Large Language Model pre-training or fine-tuning on a specific dataset and offer a practical pathway toward domain-agnostic, robust Knowledge Graph Question Answering systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>SPARQL Generation</kwd>
        <kwd>Data Shapes</kwd>
        <kwd>Prompt Engineering</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Domain Adaptation</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Question Answering (QA) systems have a long history, initially focusing on retrieving and extracting
answers from unstructured textual sources such as documents or the Web [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The goal of QA systems,
both then and now, has been to enable end users to leverage data sources while abstracting away the
complexity of the underlying retrieval mechanisms. With the advent of large-scale structured Knowledge
Bases (KBs) like Freebase [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Wikidata [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] a new paradigm emerged: answering
questions based on structured, machine-readable data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These KBs are commonly represented as
Knowledge Graphs (KGs) and structured based on the Web Ontology Language (OWL) and the Resource
Description Framework (RDF). They store factual knowledge as triples and can be queried using formal
languages such as SPARQL Protocol and RDF Query Language (SPARQL) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The goal of the Knowledge Graph Question Answering (KGQA) domain is to abstract the complexity
of the underlying retrieval mechanisms, such as SPARQL, RDF, or OWL, in order to enable non-expert
users to access structured information intuitively. KGQA systems aim to make semantic data more
accessible to non-expert users, using natural language: a commonly adopted approach is to translate
Natural Language Questions (NLQs) into formal query representations, such as SPARQL, thereby
bridging the gap between human language and machine-interpretable knowledge [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. KGQA can
also be considered as a special form of Retrieval Augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which is traditionally done using vector databases; however, RAG approaches based on query generation can be shown to provide fresher and more holistic results [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>https://github.com/Branchenprimus/ (J. G. Wardenga)</p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>
        This problem is of considerable relevance due to the widespread adoption of KGs as core infrastructure
in many data-driven enterprises. Companies such as Google, Facebook, eBay, and IBM rely on knowledge
graphs to improve search capabilities, personalize user interactions, support product discovery, and
enhance knowledge extraction and decision-making processes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Despite the richness of large-scale KGs, accessing the knowledge they contain remains challenging.
While recent advances in Large Language Models (LLMs) have significantly improved the automatic
generation of syntactically correct SPARQL queries from natural language questions, a key limitation
remains: the semantic correctness of such queries often depends on “knowledge” of the underlying
graph, including entity identifiers, relations, and constraints specific to the target KG. Furthermore,
a critical limitation of current KGQA systems based on LLMs is their reliance on prior exposure to
the target KG during pre-training. Many recent approaches implicitly assume that the model has
seen the entities, properties, and structural patterns of widely used public graphs such as Wikidata or
DBpedia during training, which introduces a bias and hinders generalization [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ]. However, this
assumption does not hold in industrial or proprietary settings, where custom KGs are often domain-specific, access-restricted, or entirely absent from public training corpora. For such cases, a system is
needed that can robustly handle KGQA tasks without requiring the LLM to have been pre-trained on
the specific target graph.
      </p>
      <p>
        To address this issue, this paper builds on recent work proposing the use of KG schema information,
so-called data shapes, as a “semantic footprint” that captures the underlying structure and semantic
relationships of the KG, and integrates them into the prompt context of LLMs [
        <xref ref-type="bibr" rid="ref12 ref7 ref8">8, 12, 7</xref>
        ]. Data shapes,
expressed for instance via ShEx [15] or SHACL [16], offer structured metadata about the types and
properties of entities within a KG [17, 18].
      </p>
      <p>This paper is based on a Master’s thesis that investigated the impact of shape-informed prompting on
KGQA performance. As part of this work, a modular LLM-driven pipeline was developed that integrates
entity extraction, data shape generation, and query synthesis. The system subsequently competed
in the First International TEXT2SPARQL Challenge, where it was evaluated against the CK25 and DB25
benchmarks (see section 3.2). While the foundations of the artifact remain consistent with the challenge
submission, this paper presents extended experimental findings and deeper insights derived from the
thesis work, with a focus on generalization behavior and methodological implications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Objectives</title>
      <p>This work is situated in the broader problem space of Knowledge-based Question Answering (KBQA)
with a specific focus on the subdomain of KGQA, which addresses the challenge of answering natural
language questions over structured KGs.</p>
      <p>The main contribution of this work is a comprehensive investigation into the integration of structured
schema information, in the form of KG data shapes, into the prompt context of LLMs for improved
KGQA. Specifically, this study comprises three core contributions.</p>
      <p>First, it provides a comparative analysis of the two most widely used constraint languages, SHACL
and ShEx, with respect to their impact on the quality of the generated SPARQL queries.</p>
      <p>Second, it examines to what extent incorporating tailored KG data shapes enhances the quality of
SPARQL query generation from natural language input, and quantifies the performance improvements
over a non-shape-informed baseline.</p>
      <p>Third, it evaluates the generalizability of the proposed pipeline to previously unseen or proprietary
KGs, highlighting its potential to enable robust and comparable KGQA results across proprietary KGs
and their corresponding datasets.</p>
      <p>Through the development of a modular and extensible experimental pipeline, this work enables
systematic evaluation across multiple LLMs, datasets, and data shape configurations, offering novel
insights into the interplay between structural schema information and language model prompting in
the context of KGQA.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology and Variables</title>
      <p>The developed artifact is a modular data pipeline that aims to investigate whether incorporating KG
data shapes into the prompt context of an LLM improves the quality of SPARQL query generation from
natural language questions. The independent variables are the LLMs, the datasets and the data shape
types. The pipeline is capable of executing any number of test cases, allowing statistically meaningful
sample sizes to be generated for empirical evaluation. It includes an automated evaluation mechanism
that calculates the F1-Score, which serves as the dependent variable. The development process followed
an objective-centered approach as described in the Design Science Research (DSR) methodology, with a
strong focus on iterative prototyping and optimization of the artifact [19]. While the research problem is
conceptual in nature, the technical efforts focused on realizing a stable, extensible, and testable system.</p>
      <sec id="sec-3-1">
        <title>3.1. Large Language Models</title>
        <p>Three state-of-the-art LLMs from different providers were selected. GPT-4o-mini (following: GPT)
was chosen for its cost-efficient inference capabilities. DeepSeek-V3 (following: Deepseek) offers a
similarly strong performance profile at a lower cost and was included as a promising alternative. As
an open-source counterpart, LLaMA-3.3-70B-Versatile (following: Llama) was selected to investigate
performance in a possibly fully self-hosted environment. All models were accessed via their respective
Application Programming Interfaces (APIs).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Datasets</title>
        <p>
          The artifact is tailored to the structure of QA datasets in the QALD format [20], which makes it partially
domain-specific and not fully generalized. Nevertheless, it supports a wide range of KGs, including
Wikidata [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], DBpedia [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and custom local RDF graphs.
        </p>
        <p>To empirically evaluate the effectiveness of the proposed approach, four datasets were selected,
covering both public benchmarks and proprietary knowledge graphs:
• QALD-9-plus (DBpedia subset): A subset of the QALD-9-plus benchmark [20], containing
natural language questions and corresponding golden SPARQL queries targeting the DBpedia
KG. It supports multilingual evaluation and is widely used in KGQA research.
• QALD-9-plus (Wikidata subset): The complementary subset of QALD-9-plus, containing
the same questions as the QALD-9-plus (DBpedia subset) but mapped to Wikidata instead of
DBpedia [20]. This facilitates cross-KG comparison using aligned questions.
• DB25: A benchmark dataset from the Text2SPARQL Challenge [21], comprising natural language
questions and SPARQL queries over the DBpedia KG. It is designed to evaluate LLM-based KGQA
systems under realistic linguistic variation and schema complexity.
• CK25: A proprietary dataset introduced in the same challenge series [22], based on an enterprise
knowledge graph not publicly available. It provides an evaluation scenario with unseen entities
and schema, simulating industrial application settings.</p>
        <p>To ensure consistency in evaluation, QALD-9-plus (DBpedia subset) and DB25 were consolidated
under the unified label DBpedia.</p>
        <p>It is noteworthy that we could not validate approximately 11% of the Wikidata, 6% of the DBpedia,
and 2% of the CK25 answers, in part due to the natural evolution of KGs [20]. For example, the golden
query related to the question “Butch Otter is the governor of which U.S. state?”:
PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;
PREFIX wdt: &lt;http://www.wikidata.org/prop/direct/&gt;
SELECT ?res WHERE { ?res wdt:P6 wd:Q39593 . }</p>
        <p>currently yields no result, because Butch Otter is no longer serving as a governor. Such queries were
classified as invalid and excluded from the F1-Score calculation to ensure that the evaluation is not
biased by outdated ground-truth data.</p>
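        <p>The exclusion rule described above can be sketched as a simple filter over the executed golden results (an illustrative sketch; function and question names are hypothetical, not the artifact's actual code):</p>

```python
def golden_query_is_valid(golden_result):
    """Keep a question only if its golden query still returns a usable answer.

    An empty result set, or one containing only failure indicators such as
    0 or null, marks the golden query as outdated, so the question is
    excluded from the F1-Score calculation.
    """
    failure_markers = {None, 0, "0", "null", ""}
    return any(value not in failure_markers for value in golden_result)

# Hypothetical per-question golden results after live execution:
questions = [("governor-of-idaho", []), ("capital-of-france", ["Paris"])]
valid_questions = [qid for qid, result in questions if golden_query_is_valid(result)]
```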
        <p>The SPARQL queries in our experiments were executed against live versions of Wikidata and DBpedia
during the period from May to June 2025. Due to the nature of these experiments and the reliance on
public SPARQL endpoints, we did not persist complete dumps of the knowledge graphs at the time.
However, for the sake of reproducibility, we reference the closest available static snapshots: the May 20,
2025 Wikidata dump (https://dumps.wikimedia.org/wikidatawiki/20250520/) and the most recent DBpedia
Live snapshot available as of August 2025 (https://downloads.dbpedia.org/live.tar.gz). These
versions closely reflect the data state used in our experiments, although slight variations may exist due
to the dynamic nature of the endpoints during query execution.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data Shapes</title>
        <p>Validation or data shape constraint languages are used to ensure data quality in KGs by defining integrity
constraints in the form of various KG data shape types. Data shape types are high-level structural
constraints that represent entity relations within a KG [18, 23, 24].</p>
        <p>Given the importance of validation schemas for KGs, a couple of different technologies have emerged:
SHACL, ShEx, ReSh [18], DSP [25] or SPIN [26, 27]. However, the two most prominent constraint
languages, ShEx and SHACL, will be discussed in greater detail [28, 17].</p>
        <p>SHACL, the Shapes Constraint Language, was developed by the World Wide Web Consortium (W3C)
Data Shapes Working Group as a draft community group report for RDF validation [16]. Its design
was influenced by earlier technologies such as SPIN and OSLC Resource Shapes. SPIN, in particular,
was used for expressing constraints in RDF, primarily within TopQuadrant’s TopBraid Composer, and
served both as a foundation and precursor to SHACL’s formalization [17].</p>
        <p>Shape Expressions, or ShEx for short, were proposed as a user-friendly and high-level language for RDF
validation. Initially proposed as a human-readable syntax for OSLC Resource Shapes, ShEx grew to
embrace more complex user requirements coming from clinical and library use cases [18]. ShEx now
has a rigorous semantics and interchangeable representations: JSON-LD and RDF [17].</p>
        <p>The shape type is configurable in the artifact settings of this work's software, which supports both
SHACL and ShEx formats, enabling analysis of the impact of different types of data shapes on the
quality of the generated SPARQL queries.</p>
        <p>An example of a ShEx data shape can be found in appendix 3.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. ShapeMaps</title>
        <p>The ShapeMap standard, developed by the Shape Expressions Community Group, defines a formal
mechanism to associate RDF nodes with ShEx shape expressions. A ShapeMap consists of a finite
set of shape associations, where each association includes at minimum a node identifier and a data
shape label, and may optionally include status metadata (conformant or nonconformant), explanatory
reasons, or application-specific information. The primary role of ShapeMaps is to specify validation
intentions or outcomes. As input, a ShapeMap determines which node-shape pairs should be checked for
conformance against a ShEx schema. As output, it can capture the results of such validation, including
whether each node conforms to the associated data shape [29].</p>
        <p>This work exclusively relies on fixed ShapeMaps for the entity-based data shape generation. This
structure ensures compatibility with ShEx data shape validation engines and supports transparent and
reproducible high-level descriptions of the knowledge graphs examined throughout the experiments.
Within this work, the fixed ShapeMap serves as the initiating link between the entity extraction and
the data shape generation step, which in turn feeds the SPARQL query generation by augmenting
the LLM input prompt.</p>
        <p>The SheXer library is used to automatically infer data shape expressions based on the neighborhood
structure of selected entities [30]. The resulting data shapes are derived based on the source entities
transformed into a fixed ShapeMap.</p>
        <p>It is notable that the SheXer library adopts a slightly modified approach to handling fixed ShapeMaps
for the purpose of data shape generation [30]. While ShapeMaps were originally introduced as part of
the ShEx ecosystem, primarily to associate RDF nodes with shape expressions for validation purposes,
SheXer uses them differently. In the official ShapeMap draft, ShapeMaps are intended to specify
candidate node-shape pairs as input to the validation process, or to report the conformance results
of a ShEx validation engine. In contrast, SheXer utilizes fixed ShapeMaps as an input mechanism for
shape induction: the specified RDF nodes serve as anchors for the automatic generation of data shapes.
These shapes can be expressed in either ShEx or SHACL syntax, enabling flexible downstream use in
validation or shape-informed applications. An example of such a shape map is shown below:
&lt;[.]/entity/Q1248784&gt;@&lt;Shape: [.]/entity/Q1248784 = airp.&gt;
&lt;[.]/entity/Q99&gt;@&lt;Shape: [.]/entity/Q99 = Calif.&gt;
&lt;[.]/entity/Q229623&gt;@&lt;Shape: [.]/entity/Q229623 = USA&gt;</p>
        <p>In this example, the placeholder [.] is replaced by http://www.wikidata.org. The
identifiers entity/Q1248784 = airport, entity/Q99 = California, and entity/Q229623 = USA were in this
case derived from the entity extraction process. See listing 3 for an example of what is
produced by SheXer. It can be observed that the part after the “@” in the ShapeMap is simply used as
the data shape identifier, while the part before the “@” is parsed and used for the data shape generation.</p>
        <p>This configuration is completed by a namespace dictionary that defines common prefixes, along with
a specification of namespaces to be ignored in order to avoid overly bloated data shapes. Together,
these elements enable the Shaper object to generate concise and effective data shapes.</p>
        <p>Furthermore, in the case of the Wikidata KGs, the SheXer library offers a wikidata_annotation flag
that resolves entity identifiers to readable names. This option was enabled to provide more semantic
clarity for the LLM's In-Context Learning (ICL) capabilities. While SheXer provides numerous
additional parameters for fine-tuning the output, the chosen configuration was sufficient to meet the
requirements of the experiments conducted in this work.</p>
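        <p>The fixed ShapeMap fed to SheXer can be assembled directly from the dictionary produced by the entity extraction step. The following is a minimal sketch (function and variable names are illustrative, not the artifact's actual code):</p>

```python
def build_fixed_shapemap(entities, base="http://www.wikidata.org"):
    """Turn a {label: local_id} dictionary into a fixed ShapeMap string.

    Each line is one shape association of the form <node>@<shape label>:
    the node IRI before the "@" anchors the shape induction, while the
    part after the "@" is only used as the data shape identifier.
    """
    lines = []
    for label, local_id in entities.items():
        node = f"<{base}/{local_id}>"
        shape_label = f"<Shape: {base}/{local_id} = {label}>"
        lines.append(f"{node}@{shape_label}")
    return "\n".join(lines)

# Entities as resolved during entity extraction (cf. the example above):
entities = {"airp.": "entity/Q1248784", "Calif.": "entity/Q99", "USA": "entity/Q229623"}
shape_map = build_fixed_shapemap(entities)
```

        <p>The resulting string can then be passed to SheXer's Shaper (e.g. via its shape_map_raw parameter, together with a SPARQL endpoint URL and a namespace dictionary) to induce ShEx or SHACL shapes for exactly these anchor nodes.</p>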
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Evaluation Metric</title>
        <p>The F1-Score provides a standard yet informative measure of answer quality by quantifying the harmonic
mean of precision and recall. In the context of this work, it is used to evaluate how well the
system-generated SPARQL queries approximate the correct result sets defined in the benchmark datasets. A
high F1-Score indicates that the LLM was able to generate SPARQL queries that return the correct
answers while avoiding incorrect ones, thereby balancing the trade-off between false positives and false
negatives.</p>
        <p>Following the recommendations by Usbeck et al., a QALD-specific macro-averaged F1-Score is
employed to reflect the system’s answer quality across all test questions [31]:</p>
        <p>F1-Score = (2 ⋅ Precision ⋅ Recall) / (Precision + Recall)</p>
        <p>Precision = TP / (TP + FP), Recall = TP / (TP + FN)</p>
        <p>The classification of true positives (TP), false positives (FP), false negatives (FN), and true negatives
(TN) is based on a comparison of the result sets derived from the execution of LLM-generated and
golden SPARQL queries provided in the datasets:
• True Positive (TP): Counted if the result set produced by the LLM-generated SPARQL query is
identical to the annotated golden answer. The order of results is disregarded.
• False Positive (FP): Counted if the result set of the LLM-generated SPARQL query differs from
the golden answer set.
• False Negative (FN): Counted if the LLM-generated SPARQL query yields an empty result set or
returns values such as 0, null, or similar indicators of failure.
• True Negative (TN): Would be counted if both the LLM-generated and golden SPARQL queries
return empty result sets. However, no such cases were examined in any of the datasets used.</p>
        <p>In the context of KGQA, the F1-Score serves as a strict measure of success, rewarding only fully
correct responses while penalizing both missing and incorrect results. The F1-Score is a bounded metric
ranging from 0 to 1, where a score of 1 indicates optimal performance, meaning the model consistently
generates answers that are both precise and complete.</p>
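        <p>The per-question classification and the resulting score described above can be sketched in a few lines (an illustrative implementation, not the verifier's actual code):</p>

```python
def classify(predicted, gold):
    """Classify one question by comparing result sets, order disregarded.

    An empty (or failed) result counts as a false negative, any other
    mismatch as a false positive, and an exact match as a true positive.
    """
    if not predicted:
        return "FN"
    return "TP" if set(predicted) == set(gold) else "FP"

def f1_score(outcomes):
    """Compute the F1-Score from a list of per-question TP/FP/FN labels."""
    tp = outcomes.count("TP")
    fp = outcomes.count("FP")
    fn = outcomes.count("FN")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

outcomes = [classify(p, g) for p, g in [
    ({"Idaho"}, {"Idaho"}),   # TP: identical result sets
    ({"Ohio"}, {"Idaho"}),    # FP: non-empty but wrong
    (set(), {"Idaho"}),       # FN: empty result set
]]
```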
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Pipeline Architecture</title>
      <sec id="sec-4-1">
        <title>Host Machine</title>
        <p>[Figure: architecture of the pipeline on the host machine. The dataset, a .env configuration, and the system prompts feed the KG_Agent.sh driver script, which executes four steps: (1) Entity Extraction, which extracts named entities from the dataset NLQ; (2) Shape Generation, which generates a shape based on the named entities; (3) SPARQL Generation, which generates SPARQL using the shape in the LLM context; and (4) the Verifier, which compares the LLM-generated SPARQL query against the dataset.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>KG_Agent.sh</title>
        <p>The pipeline is designed to automate the experimental workflow and support testing in various
KGQA scenarios. The artifact is mainly implemented in Python using a number of open-source libraries,
including but not limited to rdflib for RDF graph manipulation, requests and SPARQLWrapper for
querying endpoints, and openai for accessing LLM APIs.</p>
        <sec id="sec-4-2-1">
          <title>4.1. Infrastructure</title>
          <p>The pipeline can be run on most host machines with sufficient compute and storage capabilities. The
experiments were conducted on a login node in the JUWELS Booster, which consists of 2× AMD EPYC
Rome 7402 CPU, 2× 24 cores, 2.8 GHz, Simultaneous Multithreading, 512GB DDR4, 3200 MHz [33].</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2. Step 1: Entity Extraction</title>
          <p>The purpose of the entity extraction is to identify relevant entities from the NLQs provided in the
benchmark dataset to be evaluated. The process begins by loading the dataset NLQs. For each entry, the
natural language question is combined with a predefined system prompt (see listing 1) and submitted
to the configured LLM endpoint for entity extraction.</p>
          <p>When testing the pipeline, we found that the proprietary graph linked to the CK25 dataset was small
enough to generate a complete data shape, making the entity extraction process obsolete. In such cases,
the class skips the extraction step and proceeds directly to the next processing stage. However, since
most of the tested datasets were too large to generate complete data shapes, we focused on including
entity extraction as the default mode.</p>
          <p>When entity extraction is performed, the LLM returns a list of relevant entity names. This list is
parsed and forwarded to a SPARQL endpoint for disambiguation and identifier resolution. The result is a
dictionary that maps entity names to their corresponding unique identifiers. The dereferencing process
is dataset-specific and implemented separately for Wikidata and DBpedia to account for diferences in
endpoint structure, query formulation, and identifier formats.</p>
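          <p>Because LLM output formats drift between models, the entity list returned in this step benefits from tolerant parsing. A minimal sketch (the function name and accepted reply formats are assumptions, not the pipeline's actual code):</p>

```python
import json

def parse_entity_list(llm_reply):
    """Parse the LLM's entity list, accepting a JSON array or a comma-separated line."""
    text = llm_reply.strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed]
    except json.JSONDecodeError:
        pass  # not JSON: fall back to comma-separated parsing
    return [part.strip() for part in text.split(",") if part.strip()]

# Both reply styles yield the same label list, ready for identifier resolution:
labels = parse_entity_list('["Butch Otter", "Idaho"]')
```

          <p>The resulting labels are then resolved against the dataset-specific SPARQL endpoint to obtain the name-to-identifier dictionary described above.</p>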
          <p>Our evaluation does not include a dedicated assessment of the entity extraction step. Since this step
precedes shape construction and query generation, errors at this stage can propagate and affect overall
performance.</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>4.3. Step 2: Shape Generation</title>
          <p>The shape generation step processes the entities extracted in the previous step and generates
corresponding SHACL or ShEx data shapes for each entity. The resulting data shapes are then written to a
designated output directory for subsequent use.</p>
          <p>The KG data shapes are derived from the entities extracted in step 1 whenever the target KG is too
large to generate a complete data shape. This enables targeted extraction of structural constraints by guiding
the generation process based on pre-selected entities. A more detailed description of KG data shape
types and ShapeMaps can be found in section 3.3 and 3.4.</p>
          <p>Particular emphasis is placed on the use of the shexer library, a tool capable of generating SHACL or
ShEx data shapes based on a variety of configurable parameters [32]. After extensive testing, the optimal
configuration for this use case was found to involve initializing the Shaper object with a fixed
ShapeMap [29] constructed from the previously extracted entities.</p>
        </sec>
        <sec id="sec-4-2-4">
          <title>4.4. Step 3: SPARQL Generation</title>
          <p>The SPARQL generation step is responsible for creating a valid SPARQL query that retrieves
the information requested by the specified NLQ from the underlying KG.</p>
          <p>This construction process involves reading a predefined system prompt template (see listing 2) and
replacing its placeholders with context-specific parameters, including the natural language question,
the associated data shape, and optional retry parameters. Once the prompt is complete, it is sent to the
LLM API, whose endpoint is specified via environment variables.</p>
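          <p>For illustration, such an environment configuration might look as follows (a hypothetical sketch; the variable names are illustrative and not necessarily the keys used by the artifact):</p>

```shell
# Hypothetical .env excerpt (illustrative variable names)
LLM_API_ENDPOINT=https://api.openai.com/v1   # or a DeepSeek / Llama-serving endpoint
LLM_MODEL=gpt-4o-mini
SHAPE_TYPE=shex        # or "shacl"; selects the data shape language (section 3.3)
MAX_RETRIES=3          # retry budget for faulty SPARQL queries
```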
          <p>To ensure the accuracy of the generated queries, a retry mechanism is implemented. A query is
considered incorrect if the processed result contains an explicit error message, for example due to an
endpoint communication error or SPARQL syntax problems. If the above condition is met, the query is
marked as faulty and a retry is triggered. Each retry includes the previously generated faulty query in
the new prompt payload, allowing the LLM to revise its output based on the observed error pattern.</p>
          <p>Furthermore, the LLM temperature is increased by 0.1 for each consecutive retry, starting from 0 on
the first attempt. This implementation was chosen to give the LLM more flexibility in finding a creative
solution to the given problem. If none of the error conditions apply, the query is considered valid and
the generation step is completed.</p>
          <p>The system prompt (Listing 2) includes both a grounding constraint to use only the properties defined
in the shape and a fallback instruction to guess missing properties when needed. This was intended to
improve answer recall in cases where the extracted shape was sparse or incomplete.</p>
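          <p>The retry logic described above can be summarized as follows (a sketch under the stated assumptions; ask_llm and validate are hypothetical stand-ins for the actual API call and error check):</p>

```python
def generate_with_retries(ask_llm, validate, question, shape, max_retries=3):
    """Generate a SPARQL query, retrying on errors with rising temperature.

    The temperature starts at 0 and grows by 0.1 per retry, and each retry
    feeds the previously faulty query back into the prompt so the LLM can
    revise its output based on the observed error pattern.
    """
    faulty = None
    for attempt in range(max_retries + 1):
        temperature = round(0.1 * attempt, 1)
        query = ask_llm(question, shape, faulty, temperature)
        if validate(query) is None:   # None signals a clean execution
            return query
        faulty = query
    return None                       # all attempts failed
```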
        </sec>
        <sec id="sec-4-2-5">
          <title>4.5. Step 4: Verifier</title>
          <p>The main purpose of this step is to calculate the F1-Score, which serves as the primary evaluation
metric for measuring the semantic correctness of LLM-generated SPARQL queries. To achieve this, the
program first executes the golden reference queries from the dataset against the corresponding public
or local KG endpoint to retrieve the expected set of answer entities. This result set forms the ground
truth for comparison.</p>
          <p>Subsequently, the generated SPARQL query produced by the pipeline is executed against the same
endpoint under identical conditions. The returned entity set is compared to the gold set using set-based
operations, disregarding the order of results. The outcome of this comparison is categorized into True
Positives, False Positives, and False Negatives on a per-question basis.</p>
          <p>These values are used to compute precision, recall, and ultimately the F1-Score, offering a balanced
measure that captures both correctness and completeness of query results. In addition, the program
logs detailed mismatch information, retry counts, tokens used and other secondary metrics, which
facilitates error analysis and robustness evaluation of the query generation pipeline. This comparison
method aligns with established practices in KGQA benchmarking, ensuring that results are reproducible
and interpretable across datasets and models.</p>
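          <p>The order-insensitive comparison relies on flattening each SPARQL 1.1 JSON result into a set of value tuples, roughly as follows (a sketch; the helper name is illustrative, not the verifier's actual code):</p>

```python
def bindings_to_set(sparql_json):
    """Flatten a SPARQL 1.1 JSON result into an order-insensitive set.

    Each binding row becomes a tuple of its variable values, so generated
    and golden result sets can be compared with plain set operations.
    """
    variables = sparql_json["head"]["vars"]
    rows = sparql_json["results"]["bindings"]
    return frozenset(
        tuple(row[var]["value"] for var in variables if var in row)
        for row in rows
    )
```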
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>A total of 36 different experiments were conducted, 24 of which used shape-informed configurations and
12 of which were baseline experiments without data shape specifications. Each of the 36 experiments
encompasses 50 different questions from the corresponding dataset minus the disregarded invalid
questions (see section 3.2). Each experiment was defined by a unique combination of variables, including
the choice of LLM, the dataset and the data shape type.</p>
      <p>The experiments were executed sequentially, and each run was monitored manually to detect errors,
anomalies, or unexpected behavior. All outputs were logged in detail, including the number of LLM
retries, token usage, and result metrics.</p>
      <sec id="sec-5-1">
        <title>5.1. Data Shape Type Comparison</title>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Impact of ShEx Shape-Informed Prompting Versus Baseline</title>
        <p>The left diagram in Figure 3 shows the F1-Score for the ShEx data shape configuration compared
to the non-shape-informed configuration. Notably, shape-informed experiments outperform non-shape-informed
experiments with a relative increase of 83.33% in median, and their IQR of 0.17 (compared
to 0.33) is exceptionally low.</p>
        <p>The analysis of shape-informed SPARQL query generation using ShEx constraints highlights clear
performance differences across benchmarks and models (Figure 3, right).</p>
        <p>For the Corporate Graph (CK25), GPT achieved the highest median F1-score (0.37), followed by
Deepseek (0.28) and Llama (0.19). In the DBpedia benchmark, Deepseek performed best with a median
F1-score of 0.52, outperforming Llama (0.43) and GPT (0.31). This benchmark exposes meaningful
differences in model behavior and serves as the most discriminative setting in the evaluation. On
Wikidata, Llama reached the highest median (0.47), followed by GPT-4o-mini (0.42) and Deepseek (0.30).</p>
        <p>In summary, Llama exhibited the strongest performance on Wikidata, while GPT showed leading
results in the CK25 benchmark, but with greater variation on DBpedia. Deepseek performs best on
DBpedia but comparatively lower on the other datasets. Among the three benchmarks, DBpedia proved
most effective in differentiating model quality under ShEx constraints.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Per-Model and Per-Dataset Effects of Shape Integration</title>
        <p>Figure 4 breaks the F1-Score down by LLM and by dataset, each grouped by shape-informed (ShEx)
versus non-shape-informed configurations. It can be observed that shape-informed runs consistently
achieve better results than their baseline variants.</p>
        <p>At the LLM level, Llama benefits particularly strongly from the integration of data shape information:
its median F1-Score increases from 0.13 in non-shape-informed configurations to 0.43 when data shape
information is used. Similar effects can be observed with GPT (from 0.25 to 0.37) and Deepseek (from
0.32 to 0.4). At the same time, the IQR is lower in all ShEx configurations, which indicates more stable
generation results.</p>
        <p>Visible improvements are also evident at the benchmark level. Particularly noteworthy is Wikidata,
whose median rises from 0.16 to 0.42. The effect is also positive for DBpedia, where the median increases
from 0.38 to 0.43, with a lower IQR at the same time. The proprietary CK25 benchmark shows
no output in the non-shape-informed configuration (median F1-Score = 0), while the integration of shapes
allows a median F1-Score of 0.28 to be achieved. This means that the LLMs were not able to generate
any positive results when no data shape information was provided.</p>
        <p>Particularly noteworthy is the ability of LLMs to generate consistent and valid results using ShEx data
shapes, achieving performance levels that are comparable to those on public datasets. The deviation in
median F1-score amounts to only 0.15 points relative to DBpedia and 0.14 points relative to Wikidata,
indicating strong generalization despite the proprietary nature of the evaluated dataset.</p>
        <p>Overall, the results show that the use of data shapes via ShEx leads to more robust and significantly
better response generation across all models and datasets examined, especially for the proprietary dataset.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Limitations</title>
      <p>Comparing the two data shape languages (Figure 2), ShEx outperforms SHACL across most
configurations. The advantage is especially pronounced in the DBpedia and CK25 datasets, and when used in
combination with GPT and LLaMA models. However, the relative strength of ShEx must be interpreted
with caution, as its superiority may depend on dataset characteristics, annotation richness, model
alignment with the data shape format, or a possible toolchain bias toward ShEx. Our shape extraction
and validation pipeline is based on the SheXer library and relies on ShapeMaps, both of which are rooted
in the ShEx ecosystem. This may introduce a systematic bias favoring ShEx over SHACL, particularly if
tool-specific optimizations or assumptions influence extraction quality.</p>
      <p>The experiments demonstrate that shape-informed prompting generally improves the performance
and stability of SPARQL query generation across all tested LLMs and datasets. As shown in Figure 3
(left), configurations enriched with declarative ShEx constraints exhibit a marked improvement over
non-shape-informed baseline experiments, with an observed median F1-Score increase of 83.33% and
a corresponding reduction in IQR. This confirms the hypothesis that explicitly integrating structural
knowledge into prompt contexts can improve semantic grounding and guidance during query generation.
A potential extension of the baseline could involve incorporating simple schema information, such as a
list of commonly used predicates per entity. This would give the LLM some structural guidance while
remaining less complex than full-fledged ShEx or SHACL shapes.</p>
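      <p>Such a lightweight schema summary could, for instance, be derived by counting predicate usage per class. The following sketch illustrates the idea on an in-memory triple list; the helper and all names are hypothetical and not part of the evaluated pipeline:</p>

```python
from collections import Counter, defaultdict

def common_predicates_per_class(triples, rdf_type="rdf:type", top_k=3):
    """Collect the most frequently used predicates per class from a list
    of (subject, predicate, object) triples -- a simpler structural hint
    for the prompt than full-fledged ShEx or SHACL shapes."""
    classes = {}                    # subject -> class (via rdf:type)
    for s, p, o in triples:
        if p == rdf_type:
            classes[s] = o
    usage = defaultdict(Counter)    # class -> predicate usage counts
    for s, p, o in triples:
        if p != rdf_type and s in classes:
            usage[classes[s]][p] += 1
    return {cls: [p for p, _ in cnt.most_common(top_k)]
            for cls, cnt in usage.items()}

triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksFor", "ex:kit"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
]
print(common_predicates_per_class(triples))
# {'ex:Person': ['ex:name', 'ex:worksFor']}
```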
      <p>Figure 4 further highlights that shape-informed configurations not only achieve higher F1-Scores but
also lead to more consistent generation behavior. In all tested models, the IQR decreased under
shape-informed conditions, indicating reduced variability and improved reliability. Notably, the proprietary
CK25 dataset yielded no valid SPARQL queries in baseline runs, but produced median F1-Scores up
to 0.37 when augmented with ShEx constraints. This underlines the role of data shape prompting in
enabling generalization to previously unseen or proprietary KGs.</p>
      <p>The experiments were limited to three datasets: two public (DBpedia and Wikidata) and one
proprietary (CK25). While this selection offers a useful degree of diversity in terms of comparability
and domain specificity, the generalizability of the results to other domains and knowledge graphs
remains untested. In particular, the performance gains observed on CK25 may not necessarily extend to
other proprietary knowledge graphs. Despite efforts to evaluate the pipeline on additional proprietary
datasets, this remains a challenging task due to limited accessibility. By their very nature, proprietary
datasets are significantly harder to obtain and share than public benchmarks, making this a persistent
limitation, both for this work and for future research.</p>
      <p>Public datasets such as DBpedia and Wikidata may have been seen during LLM pretraining, potentially
inflating model performance through latent memorization. This confound is partially mitigated by the
inclusion of the CK25 benchmark, though further experiments with proprietary or synthetic graphs
would strengthen causal interpretation.</p>
      <p>Prompt construction was manually engineered and optimized iteratively. While effective in this
context, transferability to other tasks, LLMs, or graph structures is not guaranteed. The system-prompt-induced
fallback strategy mentioned in Section 4.4 introduces a tension between precision and flexibility. While
it helps generate non-empty queries, it also allows the model to hallucinate properties not defined in
the shape. A stricter prompting strategy could offer better control, which should be examined in future
studies.</p>
      <p>Although variance decreased in shape-informed runs, full reproducibility cannot be ensured due to
the inherent non-determinism of autoregressive LLMs (e.g., temperature settings, sampling strategies).</p>
      <p>Data shapes were generated using sheXer. Although this tool supports large-scale data shape
extraction through a streamlined implementation, further potential could likely be unlocked by fully
exploiting its configuration parameters. Such limitations can affect the completeness and quality of the
data shape context provided to the LLM.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This work contributes to the advancement of KGQA by demonstrating that integrating declarative
data shapes into prompt-based SPARQL query generation significantly improves both accuracy and
stability across multiple LLMs and knowledge graphs. Through a unified and reproducible experimental
framework, we evaluated the effect of ShEx and SHACL constraints on query generation performance
using F1-Score as the primary evaluation metric. Results indicate that shape-informed prompting
enables more reliable query generation, particularly for previously unseen or proprietary graphs such
as CK25.</p>
      <p>We can therefore conclude that the developed technology for generating data shapes enables the
generation of results on previously unseen datasets that are comparable in quality to those achieved on
publicly available benchmarks, despite the latter being likely included in the pretraining corpus of the
underlying language models.</p>
      <p>The artifact developed in this study supports modular data shape extraction, prompt enrichment,
and automated evaluation, making it suitable for systematic benchmarking across heterogeneous KGs
and model configurations. By enabling comparisons across data shape types, models, and datasets, it
provides a foundation for further exploration into structure-aware language modeling.</p>
      <p>Taken together, these contributions point toward a more generalizable, robust, and semantically grounded
approach to KGQA using large language models.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this paper, generative AI systems and other digital tools were used in a
supporting capacity. Their use was limited to improving productivity, translating from and to English,
sentence polishing and rephrasing, and clarifying domain-specific concepts. The authors critically
reviewed and revised all AI-generated content before its integration. The authors take full responsibility
for the paper’s content. Specifically, the authors used the following AI systems and tools: ChatGPT
(GPT-4, ChatGPT Plus), GitHub Copilot, DeepL Translator.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This work has been supported in part by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) – 459291153.</p>
      <p>[13] X. Huang, J. Zhang, D. Li, P. Li, Knowledge graph embedding based question answering, in:
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, ACM,
2019, pp. 105–113. doi:10.1145/3289600.3290956.
[14] J. Baek, A. Aji, A. Safari, Knowledge-augmented language model prompting for zero-shot
knowledge graph question answering, in: Proceedings of the First Workshop on Matching From
Unstructured and Structured Data (MATCHING 2023), Association for Computational Linguistics,
2023, pp. 70–98. doi:10.18653/v1/2023.matching-1.7.
[15] E. Prud'hommeaux, I. Boneva, J. E. Labra Gayo, G. Kellogg, Shape Expressions Language 2.1, 2019.
URL: http://shex.io/shex-semantics-20191008/.
[16] H. Knublauch, D. Kontokostas, Shapes Constraint Language (SHACL), 2017-07-20. URL:
https://www.w3.org/TR/2017/REC-shacl-20170720/.
[17] J. E. L. Gayo, E. Prud'hommeaux, I. Boneva, D. Kontokostas, Validating RDF Data, Synthesis
Lectures on Data, Semantics, and Knowledge, Springer International Publishing, 2018.
doi:10.1007/978-3-031-79478-0.
[18] A. G. Ryman, S. Speicher, A. Le Hors, OSLC Resource Shape: A language for defining constraints
on Linked Data 996 (2013) 96. URL: http://ceur-ws.org/Vol-996/.
[19] K. Peffers, T. Tuunanen, M. A. Rothenberger, S. Chatterjee, A Design Science Research Methodology
for Information Systems Research 24 (2007) 45–77. doi:10.2753/MIS0742-1222240302.
[20] A. Perevalov, D. Diefenbach, R. Usbeck, A. Both, QALD-9-plus: A Multilingual Dataset for
Question Answering over DBpedia and Wikidata Translated by Native Speakers (2022) 229–234.
doi:10.48550/arXiv.2202.00120. arXiv:2202.00120.
[21] AKSW, questions_db25.yaml – benchmark questions for Text2SPARQL, 2025. URL:
https://github.com/AKSW/text2sparql.aksw.org/blob/develop/docs/benchmark/questions_db25.yaml.
[22] eccenca GmbH, CK25 corporate knowledge reference dataset for benchmarking text 2 SPARQL question
answering approaches, 2025. URL: https://github.com/eccenca/ck25-dataset.
[23] K. Rabbani, M. Lissandrini, K. Hose, SHACL and ShEx in the Wild: A Community Survey on
Validating Shapes Generation and Adoption, in: Companion Proceedings of the Web Conference
2022, ACM, 2022-04-25, pp. 260–263. doi:10.1145/3487553.3524253.
[24] K. Rabbani, M. Lissandrini, K. Hose, Extraction of Validating Shapes from Very Large Knowledge
Graphs 16 (2023) 1023–1032. doi:10.14778/3579075.3579078.
[25] T. Bosch, K. Eckert, Towards description set profiles for RDF using SPARQL as intermediate
language, in: Proceedings of the 2014 International Conference on Dublin Core and Metadata
Applications, DCMI'14, Dublin Core Metadata Initiative, 2014, pp. 129–137.
[26] H. Knublauch, J. A. Hendler, K. Idehen, SPIN - overview and motivation, 2011. URL:
http://www.w3.org/submissions/2011/SUBM-spin-overview-20110222/.
[27] H. Knublauch, J. A. Hendler, K. Idehen, SPARQL Inferencing Notation (SPIN) - overview and motivation,
2011. URL: https://www.w3.org/submissions/spin-overview/. W3C Member Submission.
[28] D. Tomaszuk, RDF Validation: A Brief Survey, in: Beyond Databases, Architectures and Structures.
Towards Efficient Solutions for Data Analysis and Knowledge Representation, volume 716, Springer
International Publishing, 2017, pp. 344–355. doi:10.1007/978-3-319-58274-0_28.
[29] E. Prud'hommeaux, T. Baker, ShapeMap Structure and Language, 2017-07-13. URL:
http://shex.io/shape-map-20170713.
[30] D. Fernández-Álvarez, sheXer: Shape extractor for RDF graphs, 2025. URL:
https://github.com/weso/shexer.
[31] R. Usbeck, M. Röder, M. Hofmann, F. Conrads, J. Huthmann, A.-C. Ngonga Ngomo, C. Demmler,
C. Unger, Benchmarking question answering systems 10 (2019) 293–304. doi:10.3233/SW-180312.
[32] D. Fernández-Álvarez, J. E. Labra-Gayo, D. Gayo-Avello, Automatic extraction of shapes using
sheXer 238 (2022) 107975. doi:10.1016/j.knosys.2021.107975.
[33] S. Kesselheim, A. Herten, K. Krajsek, J. Ebert, J. Jitsev, M. Cherti, M. Langguth, B. Gong,
S. Stadtler, A. Mozaffari, G. Cavallaro, R. Sedona, A. Schug, A. Strube, R. Kamath, M. G. Schultz,
M. Riedel, T. Lippert, JUWELS Booster – A Supercomputer for Large-Scale AI Research, 2021. URL:
http://arxiv.org/abs/2108.11976. arXiv:2108.11976.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Extended Results</title>
      <p>[Table: F1-Score summary statistics (count, mean, std, min, max, median, IQR) per benchmark (CK25, DBpedia, Wikidata) and shape configuration (Shape = None vs. Shape = ShEx).]</p>
      <p>B. System Prompts</p>
      <p>You are an expert in named entity recognition for knowledge graphs, specializing in {ont}. The
goal is to use these extracted entities as a step in order to generate {ont} schema
information based on the extracted entities.</p>
      <p>You are an expert in structured query languages, specializing in SPARQL. Your only purpose is to
generate valid SPARQL queries to query a {ont} graph.

### Task
Generate a valid SPARQL query that answers the user's question, based strictly on the provided {shp_typ} shape constraints.

### Rules
- Write your response **in a single line** without any line breaks or additional formatting.
- **Use** the properties and classes defined in the provided {shp_typ} shape.
- If the properties or entities are not found in the {shp_typ} shape, try to best guess them.
- Return **only raw SPARQL code**. No explanations, no comments, no natural language.
- Ensure the query is **syntactically correct** according to SPARQL standards.
- If a previous attempt failed, **reformulate creatively** to find a working alternative.
- **Assume** external knowledge beyond what is available through the shape if necessary.

### Important
- This {shp_typ} shape is derived from a {ont} graph; the generated query must adhere to the {ont} specifics.
- Focus purely on constructing the query.
- Do not add headings, bullet points, or any other text output except the query itself.
- Maintain strict compliance with SPARQL syntax.

### Input Format
Question: "{nlq}"

{shp_typ} Shape:
{shp_dat}</p>
      <p>Listing 2: System prompt for shape-informed SPARQL query generation. {ont} is replaced by the
ontology type, {shp_typ} by the shape language, and {shp_dat} by the concrete shape.</p>
      <p>C. Data Shape Example</p>
      <p>Listing 3: Example ShEx shape with annotation. Note: Some information of the shape was omitted for
brevity. Each predicate is annotated with either a cardinality constraint (e.g., {2}) or a label
(e.g., --&gt; GND ID), guiding Large Language Models in understanding the expected structure
and semantics of the data.</p>
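      <p>For illustration, a minimal annotated ShEx shape of this form could look as follows; the prefixes, shape, and property names are invented for this sketch and are not taken from the evaluated graphs:</p>

```shex
PREFIX ex: &lt;http://example.org/&gt;
PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;

ex:PersonShape {
  ex:name xsd:string {1} ;     # exactly one name
  ex:parent IRI {2} ;          # cardinality constraint
  ex:gndId xsd:string ?        # --&gt; GND ID
}
```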
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>Overview of the ninth text retrieval conference (trec-9) question answering track</article-title>
          ,
          <source>in: Proceedings of the Ninth Text Retrieval Conference (TREC-9)</source>
          ,
          <source>National Institute of Standards and Technology (NIST)</source>
          ,
          <year>2000</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . URL: https://trec.nist.gov/pubs/trec9/papers/qa_overview.pdf. Editors: Ellen M. Voorhees and Donna Harman.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Freebase:
          <article-title>A collaboratively created graph database for structuring human knowledge</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery (ACM)</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          . URL: https://doi.org/10.1145/1376616.1376746. doi:10.1145/1376616.1376746. Presented at SIGMOD/PODS '08.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative Knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . URL: https://doi.org/10.1145/2629489. doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          ,
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          . URL: https://doi.org/10.3233/SW-140134. doi:10.3233/SW-140134.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Large-scale simple question answering with memory networks</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2015</year>
          ). URL: https://arxiv.org/abs/1506.02075. doi:10.48550/arXiv.1506.02075. arXiv:1506.02075.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <article-title>SPARQL Query Language for RDF</article-title>
          ,
          <year>2008</year>
          . URL: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pintscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Schema generation for large knowledge graphs using large language models</article-title>
          ,
          <year>2025</year>
          . URL: http://arxiv.org/abs/2506.04512. arXiv:2506.04512.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radyush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <article-title>SPARQLGEN: One-shot prompt-based approach for SPARQL query generation</article-title>
          ,
          <source>in: Proceedings of SEMANTICS 2023 EU: 19th International Conference on Semantic Systems, September 20-22, 2023, Leipzig, Germany</source>
          ,
          <year>2023</year>
          . URL: https://github.com/danrd/sparqlgen?tab=readme-ov-file.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2020</year>
          . URL: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kerbusch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pruim</surname>
          </string-name>
          , T. Käfer,
          <article-title>Evaluating the performance of RAG methods for conversational AI in the airport domain</article-title>
          , in: W. Chen,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kachuee</surname>
          </string-name>
          , X.-Y. Fu (Eds.),
          <source>Proceedings of the</source>
          <year>2025</year>
          <article-title>Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          (Volume
          <volume>3</volume>
          :
          <string-name>
            <surname>Industry</surname>
            <given-names>Track)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Albuquerque, New Mexico,
          <year>2025</year>
          , pp.
          <fpage>794</fpage>
          -
          <lpage>808</lpage>
          . URL: https://aclanthology.org/2025.naacl-industry.61/. doi:10.18653/v1/2025.naacl-industry.61.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Aggour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Detor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gabaldon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mulwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cuddihy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Compound Knowledge Graph-Enabled AI Assistant for Accelerated Materials Discovery</article-title>
          <volume>11</volume>
          (
          <year>2022</year>
          )
          <fpage>467</fpage>
          -
          <lpage>478</lpage>
          . doi:10.1007/s40192-022-00286-z.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Emonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bolleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duvaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>de Farias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Sima</surname>
          </string-name>
          ,
          <article-title>LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs</article-title>
          ,
          <year>2024</year>
          . URL: http://arxiv.org/abs/2410.06062. arXiv:2410.06062.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>