<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Approach using ReAct and Knowledge Graph Exploration Utilities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Felix Brei</string-name>
          <email>brei@infai.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenz Bühmann</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Frey</string-name>
          <email>frey@informatik.uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Gerber</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars-Peter Meyer</string-name>
          <email>LPMeyer@infai.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claus Stadler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kirill Bulert</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chemnitz Technical University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>First International TEXT2SPARQL Challenge, Co-Located with Text2KG at ESWC25</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Applied Informatics at Leipzig University</institution>
          ,
          <addr-line>Goerdelerring 9, 04109 Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Leipzig University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Interacting with knowledge graphs can be a daunting task for people without a background in computer science, since the query language used (SPARQL) has a high barrier to entry. Large language models (LLMs) can lower that barrier by supporting Text2SPARQL translation. In this paper, we introduce a generalized method based on SPINACH, an LLM-backed agent that translates natural language questions to SPARQL queries not in a single shot, but in an iterative process of exploration and execution. We describe the overall architecture and the reasoning behind our design decisions, and we conduct a thorough analysis of the agent's behavior to identify areas for targeted future improvements. This work was motivated by the Text2SPARQL challenge, which was held to foster progress in the Text2SPARQL domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Utilities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Knowledge graphs are a modern approach to storing and linking information and have made their
way into several large projects such as Wikidata, DBpedia, and the Open Research Knowledge Graph. Their
inherent structure enables the storage of not only the information entities themselves, but also the
meaningful relationships between them. Information can be retrieved from such a graph in various ways,
including structured query languages like SPARQL, Cypher, or GQL, as well as alternative
methods such as faceted browsing. While SPARQL is a powerful tool widely used in the Semantic Web
community, it can be challenging for those without training in these technologies. The last couple of
years have shown that large language models (LLMs) can help translate the intent of a user
into a matching SPARQL query, but their precision is still very limited. To facilitate further research in
this area, eccenca GmbH created a contest in which any individual or group could submit the URL of a service
that returns a SPARQL query for a given question (and the dataset that the question relates to). In
this paper, we describe the details of our submission to this contest.</p>
      <sec id="sec-1-1">
        <title>1.1. The TEXT2SPARQL Challenge</title>
        <p>The first TEXT2SPARQL Challenge1 was designed to evaluate systems capable of translating natural
language questions into SPARQL queries across multiple datasets and languages. Participants were
required to deploy their solutions as publicly accessible RESTful web services. Each registered system
had to expose a uniform API accepting two GET parameters: question (the input in natural
language) and dataset (a URL identifying the target knowledge graph). The service was expected to
return a valid SPARQL query string within a timeout limit of ten minutes.</p>
        <p>∗Corresponding author. †These authors contributed equally. Authors in alphabetical order.</p>
        <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
        <p>The evaluation infrastructure issued 250 queries to each endpoint during a 5-day testing phase.
Two datasets2 were used to test different aspects of system performance:
• DBpedia (DB25): An English and Spanish subset of DBpedia 2015-10 (a large-scale, multilingual
knowledge graph derived from Wikipedia), encompassing 200 selected questions split equally
between English and Spanish. The challenge organizers state that they manually curated and
validated the question-query pairs, with refinements such as GROUP BY and ORDER BY clauses to
ensure structural diversity.
• Corporate Knowledge Graph (CK25): A domain-specific KG built from scratch by the challenge
organizers. It contained 50 English questions, reportedly created manually in consultation
with corporate stakeholders to simulate realistic enterprise queries.</p>
        <p>The challenge thereby tested scalability and multilingual robustness on open-domain data (DBpedia)
while assessing domain adaptation and precision on specialized data (Corporate).</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Contribution</title>
        <p>
          A recent approach to Text2SPARQL is the use of large language models as agents that traverse
the knowledge graph, retrieve information from it, and create SPARQL queries. One such agent is
SPINACH, released by Stanford [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. While SPINACH is an impressive proof-of-concept implementation
targeted at Wikidata, it cannot easily be configured to support other knowledge graphs.
        </p>
        <p>Our main contribution is the generalization of SPINACH to RDF knowledge graphs, together with its
adaptation and deployment for the multilingual, multi-KG setting of the TEXT2SPARQL Challenge.
To this end, we extended SPINACH's codebase and prompting so that our system, ARUQULA, uses our own knowledge
graph exploration utilities instead of Wikidata API endpoints. These utilities are built on RDF
and OWL standard vocabularies, so that they can be adapted to other RDF knowledge
graphs and languages as well. In addition, we conducted a qualitative and quantitative analysis of our
approach in the TEXT2SPARQL challenge, combining the evaluation logs of the challenge organizers
with ARUQULA's action, observation, and reasoning logs.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Although the TEXT2SPARQL Challenge is the first edition under this specific name, there
exists earlier work on translating natural language questions into SPARQL
queries. This task is related to the field of Knowledge Graph or Knowledge Base Question Answering
(KGQA &amp; KBQA) [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ] and the corresponding benchmarks &amp; leaderboards. In the scope of this paper, we
focus on LLM-powered SPARQL-based KGQA systems &amp; Text2SPARQL approaches, as well as relevant
benchmarks &amp; leaderboards.
      </p>
      <p>
        Since the rise of LLMs, several researchers have evaluated their capabilities for KGQA and SPARQL.
Lehmann et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] propose using a controlled natural language as an intermediate step in
Text2SPARQL translation for KGQA. Meyer et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] manually evaluated several KG-related capabilities
of ChatGPT 3.5 and ChatGPT 4.0, including Text2SPARQL. The LLMs generated syntactically
correct SPARQL queries that nevertheless had semantic problems on larger KGs (Mondial KG). Kovriguina et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
applied SPARQLGEN to evaluate a Text2SPARQL approach on larger KGs (Bestiary, Wikidata, and
DBpedia), obtaining low F1 scores. Tafa and Usbeck [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] present a KGQA system that finds similar questions in
a dataset and uses the corresponding SPARQL queries for few-shot Text2SPARQL prompting
on ORKG. This yields a high F1 score, but the system was tested only on the SciQA benchmark. The potential
of small language models is evaluated by Brei et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
2https://web.archive.org/web/20250626113401/https://text2sparql.aksw.org/assets/talks/1-sebastian-tramp-introduction.pdf
      </p>
      <p>
        Text2SPARQL capabilities are evaluated as well in the LLM-KG-Bench [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11, 12</xref>
        ]. Here, we
added several tasks related to SPARQL SELECT queries and assessed the capabilities of more than 40
LLMs [13, 14, 12, 15, 16]. Even for the best state-of-the-art models, the F1 scores vary across
datasets.
      </p>
      <p>
        For Wikidata, Liu et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] present SPINACH. They apply a ReAct [17, 18] approach to the Wikidata
KG and evaluate it on a new benchmark dataset. They report quite good performance, and the approach looked
promising to us.
      </p>
      <p>
        There are several efforts to benchmark KGQA systems. The KGQA Leaderboard [19]3 collects
results for common datasets such as LC-QuAD [20, 21] and QALD [22]. Other notable datasets are SciQA [23],
Bestiary [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and SPINACH [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. New datasets for a given KG can also be generated [
        <xref ref-type="bibr" rid="ref12">24</xref>
        ]. The GERBIL [
        <xref ref-type="bibr" rid="ref13">25</xref>
        ]
benchmarking platform helps evaluate KGQA systems on such datasets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. ARUQULA Approach, Architecture, and Implementation</title>
      <p>
        As the results of SPINACH [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the ReAct approach looked promising to us, we decided to
generalize the existing code to the RDF graphs of the Text2SPARQL challenge. The system is
built around a ReAct [17, 18] configuration implemented with LangGraph4, a SPARQL endpoint setup
realized with RPT5 &amp; QLever [
        <xref ref-type="bibr" rid="ref14">26</xref>
        ]6, a hybrid search realized with the vector database Qdrant7, a full-text
search with Lucene, and an API endpoint for the Text2SPARQL challenge.
      </p>
      <sec id="sec-3-1">
        <title>3.1. ReAct with KG Exploration Utilities</title>
        <p>The ReAct (reasoning and acting) approach [17] is built around a graph of actions through which the LLM can navigate.
ReAct proposes a prompt template consisting of groups of thoughts, actions, and observations.
The idea is not to have the LLM solve a given task in a single attempt,
but to let it split the task into smaller sub-tasks as it sees fit, giving it tools to interact with
the task as well as a history of all preceding actions and resulting observations. The available actions
and how they interact with the controller (i.e., the action graph) are shown in Figure 1.</p>
        <p>During initialization, the language of the question is detected in order to switch between the English and the Spanish
DBpedia. At the “controller” step, the LLM can choose to stop and report the final answer or invoke
one of six knowledge graph exploration utilities:
• search: search for relevant entities. The search_entity action offers a lookup for instance data,
search_property searches for properties, and search_class searches for classes. The
search_entity action is implemented as a full-text search, while the search_property and
search_class actions are implemented with a hybrid vector search.
• inspect: get more details on knowledge graph entities, either as an excerpt of a given entry
with get_knowledgegraph_entry or as usage examples with get_property_examples. The
get_knowledgegraph_entry action is implemented with a SPARQL query for outgoing
edges; get_property_examples is implemented with a SPARQL query that retrieves 5 usage
examples of the given property.
• execute: use execute_sparql to test a SPARQL query; the action runs the given
SPARQL query on the KG and returns the result.</p>
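        <p>The two inspect actions are described only in terms of the SPARQL they issue. A minimal sketch of such queries, assuming plain SELECT queries over the default graph: the helper names entry_query and property_examples_query are hypothetical, the LIMIT of 100 outgoing edges is an assumption, and only the 5 property examples come from the description above.</p>
        <preformat><![CDATA[
```python
def _iri(uri: str) -> str:
    """Wrap an absolute URI in SPARQL IRI brackets, rejecting characters
    that are not allowed inside an IRIREF (a basic injection guard)."""
    if any(c in uri for c in '<>"{}|^`\\ '):
        raise ValueError(f"not a valid IRI: {uri!r}")
    return f"<{uri}>"


def entry_query(entity_uri: str, limit: int = 100) -> str:
    """Hypothetical get_knowledgegraph_entry query: outgoing edges of an entity."""
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  {_iri(entity_uri)} ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"  # assumed cap on the excerpt size
    )


def property_examples_query(property_uri: str, n: int = 5) -> str:
    """Hypothetical get_property_examples query: n subject/object pairs."""
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s {_iri(property_uri)} ?o .\n"
        "}\n"
        f"LIMIT {n}"
    )
```
]]></preformat>
        <p>For example, property_examples_query("http://dbpedia.org/ontology/populationTotal") yields a query returning five subject/object pairs that use that property.</p>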
        <p>
          These actions are also described in the controller prompt, which we adopted from SPINACH [
          <xref ref-type="bibr" rid="ref15">27</xref>
          ], as can be
seen in Listing 1. The “controller” step follows each search/inspect/execute step and selects the next
action. This process can be repeated for up to 15 iterations; this limit can be changed in the code.
3KGQA Leaderboard: https://github.com/kgqa/leaderboard
4web page: https://www.langchain.com/langgraph
5Repository: https://github.com/Scaseco/RdfProcessingToolkit/tree/master
6Repository: https://github.com/ad-freiburg/qlever
7web page: https://qdrant.tech/qdrant-vector-database/
        </p>
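        <p>The controller cycle can be pictured as a loop that alternates an LLM decision with a tool call, records every action and observation in the history, and ends on stop or when the iteration budget runs out. The sketch below is a framework-free approximation, not the LangGraph implementation: the policy callable stands in for the LLM, the tools are caller-supplied stubs, and returning the last executed query when the budget is exhausted is an assumption loosely modeled on the fallback reported in the evaluation.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_ITERATIONS = 15  # iteration budget stated in the text; configurable


@dataclass
class Step:
    action: str
    argument: str
    observation: str


@dataclass
class State:
    question: str
    history: List[Step] = field(default_factory=list)
    last_query: Optional[str] = None


# A policy stands in for the LLM: given the state, it returns
# the next (action_name, argument) pair.
Policy = Callable[[State], Tuple[str, str]]
Tool = Callable[[str], str]


def run_controller(question: str, policy: Policy, tools: Dict[str, Tool],
                   max_iterations: int = MAX_ITERATIONS) -> Optional[str]:
    """Alternate policy decisions and tool calls until stop or budget end."""
    state = State(question)
    for _ in range(max_iterations):
        action, argument = policy(state)
        if action == "stop":
            # stop() marks the most recent executed query as the final answer
            return state.last_query
        observation = tools[action](argument)
        if action == "execute_sparql":
            state.last_query = argument
        state.history.append(Step(action, argument, observation))
    # Budget exhausted: fall back to the last executed query (may be None)
    return state.last_query
```
]]></preformat>
        <p>A scripted policy that replays a fixed list of actions is enough to exercise the loop without calling an LLM.</p>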
        <p>Listing 1: Action description from the controller prompt</p>
        <preformat>- get_knowledgegraph_entry(entity URI): Retrieves all outgoing edges (linked entities, properties) of a specified knowledge graph entity using its full URI. Example: `http://dbpedia.org/resource/Sufism`.
- search_entity_by_label(string): Searches the {} knowledge graph for individual real-world entities like companies, people, locations, or things (e.g. “Apple”, “Sufism”, “Barack Obama”).
- search_property_by_label(string): Searches the {} knowledge graph for *properties* (also called predicates or relationships) like “price”, “hasLocation”, or “producedBy”. Use this when you’re trying to find the right property to complete a triple.
- search_class_by_label(string): Searches for *classes* (types/categories) in the knowledge graph like “Company”, “Service”, “Book”, or “Organization”.
- get_property_examples(property URI): Retrieves a few usage examples of the specified property, given as a full URI.
- execute_sparql(SPARQL): Executes a SPARQL query on the {} knowledge graph. Use this when you’re confident in your query structure and ready to test a hypothesis.
- stop(): Marks the most recent SPARQL query as your final answer and ends the process.</preformat>
        <p>The search/inspect/execute actions all take a single argument: a string to search for,
an entity to look up, or a SPARQL query to execute on the SPARQL endpoint. In the Python code, these
actions are translated into function calls with additional parameters, such as the name of the KG to use or
the language detected in the initialization.</p>
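        <p>One way to turn the single-argument actions into calls with extra parameters is to bind the KG name and the detected language once per request with functools.partial. In this sketch the tool bodies are stubs, the dataset key "dbpedia" is hypothetical, and the stop-word heuristic in detect_language is only an illustrative stand-in for whatever language detection the initialization step actually uses.</p>
        <preformat><![CDATA[
```python
from functools import partial
from typing import Callable, Dict

# Tiny stop-word lists: an illustrative heuristic, not a real language detector.
_ES = {"el", "la", "los", "las", "de", "del", "es", "en", "y",
       "qué", "quién", "cuál", "cuántos"}
_EN = {"the", "of", "what", "who", "which", "how", "many", "is", "in", "and"}


def detect_language(question: str) -> str:
    """Return "es" if more Spanish than English stop words occur, else "en"."""
    words = {w.strip("¿?¡!.,").lower() for w in question.split()}
    return "es" if len(words & _ES) > len(words & _EN) else "en"


def search_entity(label: str, *, kg: str, lang: str) -> str:
    # Stub: a real tool would query the full-text index for this KG and language
    return f"search_entity({label!r}) on {kg}/{lang}"


def execute_sparql(query: str, *, kg: str, lang: str) -> str:
    # Stub: a real tool would send the query to the KG's SPARQL endpoint
    return f"execute_sparql on {kg}/{lang}"


def bind_tools(kg: str, question: str) -> Dict[str, Callable[[str], str]]:
    """Fix kg and lang once per request; each tool keeps one positional argument."""
    lang = detect_language(question)
    base = {"search_entity": search_entity, "execute_sparql": execute_sparql}
    return {name: partial(fn, kg=kg, lang=lang) for name, fn in base.items()}
```
]]></preformat>
        <p>Binding the context up front keeps the interface seen by the LLM identical to Listing 1, while the Python side still knows which endpoint and language to use.</p>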
        <p>The LLM must support tool calling in order to interact with LangGraph and the tools. After
internal evaluations of various LLMs, including Llama, DeepSeek, and several GPT variants, we decided
to use GPT-4.1 mini as the LLM with the best cost-to-result ratio for our case.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A Dual-Strategy Approach for Semantic Grounding</title>
        <p>A critical challenge for any natural-language-to-SPARQL system is semantic grounding: the process of
accurately mapping ambiguous or varied natural language phrases in a user’s question to the precise,
canonical IRIs of the knowledge graph. This process involves two
distinct sub-tasks: resolving conceptual terms (e.g., “population”, “who made this”) to schema elements
(instances of type owl:Class, owl:ObjectProperty, owl:DatatypeProperty), and resolving proper
nouns (e.g., “Berlin”, “Google”) to specific entity instances. Our agent employs a tailored dual-strategy
approach, recognizing that these two tasks have different requirements for precision and semantic
nuance.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Strategy 1: Hybrid Vector Search for Schema Entities</title>
        <p>For grounding conceptual terms against the KG schema, where semantic ambiguity is high, we use the hybrid search
natively supported by the Qdrant vector store8. This approach combines dense (semantic) and sparse (lexical) search, so that
matches reflect both the meaning and the exact wording of the user’s terms.</p>
        <p>1. Schema Indexing: We first create a searchable index in Qdrant of all schema entities (instances
of type owl:Class, owl:ObjectProperty, owl:DatatypeProperty). For each entity, we
concatenate its rdfs:label and rdfs:comment into a single text document. This document is then
encoded into two distinct vector representations:
• Dense Vector: A transformer model (BGE Large English9) generates a dense embedding that
captures the semantic meaning of the entity. This allows for matching based on conceptual
similarity.
• Sparse Vector: A BM25-based model10 generates a sparse, high-dimensional vector that excels
at keyword-centric matching, ensuring lexical precision for technical or domain-specific
terms.</p>
        <p>Both vectors are stored in a Qdrant collection, indexed by the entity’s IRI. In addition, we
store the domain and range of properties (if available) in metadata fields of the collection.
2. Agentic Workflow for Schema Grounding: When the agent needs to resolve a term like
“population”, it generates a dense vector and performs a hybrid query using Reciprocal Rank Fusion (RRF).
This robustly identifies the correct schema element (e.g., dbo:populationTotal) by balancing
semantic relevance with keyword accuracy.</p>
        <p>8Qdrant hybrid search query docs: https://qdrant.tech/documentation/concepts/hybrid-queries/</p>
        <p>Figure 1: Action graph of ARUQULA. Components shown: LLM; LangGraph with the Controller and the nodes stop, reporter, search_entity, search_property, search_class, get_knowledgegraph_entry, get_property_examples, and execute_sparql; backends Lucene, Vector DB, and SPARQL endpoint.</p>
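        <p>Reciprocal Rank Fusion combines ranked candidate lists by summing, for each IRI, 1/(k + rank) over every list in which it appears. The sketch below fuses a dense and a sparse result list in plain Python to make the scoring in step 2 explicit. Qdrant performs this fusion server-side; k = 60 is the value proposed in the original RRF paper and is only an assumption here (the constant Qdrant applies may differ), and the example IRIs are illustrative.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf(rankings: Sequence[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion.

    Ranks are 1-based; an ID absent from a list gets no score from that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, iri in enumerate(ranking, start=1):
            scores[iri] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by IRI for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```
]]></preformat>
        <p>An IRI ranked first by both retrievers scores 2/61, which beats any IRI that appears in only one list; this is why RRF rewards candidates on which semantic and lexical evidence agree, without having to calibrate the two raw score scales against each other.</p>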
      </sec>
      <sec id="sec-3-4">
        <title>Strategy 2: Full-Text Search for Named Entity Resolution</title>
        <p>For grounding named entities, the challenge is less about conceptual ambiguity and more about efficiently matching strings against a
massive set of instances. For this task, a pragmatic and highly performant full-text search is more
appropriate.</p>
        <p>1. Instance Indexing: We use a standard index built with Lucene, a mature and powerful full-text search library.
All entity instances from the knowledge graph are indexed. The indexed document for each
entity includes its name (rdfs:label) and, if available, its description (rdfs:comment).
2. Agentic Workflow for Named Entity Resolution: When the LLM called in the controller
node extracts a proper noun like “Berlin”, the agent does not use the vector store. Instead, it queries the
Lucene index. This provides a fast, scalable, and lexically precise method for resolving “Berlin” to
its canonical IRI, dbr:Berlin.</p>
        <p>9Qdrant BGE large-en model: https://huggingface.co/Qdrant/bge-large-en-v1.5-onnx
10Qdrant BM25 sparse embedding model: https://huggingface.co/Qdrant/bm25</p>
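        <p>To make the lexical side concrete, the sketch below builds a toy inverted index over labels and comments and ranks entities by exact label match first and by the number of matched query tokens second. It is an illustration written for this explanation, not Lucene: Lucene adds configurable analyzers, BM25 relevance scoring, and on-disk segments, and the ranking rule here is an assumption.</p>
        <preformat><![CDATA[
```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a crude stand-in for a Lucene analyzer)."""
    return re.findall(r"\w+", text.lower())


class LabelIndex:
    """Toy in-memory inverted index: token -> set of entity IRIs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.labels: Dict[str, str] = {}

    def add(self, iri: str, label: str, comment: str = "") -> None:
        """Index an entity by its rdfs:label and optional rdfs:comment."""
        self.labels[iri] = label.lower()
        for token in tokenize(label + " " + comment):
            self.postings[token].add(iri)

    def search(self, query: str, limit: int = 5) -> List[str]:
        """Rank IRIs: exact label match first, then matched-token count, then IRI."""
        hits: Dict[str, int] = defaultdict(int)
        for token in set(tokenize(query)):
            for iri in self.postings.get(token, ()):
                hits[iri] += 1
        exact = query.lower()
        ranked = sorted(hits, key=lambda iri: (self.labels[iri] != exact, -hits[iri], iri))
        return ranked[:limit]
```
]]></preformat>
        <p>With entries for dbr:Berlin and dbr:Berlin_Wall, the query “Berlin” returns dbr:Berlin first because its label matches exactly.</p>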
        <p>By employing this dual strategy, our agent effectively uses the right tool for the right job. It leverages
the semantic depth of hybrid vector search for the nuanced task of schema mapping, while relying
on the speed and lexical precision of Lucene for the high-volume task of named entity resolution.
This division is illustrated by the question “What is the population of Berlin?”: the agent uses hybrid
search to ground “population” to dbo:populationTotal and Lucene to ground “Berlin” to dbr:Berlin,
obtaining both components needed to construct the final query.</p>
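        <p>Once both grounding steps have returned IRIs, composing a query for the Berlin example is mechanical. A minimal sketch, assuming the standard DBpedia prefixes and a single triple pattern; build_lookup_query is a hypothetical helper, whereas in ARUQULA the LLM itself writes the query and verifies it with execute_sparql.</p>
        <preformat><![CDATA[
```python
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def build_lookup_query(subject: str, predicate: str, var: str = "value") -> str:
    """Compose a single-triple SELECT query from two grounded CURIEs."""
    used = {curie.split(":", 1)[0] for curie in (subject, predicate)}
    # Declare only the prefixes actually used, in a stable order
    header = "".join(f"PREFIX {p}: <{PREFIXES[p]}>\n" for p in sorted(used))
    return f"{header}SELECT ?{var} WHERE {{ {subject} {predicate} ?{var} . }}"
```
]]></preformat>
        <p>Calling build_lookup_query("dbr:Berlin", "dbo:populationTotal") yields the two PREFIX lines followed by SELECT ?value WHERE { dbr:Berlin dbo:populationTotal ?value . }.</p>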
      </sec>
      <sec id="sec-3-5">
        <title>3.3. KG Setup &amp; Querying</title>
        <p>
          To realize the inspection and execution functions, we loaded the CK25 and DB25 KGs into
separate, dedicated SPARQL endpoints (without making use of named graphs). To make ARUQULA’s
RDF data management not only conform to the FAIR principles11 [
          <xref ref-type="bibr" rid="ref16">28</xref>
          ] but also reproducible using a
conventional build system, we employed our Maven-based data publishing workflow described in [
          <xref ref-type="bibr" rid="ref17">29</xref>
          ].
The key aspects that make the Maven ecosystem interact well with Semantic Web technology are
summarized as follows:
• Maven artifacts are addressed using Maven coordinates, which can be represented as URNs following
the pattern &lt;urn:mvn:{artifact}&gt;. A Maven coordinate is resolved using a well-defined conversion
that derives a relative URL and prefixes it with a repository’s base URL. Multiple repositories can
be configured for artifact lookups. Changing a repository URL does not necessitate any change
in the coordinates of the artifacts.
• Maven builds are extensible using plugins, and many already exist for common tasks, such
as signing artifacts and validating checksums. We created plugins that build and package RDF
databases from a set of RDF data dependencies.
• Maven has native support for deployment to local folders12. In combination with a generic file
server, this can be leveraged as a lightweight approach to (self-)publishing artifacts that does not
require maintaining a dedicated repository system.
        </p>
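        <p>The coordinate-to-URL conversion mentioned in the first bullet follows Maven's standard repository layout: the dots of the groupId become path segments, followed by artifactId and version directories and a file named artifactId-version[-classifier].extension. The sketch below applies that layout. The colon order it parses (groupId:artifactId[:extension[:classifier]]:version, as used by Maven's artifact resolver), the handling of the urn:mvn: prefix, the ttl.bz2 extension, and the repository base URL in the usage note are assumptions for illustration; the groupId, artifactId, and version are those of the re-published DBpedia artifact.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coordinate:
    group_id: str
    artifact_id: str
    version: str
    extension: str = "jar"  # Maven's default packaging
    classifier: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "Coordinate":
        """Parse g:a[:ext[:classifier]]:v, with an optional (assumed) urn:mvn: prefix."""
        if text.startswith("urn:mvn:"):
            text = text[len("urn:mvn:"):]
        parts = text.split(":")
        if len(parts) == 3:
            g, a, v = parts
            return cls(g, a, v)
        if len(parts) == 4:
            g, a, ext, v = parts
            return cls(g, a, v, ext)
        if len(parts) == 5:
            g, a, ext, clf, v = parts
            return cls(g, a, v, ext, clf)
        raise ValueError(f"unsupported coordinate: {text!r}")

    def relative_path(self) -> str:
        """Path of the artifact file in the standard Maven repository layout."""
        suffix = f"-{self.classifier}" if self.classifier else ""
        file_name = f"{self.artifact_id}-{self.version}{suffix}.{self.extension}"
        return "/".join([*self.group_id.split("."), self.artifact_id,
                         self.version, file_name])

    def url(self, repository_base: str) -> str:
        """Prefix the relative path with a repository's base URL."""
        return repository_base.rstrip("/") + "/" + self.relative_path()
```
]]></preformat>
        <p>For instance, Coordinate.parse("urn:mvn:org.aksw.data.text2sparql.2025:dbpedia:ttl.bz2:1.0.0").url("https://repo.example.org/maven2") resolves under the placeholder repository to .../org/aksw/data/text2sparql/2025/dbpedia/1.0.0/dbpedia-1.0.0.ttl.bz2. Swapping the base URL changes only the prefix, which is why relocating a repository leaves the coordinates untouched.</p>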
        <p>
          For Text2SPARQL, the original datasets were available for download from the challenge website13. We re-published
the individual files as Maven artifacts14, which makes it possible to use them as dependencies in a
Maven-aware15 build process. Initially, we used the tdb2-maven-plugin16 to load the data into an
instance of Apache Jena’s TDB2 database. While TDB2’s performance is sufficient for small datasets,
it becomes a bottleneck for larger ones. For this reason, we created the qlever-maven-plugin17,
which builds QLever18 databases (via Docker). QLever is presently among the fastest RDF
engines and supports the processing of billions of triples on conventional hardware. Our Maven plugin
abstraction makes it easy to build a database for either system at the “push of a button”: regardless of
whether one uses TDB2 or QLever, an invocation of mvn package creates a pre-built database archive,
whereas mvn deploy deploys the archive as well as the pom.xml file to the configured repository. For
example, the QLever database is available as yet another Maven artifact19. Note that the deployed
pom.xml file serves as a historical snapshot that can be downloaded and used to rebuild the database
locally at any point in time (provided that the Maven and Docker ecosystems still exist). By default, our
plugins place all triples of an RDF dependency into the graph named with the artifact’s URN. Consequently,
the provenance of data in an RDF store can be tracked using the graph name. However, in practice,
it is often desirable to merge multiple datasets into a single graph. For this purpose, our plugins also
support the specification of the target graph. The pre-created databases are used in two ways: the
self-contained ARUQULA Docker image is built by downloading the QLever database archives directly
from our Maven repository, and the endpoints are also hosted under our public Apache Jena Fuseki setup20.
The integration of QLever into Fuseki is part of our JenaX project21. Our Maven plugins have been
published to Maven Central and are thus globally available for use in builds. Other relevant approaches
that combine Maven and the Semantic Web are OntoMaven [
          <xref ref-type="bibr" rid="ref18">30</xref>
          ], which provides plugins for ontology
development and management, and the DataBus project [
          <xref ref-type="bibr" rid="ref19">31</xref>
          ], which features a large data catalogue that
reuses Maven concepts.
11FAIR = findable, accessible, interoperable and reusable
12mvn deploy -D'altDeploymentRepository=snapshot-repo::default::file:./repo-folder'
13https://text2sparql.aksw.org/challenge/#corporate-knowledge-small-knowledge-graph
14https://maven.aksw.org/archiva/#artifact-details-download-content~internal/org.aksw.data.text2sparql.2025/dbpedia/1.0.0
15There are several build tools that can interact with Maven repositories, such as Gradle, SBT, or bld.
16https://github.com/Scaseco/tdb2-maven-plugin
17https://github.com/Scaseco/qlever-maven-plugin
18https://github.com/ad-freiburg/qlever
19https://maven.aksw.org/archiva/#artifact-details-download-content~internal/qlever.org.aksw.data.text2sparql.2025/dbpedia/1.0.0
        </p>
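        <p>The default graph assignment (one named graph per artifact URN) and the merge option can be illustrated by rewriting N-Triples lines as N-Quads. This is only an illustration of the naming scheme: the function name and the URN string are hypothetical, the code assumes one statement per line without trailing comments, and the actual plugins load the data into TDB2 or QLever rather than rewriting text.</p>
        <preformat><![CDATA[
```python
from typing import Iterable, Iterator, Optional


def to_nquads(ntriples: Iterable[str], artifact_urn: str,
              target_graph: Optional[str] = None) -> Iterator[str]:
    """Attach a graph IRI to every N-Triples statement.

    The graph defaults to the artifact URN (provenance by graph name);
    passing target_graph merges several artifacts into one graph.
    """
    graph = target_graph or artifact_urn
    for line in ntriples:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if not line.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        # Drop the final "." and re-append it after the graph term
        yield f"{line[:-1].rstrip()} <{graph}> ."
```
]]></preformat>
        <p>Querying GRAPH ?g over such quads then reveals which artifact each triple came from, whereas a shared target_graph trades that provenance for a single merged graph.</p>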
      </sec>
      <sec id="sec-3-6">
        <title>3.4. Challenge API</title>
        <p>As stated in subsection 1.1, all participants had to provide a uniform API to expose their
service to the judges of the challenge. This API was specified via an OpenAPI-conformant
JSON document found at https://text2sparql.aksw.org/openapi.json and defines a single route,
/text2sparql, which accepts two parameters via HTTP GET: the name of the dataset that
should be queried (in this case limited to https://text2sparql.aksw.org/2025/DBpedia/ and
https://text2sparql.aksw.org/2025/corporate/) and one URL-encoded question.
Assuming a base URL of http://example.com, a valid request might look like this:</p>
        <preformat>http://example.com/text2sparql \
?dataset=https%3A%2F%2Ftext2sparql.aksw.org%2F2025%2FDBpedia%2F \
&amp;question=Who%20designed%20the%20Python%20programming%20language%3F</preformat>
        <p>This API was implemented in Python with the Flask framework22.</p>
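        <p>The percent-encoding in the example request can be reproduced with the Python standard library. In the sketch below, urlencode with quote_via=quote encodes spaces as %20 and, because its safe argument defaults to the empty string, also encodes "/" and ":". The function names are hypothetical, and parse_request with its dataset check is an illustrative stand-in for the Flask handler, not the challenge's reference code.</p>
        <preformat><![CDATA[
```python
from typing import Tuple
from urllib.parse import parse_qs, quote, urlencode, urlsplit

ALLOWED_DATASETS = {
    "https://text2sparql.aksw.org/2025/DBpedia/",
    "https://text2sparql.aksw.org/2025/corporate/",
}


def build_request(base_url: str, dataset: str, question: str) -> str:
    """Build a GET request URL for the /text2sparql route."""
    query = urlencode({"dataset": dataset, "question": question}, quote_via=quote)
    return f"{base_url.rstrip('/')}/text2sparql?{query}"


def parse_request(url: str) -> Tuple[str, str]:
    """Decode the two GET parameters and validate the dataset."""
    params = parse_qs(urlsplit(url).query)
    dataset = params["dataset"][0]
    question = params["question"][0]
    if dataset not in ALLOWED_DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return dataset, question
```
]]></preformat>
        <p>build_request("http://example.com", "https://text2sparql.aksw.org/2025/DBpedia/", "Who designed the Python programming language?") reproduces the request shown above as a single line.</p>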
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>For each request sent to our agent, we logged every prompt, response, action taken, and result,
as well as the total runtime. Since the challenge consisted of 250 such requests, this yielded a
substantial amount of log data.</p>
      <p>The first statistics we calculated from the agent logs were the average runtime
needed to generate a SPARQL query and the average number of steps that the agent needed
to reach this goal. The results can be seen in table 4. Given that the timeout defined by the challenge
organizers was ten minutes, our agent clearly stayed well below this limit throughout.</p>
      <p>The extracted data also allow some statistical analysis. Here we limit ourselves to the questions
from DBpedia-EN and Corporate.</p>
      <p>Running a t-test on the number of agent steps between these two datasets indeed reveals a significant
difference. The calculation yields a <italic>t</italic>-value of −2.734 and <italic>p</italic> = 0.00744, which supports the hypothesis
that the agent behaves notably differently on these two datasets.</p>
      <p>However, repeating this calculation for the runtimes gives <italic>t</italic> = −1.770 and <italic>p</italic> = 0.079, which hints
that DBpedia-EN was processed more quickly than Corporate, but the evidence is not
strong enough to reach that conclusion.</p>
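        <p>The text does not state which t-test variant was used. As a hedged illustration, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom with the standard library; the sample values in the test are made up and are not the challenge data. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic together with a two-sided p-value.</p>
        <preformat><![CDATA[
```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's two-sample t-test.

    A negative t means the first sample has the smaller mean.
    """
    # Squared standard errors: sample variance divided by sample size
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```
]]></preformat>
        <p>Welch's variant avoids assuming equal variances, which is a safer default when the two datasets differ in size (here, 100 English DBpedia questions versus 50 Corporate questions).</p>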
      <p>Since the bandwidth and transfer speed between the agent and the SPARQL endpoint, as well as the
processing power of that endpoint, contribute to the runtime, we must assume
that the duration data contain considerable noise. Given the significant result for the number of agent steps,
however, this direction seems promising to explore further.</p>
      <p>The second avenue of analysis we want to explore is the behavior of the agent itself. Data scientists
follow a certain methodology when searching for new findings in large amounts of data, which consists
of drawing samples from the data, analyzing their properties, trying to automate this process, and finally
rolling it out to a larger portion or even all of the data at hand. The functions provided
to the LLM were designed with this process in mind, giving the LLM the opportunity to approach
answer extraction the same way a human would. So far, however, no data conclusively
shows that an LLM actually follows this process.</p>
      <p>20https://copper.coypu.org/#/dataset/text2sparql-2025-dbpedia-qlever/info
21https://github.com/Scaseco/jenax
22web page: https://flask.palletsprojects.com/en/stable/</p>
      <p>Observing the LLM agent while it answered roughly 250 questions and recording how often each
action was taken at each step, we arrive at figure 4. In the first step, the agent mostly searched
the graph (search_entity and search_class). Step two is mainly concerned with
exploring the surroundings of entities using search_property and get_knowledgegraph_entry.
From step three onward, we observed an increasing rate of execute_sparql, showing that the agent
is ready to send SPARQL queries to the endpoint and expects actionable results.</p>
      <p>Figure 4 shows the cumulative number of actions taken up to a certain step index. Again, we can
see that during the first steps there is a steep increase in the total amount of subject-related exploration,
which starts to plateau after step seven, while the total number of executed SPARQL queries
grows at an almost steady pace once the agent reaches step three. The most notable takeaway here is
that the agent deliberately called the stop action about 200 times, meaning that for roughly
50 questions it failed to generate a conclusive SPARQL query, and a query had to be scraped from the
conversation trace as a last resort.</p>
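      <p>The cumulative view and the fallback count can be sketched as follows, again assuming the hypothetical per-question trace format (ordered lists of action names). The helper names are illustrative. With roughly 250 questions and about 200 deliberate stop actions, count_fallbacks would report about 250 - 200 = 50 questions whose final query had to be recovered from the conversation trace.</p>
      <preformat>
```python
from __future__ import annotations

from itertools import accumulate


def cumulative_action_counts(traces: list[list[str]], action: str, max_step: int) -> list[int]:
    """Total occurrences of `action` at steps 1..k, for every k = 1..max_step."""
    per_step = [0] * max_step
    for trace in traces:
        for index, name in enumerate(trace[:max_step]):
            if name == action:
                per_step[index] += 1
    return list(accumulate(per_step))


def count_fallbacks(traces: list[list[str]]) -> int:
    """Number of questions whose trace never contains a deliberate `stop`,
    i.e. the final query had to be recovered from the conversation trace."""
    return sum(1 for trace in traces if "stop" not in trace)


traces = [
    ["search_entity", "execute_sparql", "stop"],
    ["search_class", "execute_sparql", "execute_sparql", "stop"],
    ["search_entity", "search_property", "execute_sparql"],  # no stop: fallback
]
print(cumulative_action_counts(traces, "execute_sparql", max_step=4))  # [0, 2, 4, 4]
print(count_fallbacks(traces))  # 1
```
      </preformat>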
      <p>Lastly, we investigate which action precedes which. To this end, we start by looking
at each stop action and count the actions that came before it. We find that all 203 stop actions were
preceded by execute_sparql, which shows that the agent always verifies its final SPARQL query
before issuing a stop. Looking in the other direction, the pie chart in section 4 shows the ratio of each
action following an execute_sparql step. In one out of four cases, the agent deems the query results
satisfactory and continues with a stop action. However, 60% of all SPARQL executions are immediately
followed by another execution. The reasons for this are diverse, mainly
empty result sets, syntax errors, and even timeouts from the SPARQL endpoint. A deeper analysis of
why two or more execute_sparql actions are chosen consecutively is one goal of our future research.
Finally, in one out of seven cases the agent refrains from executing SPARQL again and
instead returns to searching and inspecting entities and properties of the knowledge graph directly.
The ratio between the agent doing this of its own accord and the controller forcing it to
backtrack because it executed the same step twice is yet another interesting analysis that we leave
for future work.</p>
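      <p>The predecessor and successor analysis amounts to counting pairs of consecutive actions. The sketch below assumes the same hypothetical trace format as above; successors, predecessors, and shares are illustrative names. Note that an execute_sparql that ends a trace (the fallback case) has no successor and is therefore not counted.</p>
      <preformat>
```python
from __future__ import annotations

from collections import Counter


def successors(traces: list[list[str]], action: str) -> Counter:
    """Count which action immediately follows each occurrence of `action`."""
    following = Counter()
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            if current == action:
                following[nxt] += 1
    return following


def predecessors(traces: list[list[str]], action: str) -> Counter:
    """Count which action immediately precedes each occurrence of `action`."""
    preceding = Counter()
    for trace in traces:
        for prev, current in zip(trace, trace[1:]):
            if current == action:
                preceding[prev] += 1
    return preceding


def shares(counter: Counter) -> dict[str, float]:
    """Turn raw counts into fractions of the total (the pie-chart view)."""
    total = sum(counter.values())
    return {name: count / total for name, count in counter.items()} if total else {}


traces = [
    ["search_entity", "execute_sparql", "execute_sparql", "stop"],
    ["search_class", "execute_sparql", "search_property", "execute_sparql", "stop"],
]
print(predecessors(traces, "stop"))
print(shares(successors(traces, "execute_sparql")))
```
      </preformat>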
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>When comparing our system’s query results to the expected results for the DB25 dataset (using the
evaluation artifact produced by the challenge organizers), we discovered several pitfalls that need to be
considered when interpreting the challenge results and the reported systems’ performances23.</p>
      <p>When the performance of a Text2SPARQL system is evaluated automatically by comparing SPARQL result sets, a
query and its results can be valid yet fail to match the results of the gold query. In that case, a low
score is assigned even though a good and meaningful query (w.r.t. the question) has been proposed. On
the other hand, a query can achieve a good score w.r.t. its results while in fact having been constructed in
a way that emulates the answer (e.g. using VALUES clauses and BIND statements in combination with
internal knowledge of the LLM) without making proper use of the KG. We observed instances of
misjudged responses of our system in both categories.</p>
      <p>23https://web.archive.org/web/20250627131218/https://text2sparql.aksw.org/assets/talks/8-edgard-marx-result-presentation.pdf</p>
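      <p>Queries of the second category could be pre-screened before a manual review. The following is a minimal sketch of a hypothetical heuristic, not a method used in the paper or by the challenge organizers: it flags VALUES blocks and BIND statements that assign a constant to a variable. Since both constructs also have legitimate uses, a match is only a hint that the answer may have been hard-coded.</p>
      <preformat>
```python
from __future__ import annotations

import re

# Hypothetical heuristics: a match only flags a query for manual review; VALUES
# and BIND also have legitimate uses.
VALUES_RE = re.compile(r"\bVALUES\b", re.IGNORECASE)
# BIND of a constant (string literal, number, or prefixed name) to a variable.
# Full IRIs in angle brackets are deliberately not covered by this sketch.
BIND_CONST_RE = re.compile(
    r'\bBIND\s*\(\s*("[^"]*"|[+-]?\d+(?:\.\d+)?|[A-Za-z_][\w-]*:[\w-]*)\s+AS\s+\?\w+\s*\)',
    re.IGNORECASE,
)


def emulation_hints(query: str) -> list[str]:
    """Return reasons why a query might hard-code its answer instead of querying the KG."""
    hints = []
    if VALUES_RE.search(query):
        hints.append("VALUES block")
    if BIND_CONST_RE.search(query):
        hints.append("BIND of a constant")
    return hints


print(emulation_hints('SELECT ?c WHERE { VALUES ?c { dbr:China dbr:India } }'))
```
      </preformat>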
      <p>Regarding the first category, the specification of the setup and of the expected SPARQL query
outcome is of significant importance. Natural language questions typically lack precise instructions
w.r.t. the nature of the SPARQL result projection. For example, for the question “What are the 10 most
populated countries?” it is not clear whether the results should contain entity labels, entity
IRIs, or both, and whether the entities should be reported together with their population numbers or with a
further column that specifies the rank. While we consider rank and population numbers not required
by the question, we would not consider providing them invalid either, given the setup instructions of the
challenge. Nevertheless, queries returning labels or population numbers receive a significantly lower
score in the challenge. Moreover, since DB25 consists of DBpedia-EN and DBpedia-ES,
it is unclear which kind of entities should be queried and returned. Our system was configured in a
language-aware way, such that Spanish questions would issue a search for DBpedia-ES entities, and
we also saw that the LLM gave preference to Spanish entities during querying by applying Spanish
language filters on the entity labels accordingly (thus effectively selecting/querying for DBpedia-ES
entities). However, neither the gold queries nor the challenge setup and specification seem to address
this appropriately, leading to scores of 0 for viable queries that return DBpedia-ES entities instead of
DBpedia-EN entities.</p>
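      <p>Why projection choices matter so much can be illustrated with a deliberately naive result-set comparison. The sketch assumes an F1 score over result rows compared as exact tuples; this is an assumption for illustration and not necessarily the metric used in the challenge. Adding a label column or returning DBpedia-ES IRIs (illustrative IRIs only) makes every row differ from the gold rows, so the score drops to 0 even though the same countries are returned.</p>
      <preformat>
```python
from __future__ import annotations


def row_set_f1(predicted: list[tuple], gold: list[tuple]) -> float:
    """F1 over result rows compared as exact tuples (a deliberately naive metric)."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # both empty: treat as a match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


DBR = "http://dbpedia.org/resource/"
ES_DBR = "http://es.dbpedia.org/resource/"
gold = [(DBR + "India",), (DBR + "China",)]
# Same countries, but with an additional label column.
with_labels = [(DBR + "India", "India"), (DBR + "China", "China")]
# Same countries, but as DBpedia-ES IRIs.
spanish = [(ES_DBR + "India",), (ES_DBR + "China",)]

print(row_set_f1(gold, gold), row_set_f1(with_labels, gold), row_set_f1(spanish, gold))
```
      </preformat>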
      <p>Another problem occurs when a question can be semantically grounded in the KG in several ways.
In several instances, the DB25 KG provides multiple options that can be queried to
answer a particular question. For the population question mentioned above, there are at least
four different property candidates: dbo:population, dbo:populationTotal, dbp:population, and
dbpes:población. Unfortunately, all of them return different results. While our approach never used
dbo:population (which is only used for 7 entities), the Spanish property actually returns factually
more accurate results than the DBpedia Ontology property (which seems to be used in the gold
query). When the ontology property is used in combination with the dbo:Country class, the majority
of the top-10 results represent associations of countries such as the Commonwealth, due to an issue in
DBpedia-EN. Our approach often accounted for such data quality or schema fuzziness issues (which are
inherent to the DBpedia extraction and mapping process) and refined the query in such cases so that
it would, e.g., filter out instances of dbo:Organisation. This is a significant capability that can even
surpass the gold query in correctness, yet it leads to a lower score. We detected several
instances where ARUQULA accounted for such issues with systematic and meaningful constraints on the graph;
however, we also found instances where it tried to enforce the “truthfulness” of queries by filtering
out result rows with patterns such as regex filters on instance labels or IRIs, or by faking the outcome with
BIND statements and VALUES clauses. While we consider the enforcement of specific query outcomes an
undesirable form of overcompensation in the context of Text2SPARQL, the frequency of this behaviour
suggests that both the DB25 dataset and the evaluation setup require refinement to enable precise and
reliable automatic evaluation. Unfortunately, the question used as a running example is only one
of several problematic question–result set pairs (some of them even contained false empty
result sets). Nevertheless, we still consider the manual analysis of the behaviour and responses of our system w.r.t.
DB25 an interesting and insightful study.
</p>
      <p>[Figure: actions taken by the agent (search, inspect, execute, stop) plotted against the number of agent steps (1 to 15)]</p>
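      <p>The query refinement discussed in this section, excluding dbo:Country instances that are also typed as dbo:Organisation, can be sketched as a small query builder. This is a hypothetical illustration, not a query produced by ARUQULA: the population property is a parameter because the KG offers several candidates, and the builder assumes that the dbo: and dbp: prefixes are declared elsewhere (e.g. prepended by the caller); dbpes:población would need its own prefix declaration as well.</p>
      <preformat>
```python
from __future__ import annotations


def top_countries_query(population_property: str,
                        exclude_organisations: bool = True,
                        limit: int = 10) -> str:
    """Build a top-k 'most populated countries' query for a chosen property.

    Assumes the prefixes used (dbo:, dbp:, ...) are declared by the caller.
    """
    # Drop dbo:Country instances that are also typed as organisations
    # (e.g. associations of countries).
    exclusion = ("  FILTER NOT EXISTS { ?country a dbo:Organisation }\n"
                 if exclude_organisations else "")
    return (
        "SELECT ?country WHERE {\n"
        "  ?country a dbo:Country ;\n"
        f"           {population_property} ?population .\n"
        f"{exclusion}"
        "}\n"
        "ORDER BY DESC(?population)\n"
        f"LIMIT {limit}"
    )


print(top_countries_query("dbo:populationTotal"))
```
      </preformat>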
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>
        In this paper, we introduced ARUQULA, a Text2SPARQL agent built upon SPINACH and enhanced
with KG exploration utilities to support RDF-based knowledge graphs beyond Wikidata. Our successful
participation in the TEXT2SPARQL challenge demonstrated the potential of the approach, and our
evaluation provided insights into agent behaviour, performance characteristics, and limitations. Looking
ahead, there are several avenues to extend and refine our work. We consider improved latency
and responsiveness of the agent a prerequisite for its use in interactive settings, e.g. a
chatbot system. An in-depth comparison of different LLMs, in conjunction with a careful
selection of test data, could help to better understand the trade-offs between LLM performance and cost, and
how the utilities can be enhanced to further improve their helpfulness in order to reduce the number of
steps/actions performed. Furthermore, it would be interesting to evaluate whether the approach can be
transferred to knowledge graphs from other domains and to integrate it into evaluation frameworks like
LLM-KG-Bench [12] or GERBIL [
        <xref ref-type="bibr" rid="ref20">32</xref>
        ]. To this end, it would be beneficial to improve the automation of setup
and deployment so that the approach can be easily configured and deployed for various knowledge graphs. An
evaluation of the ontology grounding that compares different embedding models, embedding strategies,
and search approaches, also in comparison to POTS [
        <xref ref-type="bibr" rid="ref21">33</xref>
        ] could be beneficial, especially for large knowledge
graphs that are not covered by the training data of LLMs.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by grants from the German Federal Ministry of Education and
Research (BMBF) to the projects ScaleTrust (16DTM312D) and KupferDigital2 (13XP5230L), from the
German Federal Ministry for Economic Affairs and Climate Action (BMWK) to the KISS project
(01MK22001A), and from the German Federal Ministry of Transport (BMV) to the MobyDex
project (19F2266A).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and spelling checks,
paraphrasing, and rewording to improve the writing style. After using these tools/services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>
in: M. Alam, M. Cochez (Eds.), Proceedings of the Workshop on Deep Learning for Knowledge
Graphs (DL4KG 2023) co-located with the 21th International Semantic Web Conference (ISWC
2023), Athens, November 6-10, 2023, volume 3559 of CEUR Workshop Proceedings, CEUR-WS.org,
2023. URL: https://ceur-ws.org/Vol-3559/paper-3.pdf.
[12] L.-P. Meyer, J. Frey, D. Heim, F. Brei, C. Stadler, K. Junghanns, M. Martin, LLM-KG-Bench 3.0: A
compass for semantic technology capabilities in the ocean of LLMs, 2025.
[13] J. Frey, L.-P. Meyer, F. Brei, S. Gruender, M. Martin, Assessing the evolution of LLM capabilities for
knowledge graph engineering in 2023, in: The Semantic Web: ESWC 2024 Satellite Events, 2025.
doi:10.1007/978-3-031-78952-6_5.
[14] L.-P. Meyer, J. Frey, F. Brei, N. Arndt, Assessing SPARQL capabilities of large language models,
in: E. Vakaj, S. Iranmanesh, R. Stamartina, N. Mihindukulasooriya, S. Tiwari, F. Ortiz-Rodríguez,
R. Mcgranaghan (Eds.), Proceedings of the 3rd International Workshop on Natural Language
Processing for Knowledge Graph Creation co-located with 20th International Conference on
Semantic Systems (SEMANTiCS 2024), volume 3874 of CEUR Workshop Proceedings, 2024. URL:
https://ceur-ws.org/Vol-3874/paper3.pdf.
[15] D. Heim, L.-P. Meyer, M. Schröder, J. Frey, A. Dengel, How do scaling laws apply to knowledge
graph engineering tasks? the impact of model size on large language model performance, in: ESWC
2025 Workshops and Tutorials Joint Proceedings, volume 3977 of CEUR Workshop Proceedings,
2025. URL: https://ceur-ws.org/Vol-3977/elmke-3.pdf.
[16] L.-P. Meyer, J. Frey, F. Brei, D. Heim, S. Gründer-Fahrer, S. Todorovikj, C. S. Stadler, M. Schröder,
N. Arndt, M. Martin, Evaluating large language models for RDF knowledge graph related tasks - the
LLM-KG-Bench-Framework 3, Semantic Web (2025). doi:10.5281/zenodo.16779481, submitted
for review 05/2025.
[17] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, ReAct: Synergizing reasoning
and acting in language models, in: International Conference on Learning Representations (ICLR),
2023.
[18] J. Sun, C. Xu, L. Tang, S. Wang, C. Lin, Y. Gong, L. M. Ni, H.-Y. Shum, J. Guo, Think-on-graph: Deep
and responsible reasoning of large language model on knowledge graph (2023). doi:10.48550/
ARXIV.2307.07697. arXiv:2307.07697.
[19] A. Perevalov, X. Yan, L. Kovriguina, L. Jiang, A. Both, R. Usbeck, Knowledge graph question
answering leaderboard: A community resource to prevent a replication crisis, in: N. Calzolari,
F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani,
H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and
Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022, European Language Resources
Association, 2022, pp. 2998–3007. URL: https://aclanthology.org/2022.lrec-1.321.
[20] P. Trivedi, G. Maheshwari, M. Dubey, J. Lehmann, LC-QuAD: A corpus for complex question
answering over knowledge graphs, in: C. d’Amato, M. Fernández, V. A. M. Tamma, F. Lécué,
P. Cudré-Mauroux, J. F. Sequeda, C. Lange, J. Heflin (Eds.), The Semantic Web - ISWC 2017 - 16th
International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part
II, volume 10588 of Lecture Notes in Computer Science, Springer, 2017, pp. 210–218.
doi:10.1007/978-3-319-68204-4_22.
[21] M. Dubey, D. Banerjee, A. Abdelkawi, J. Lehmann, LC-QuAD 2.0: A large dataset for complex
question answering over Wikidata and DBpedia, in: Proceedings of the 18th International Semantic
Web Conference (ISWC), Springer, 2019. doi:10.1007/978-3-030-30796-7_5.
[22] R. Usbeck, X. Yan, A. Perevalov, L. Jiang, J. Schulz, A. Kraft, C. Möller, J. Huang, J. Reineke, A.-C.
Ngonga Ngomo, M. Saleem, A. Both, QALD-10 – the 10th challenge on question answering over
linked data: Shifting from DBpedia to Wikidata as a KG for KGQA, Semantic Web (2023) 1–15.
doi:10.3233/sw-233471.
[23] S. Auer, D. A. C. Barone, C. Bartz, E. G. Cortes, M. Y. Jaradeh, O. Karras, M. Koubarakis,
D. Mouromtsev, D. Pliukhin, D. Radyush, I. Shilin, M. Stocker, E. Tsalapati, The SciQA
scientific question answering benchmark for scholarly knowledge, Scientific Reports 13 (2023).
doi:10.1038/s41598-023-33607-z.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Semnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Triedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <article-title>SPINACH: SPARQL-based information navigation for challenging real-world questions</article-title>
          , in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2024</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>15977</fpage>
          -
          <lpage>16001</lpage>
          . doi:10.18653/v1/2024.findings-emnlp.938.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Höfner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walter</surname>
          </string-name>
          , E. Marx,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <article-title>Survey on challenges of question answering in the semantic web</article-title>
          ,
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>895</fpage>
          -
          <lpage>920</lpage>
          . doi:10.3233/sw-160247.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>A survey on complex question answering over knowledge base: Recent advances and challenges</article-title>
          (
          <year>2020</year>
          ). doi:10.48550/ARXIV.2007.13069.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Perevalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Both</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <article-title>Multilingual question answering systems for knowledge graphs - a survey</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>2089</fpage>
          -
          <lpage>2124</lpage>
          . doi:10.3233/sw-243633.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vahdati</surname>
          </string-name>
          ,
          <article-title>Language models as controlled natural language semantic parsers for knowledge graph question answering</article-title>
          , in: K. Gal, A. Nowé, G. J. Nalepa, R. Fairstein, R. Radulescu (Eds.),
          <source>ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland - Including 12th Conference on Prestigious Applications of Intelligent Systems (PAIS 2023)</source>
          , volume
          <volume>372</volume>
          of Frontiers in Artificial Intelligence and Applications, IOS Press,
          <year>2023</year>
          , pp.
          <fpage>1348</fpage>
          -
          <lpage>1356</lpage>
          . doi:10.3233/FAIA230411.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          , C. Stadler,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Radtke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meissner</surname>
          </string-name>
          , G. Dziwis,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bulert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>LLM-assisted knowledge graph engineering: Experiments with ChatGPT</article-title>
          , in: C. Zinke-Wehlmann, J. Friedrich (Eds.),
          <source>First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow (AITomorrow) 2023</source>
          , Informatik aktuell,
          <year>2024</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>115</lpage>
          . doi:10.1007/978-3-658-43705-3_8.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radyush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <article-title>SPARQLGEN: One-shot prompt-based approach for SPARQL query generation</article-title>
          , in: International Conference on Semantic Systems, volume
          <volume>3526</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3526/paper-08.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Tafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <article-title>Leveraging LLMs in scholarly knowledge graph question answering</article-title>
          , in: QALD/SemREC@ ISWC,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          , L.-P. Meyer,
          <article-title>Leveraging small language models for Text2SPARQL tasks to improve the resilience of AI assistance</article-title>
          , in: J. Holze, S. Tramp, M. Martin, S. Auer, R. Usbeck, N. Krdzavac (Eds.),
          <source>Proceedings of the Third International Workshop on Linked Data-driven Resilience Research 2024 (D2R2'24), colocated with ESWC 2024</source>
          , volume
          <volume>3707</volume>
          of CEUR-WS,
          <year>2024</year>
          . URL: https://ceur-ws.org/Vol-3707/D2R224_paper_5.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          , J. Frey,
          <string-name>
            <given-names>K.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bulert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gründer-Fahrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Developing a scalable benchmark for assessing large language models in knowledge graph engineering</article-title>
          , in: N. Keshan, S. Neumaier, A. L. Gentile, S. Vahdati (Eds.),
          <source>Proceedings of the Posters and Demo Track of the 19th International Conference on Semantic Systems (SEMANTICS 2023)</source>
          , volume
          <volume>3526</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3526/paper-04.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          , L.-P. Meyer,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arndt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bulert</surname>
          </string-name>
          ,
          <article-title>Benchmarking the abilities of large language models for RDF knowledge graph creation and comprehension: How well do LLMs speak Turtle?</article-title>,
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          , L.-P. Meyer, M. Martin,
          <article-title>Queryfy: from knowledge graphs to questions using open large language models: Enabling finetuning by question generation on given knowledge</article-title>
          ,
          <source>it - Information Technology</source>
          (
          <year>2025</year>
          ). doi:10.1515/itit-2024-0079.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Conrads</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huthmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga-Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Demmler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Unger</surname>
          </string-name>
          ,
          <article-title>Benchmarking question answering systems</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          . doi:10.3233/sw-180312.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Buchhold</surname>
          </string-name>
          ,
          <article-title>Qlever: A query engine for eficient sparql+text search</article-title>
          ,
          in:
          <source>Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</source>
          , CIKM '17,
          <publisher-name>ACM</publisher-name>
          ,
          <year>2017</year>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>656</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3132847.3132921</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>An empirical study of pre-trained language models in simple knowledge graph question answering</article-title>
          ,
          <source>World Wide Web</source>
          <volume>26</volume>
          (
          <year>2023</year>
          )
          <fpage>2855</fpage>
          -
          <lpage>2886</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>da Silva Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bühmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bin</surname>
          </string-name>
          ,
          <article-title>FAIR data publishing with Apache Maven</article-title>
          , in:
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rebholz-Schuhmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schimmler</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Fourth Workshop on Metadata and Research (objects) Management for Linked Open Science - DaMaLOS 2024 colocated with Extended Semantic Web Conference (ESWC)</source>
          ,
          <publisher-name>PUBLISSO</publisher-name>
          , Hersonissos, Greece,
          <year>2024</year>
          . doi:
          <pub-id pub-id-type="doi">10.4126/FRL01-006474023</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schäfermeier</surname>
          </string-name>
          ,
          <article-title>OntoMaven - Maven-based ontology development and management of distributed ontology repositories</article-title>
          ,
          <source>Synergies Between Knowledge Engineering and Software Engineering</source>
          (
          <year>2018</year>
          )
          <fpage>251</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Götz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <article-title>Managing and compiling data dependencies for semantic applications using databus client</article-title>
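          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Garoufallou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ovalle-Perandones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachidis</surname>
          </string-name>
          (Eds.),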
          <source>Metadata and Semantic Research - 15th International Conference, MTSR 2021, Virtual Event, November 29 - December 3, 2021, Revised Selected Papers</source>
          , volume
          <volume>1537</volume>
          of Communications in Computer and Information Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <article-title>GERBIL - benchmarking named entity recognition and linking consistently</article-title>
          ,
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>605</fpage>
          -
          <lpage>625</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3233/SW-170286</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ferraz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hofer</surname>
          </string-name>
          ,
          <article-title>POTS - a polyparadigmatic ontology term search with fine-grained context steering using hyper-level vector spaces</article-title>
          ,
          in:
          <source>Companion Proceedings of the ACM on Web Conference 2025</source>
          , WWW '25,
          <publisher-name>Association for Computing Machinery</publisher-name>
          ,
          <publisher-loc>New York, NY, USA</publisher-loc>
          ,
          <year>2025</year>
          , pp.
          <fpage>2831</fpage>
          -
          <lpage>2834</lpage>
          . URL: https://doi.org/10.1145/3701716.3715194. doi:
          <pub-id pub-id-type="doi">10.1145/3701716.3715194</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>