<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QAWiki: A Knowledge Graph Question Answering &amp; SPARQL Query Generation Dataset for Wikidata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Moya Loustaunau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DCC, Universidad de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IMFD</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In this resource paper, we present QAWiki: a multilingual, handcrafted, knowledge graph question answering and SPARQL query generation dataset for Wikidata. QAWiki consists of 526 questions over Wikidata, of which 518 are associated with SPARQL queries, and 8 are disambiguation questions. Each question is presented in both English and Spanish, and includes paraphrased versions of the question, as well as annotations of entity and relation mentions for Wikidata. The dataset is hosted in a Wikibase instance, which allows for collaborative editing and refinement of the dataset by the community, among other features. Further metadata include tagging questions with issues (e.g., incompleteness, imprecision, ambiguity) as well as defining relations between questions (e.g., a question whose answers are contained in another question, etc.). QAWiki can thus be used as an evaluation (and training) dataset for knowledge graph question answering &amp; query generation systems. We provide illustrative experiments over QAWiki using GPT-4o to generate SPARQL queries over Wikidata, comparing performance with and without passing entity mentions to the model via the prompt.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) are powerful abstractions of structured knowledge, representing
entities and the relationships between them as nodes and edges in a graph. They have found
wide application across domains such as search engines, recommendation systems, digital
assistants, and scientific knowledge management [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Prominent examples include Wikidata [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and DBpedia [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These graphs are typically queried using formal languages such as SPARQL,
the W3C-standard query language for RDF data. While SPARQL is expressive and precise, its
usage requires technical expertise that is out of reach for most end users.
      </p>
      <p>To bridge this accessibility gap, the field of Knowledge Graph Question Answering (KGQA)
has emerged, aiming to enable users to access rich, factual, and structured knowledge via natural
language questions, without requiring expertise in formal query languages [4]. Closely related
benchmark tasks are Question Answering over Linked Data (QALD) [5], and SPARQL query
generation, which addresses KGQA by translating natural language questions into SPARQL
queries [6, 7]. Neural models for these tasks have achieved good results in handling relatively
simple questions, such as those answerable by a single triple (or a single fact/claim).</p>
      <p>
        Recent research has highlighted the growing importance of addressing complex natural
language questions that require reasoning over multiple relations, constraints, and inference
conditions [8]. However, most existing QA datasets primarily consist of simple, single-hop
questions [9], whereas real-world user queries often involve multi-hop reasoning, aggregations,
and temporal or comparative logic [10, 11]; indeed, the true power of knowledge graphs lies in
being able to address more complex questions. In addition, the performance of KGQA systems
heavily depends on accurate entity and relation linking: a step that remains challenging due to
ambiguity, discontinuity, and linguistic variability, particularly in multilingual or low-resource
settings [12]. Recent evaluations have also shown that existing KGQA datasets often lack the
diversity and complexity required to support compositional or zero-shot generalization, limiting
their usefulness for training systems that can handle real-world queries [
        <xref ref-type="bibr" rid="ref4">13</xref>
        ].
      </p>
      <p>
        To support the development of robust KGQA systems, datasets should ideally provide: (i)
gold-standard SPARQL queries for execution supervision [
        <xref ref-type="bibr" rid="ref5">14</xref>
        ]; (ii) explicit annotations of entity
and property mentions for robust linking [
        <xref ref-type="bibr" rid="ref6">15</xref>
        ]; (iii) diverse and expressive question forms,
including paraphrases and natural ambiguity [
        <xref ref-type="bibr" rid="ref7">16</xref>
        ]; and (iv) multilingual coverage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this resource paper, we introduce QAWiki: a novel, multilingual, and collaboratively
curated dataset for benchmarking or training question answering and query generation systems
over Wikidata. QAWiki complements existing datasets by offering a rich, diverse, and extensible
collection of questions, annotated with entity and property mentions, SPARQL queries, quality
tags, and semantic relations between questions. QAWiki includes complex multi-hop questions
that require diverse SPARQL operators to represent, and explicitly tags questions with ambiguity
or incompleteness, enabling fine-grained error analysis. It further includes annotations of entity
mentions and relation mentions to help evaluate systems in more detail, and to facilitate creating
larger synthetic datasets (e.g., by replacing the entities). Its multilingual design—with parallel
questions and annotations in English and Spanish—supports comparison across languages. A
key difference to other, larger datasets is that QAWiki has been almost entirely handcrafted
(periodically, over a span of more than two years) to ensure diverse, high-quality questions and
queries with human-like phrasing. Moreover, by being hosted in a publicly accessible Wikibase
instance, QAWiki supports community-driven editing, extensibility, and provenance tracking.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>KGQA has seen significant advancements over the past decade. Nevertheless, the datasets used
for research exhibit limitations, particularly regarding complexity, diversity and quality.</p>
      <p>
        Early KGQA datasets include the following:
• WebQuestionsSP [
        <xref ref-type="bibr" rid="ref8">17</xref>
        ] and SimpleQuestions [
        <xref ref-type="bibr" rid="ref9">18</xref>
        ] provide large sets of relatively simple
question–answer pairs over Freebase.
• LC-QuAD 1.0 [
        <xref ref-type="bibr" rid="ref10">19</xref>
        ] provides DBpedia-based multi-hop questions generated using
templates, and thus exhibiting limited linguistic variety.
      </p>
      <p>• LC-QuAD 2.0 [7] incorporates crowdsourced natural language paraphrases over Wikidata.</p>
      <p>
        Initially, most KGQA datasets were English-only, limiting accessibility. The need for broader
linguistic coverage led to the inclusion of multiple languages:
• The QALD series (e.g., QALD-7 [
        <xref ref-type="bibr" rid="ref11">20</xref>
        ], QALD-9 [5], QALD-10 [
        <xref ref-type="bibr" rid="ref12">21</xref>
        ]) introduced manual,
crowdsourced translations across various languages (e.g., English, German, Chinese,
Russian, Spanish), though often at a limited scale.
• Other datasets, such as RuBQ [
        <xref ref-type="bibr" rid="ref13">22, 23</xref>
        ] and MCWQ [24], extend language support through
automatic translation and human validation.
      </p>
      <p>Several recent KGQA datasets have been developed natively over Wikidata:
• WikiWebQuestions [25]: An adaptation of WebQuestions to Wikidata, pairing
natural-language questions with SPARQL annotations. Despite its real-world question style, it
remains English-only and contains relatively simple questions.
• QALD Series (QALD-7, QALD-9, QALD-10): These benchmarks are manually curated and
multilingual. In particular, QALD-10 was built from scratch: English-language questions
were authored by proficient speakers and translated into multiple languages via native
translators. Final SPARQL queries were crafted manually by domain-aware experts to suit
Wikidata’s schema and handle its labeling inconsistencies. (These are the most closely
related datasets to what we provide; a comparison is provided later.)
• WikidataQA [26] is a small, handcrafted dataset with 100 questions and their
corresponding queries over Wikidata.
• SPINACH [27]: This dataset begins with real SPARQL queries harvested from Wikidata’s
“Request a Query” forum. Experts then crafted corresponding natural-language questions
that faithfully reflect each query’s intent, albeit in a more formal style than typical forum
phrasing. SPINACH exhibits high structural complexity, but is monolingual (English-only),
static, and does not include mention-level annotations.</p>
      <p>
Construction methodologies for KGQA datasets vary widely, each with trade-offs:
• Synthetic Generation: Datasets like LC-QuAD 1.0 [
        <xref ref-type="bibr" rid="ref10">19</xref>
        ], KQA Pro [28], MCWQ [24],
ComplexWebQuestions [
        <xref ref-type="bibr" rid="ref7">16</xref>
        ], and CFQ [29] are built using grammar rules or templates to
generate SPARQL queries and corresponding natural language questions. While effective
for generating large-scale data, they often suffer from a lack of realism and diversity.
• Crowdsourcing: LC-QuAD 2.0 [7] and QALD-10 [
        <xref ref-type="bibr" rid="ref12">21</xref>
        ] leverage human contributors
for paraphrasing existing questions or directly translating them, ensuring more natural
language but potentially leading to inconsistent quality or scale limitations.
• Automatic Translation: RuBQ [
        <xref ref-type="bibr" rid="ref13">22, 23</xref>
        ] and MCWQ [24] use machine translation to
cover more natural languages. MCWQ mitigates translation errors via human review.
• Expert Curation: Datasets such as SPINACH [27] and parts of QALD-10 [
        <xref ref-type="bibr" rid="ref12">21</xref>
        ] involve
expert linguists or domain specialists to ensure high quality, correctness, and executable
SPARQL queries. This method typically results in smaller, but higher-quality datasets.
      </p>
      <p>
        A critical evaluation by Jiang and Usbeck [
        <xref ref-type="bibr" rid="ref4">13</xref>
        ] of 25 KGQA datasets revealed that most
resources lack sufficient support for zero-shot or compositional generalization, often due to
template-based construction and lack of diversity. Furthermore, datasets like WebQuestionsSP
and CWQ were found to have factual correctness rates below 60%, reflecting annotation errors
and outdated knowledge graph links. A quick review of many such datasets confirms such
quality issues, where for example, in LC-QuAD 2.0, we can find many questions of a similar
form to “Where is {disciples} of {Nadia Boulanger}, which has {location of death} is {Azores} ?” [sic.]
where it seems the crowdsourced paraphrasing was not done as intended.
      </p>
      <p>
        In contrast, QAWiki is designed to be a high-quality, multilingual, and extensible resource. It
combines expert curation with community collaboration through a Wikibase instance, includes
explicit mention-level annotations, and features both simple and complex questions with
human-like phrasing in diverse domains and of diverse forms. These design choices aim to meet the
evolving needs of KGQA research and respond directly to critiques in the literature [
        <xref ref-type="bibr" rid="ref4">13</xref>
        ]. To the
best of our knowledge, all questions and queries are handcrafted by design (the vast majority
by the authors, with some community contributions). In comparison to other datasets, we
argue that QAWiki is of higher quality and diversity, though smaller than many synthetic
datasets. However, depending on the need, QAWiki can serve as input for synthetic generation,
crowdsourcing, automatic translation or paraphrasing, etc., in order to generate larger datasets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. QAWiki: Resource Description</title>
      <sec id="sec-3-1">
        <p>We now provide an overview of QAWiki.</p>
        <sec id="sec-3-1-1">
          <title>3.1. Question set</title>
          <p>QAWiki currently includes 526 hand-crafted questions, all written or paraphrased in fluent
English and Spanish (not automatically generated or translated). Of these, 8 questions are
disambiguation questions (i.e., not specific enough to have a clear intent). While the questions are
not equally distributed across categories, the set is composed of the following broad categories
based on what the question returns:
• Entity questions (“Which has the most ...?”, “Which is the latest ...?”): 77
• Entity-set questions (lists or tables of entities satisfying a condition): 284
• Numeric questions (e.g., “How many ...?”, “What is the average ...?”): 45
• Temporal questions (e.g., “In which year ...?”, “Since when ...?”): 54
• Boolean questions (“Is ... deceased?”, “Does ... exist?”): 35
• Miscellaneous attribute questions (measurements, identifiers, locations, etc.): 31</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Multilinguality and Paraphrasing</title>
          <p>All QAWiki questions are presented in both English and Spanish, and all questions also have
one or more paraphrased versions. Translations are intended to be idiomatic, and paraphrased
versions to be distinctive but still natural; for example:
• Label (EN): “How old was Daniel Day-Lewis when he won his first Academy Award?”</p>
          <p>Alias (EN): “What age was Daniel Day-Lewis when he was first awarded an Oscar?”
• Label (ES): “¿Cuántos años tenía Daniel Day-Lewis cuando ganó su primer premio Óscar?”
Alias (ES): “¿A qué edad recibió Daniel Day-Lewis su primer premio Óscar?”
[Figure 1: distribution of the number of mentions per question (x-axis: number of mentions per question; y-axis: number of questions).]
As observed here, the translation is intended to be natural, not direct: “Premio de la Academia”
in Spanish is much less idiomatic than its direct translation “Academy Award” in English, and
thus is not used in the Spanish question. Some questions are further expressed in specific
dialects where different; for example: “In which countries is Elon Musk a naturalized citizen?”
is tagged @en, while “In which countries is Elon Musk a naturalised citizen?” is also provided,
tagged @en-GB. This occurs in a handful (7) of cases. Finally, the community has provided 22
questions in Italian, and 7 in Danish. QAWiki currently contains 3,000 question forms.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.3. Entity and Property Mentions</title>
          <p>QAWiki includes fine-grained annotations of entity and relation mentions. These mentions are
substrings of the English or Spanish questions and paraphrases, and are manually linked to the
corresponding Wikidata identifiers: Q-IDs for entities and P-IDs for properties.</p>
          <p>Importantly, the annotation process is independent of the SPARQL query. That is, mentions
are identified based on the natural language surface form, regardless of whether or not they are
used directly in the query. Indeed, in many cases, there may be more than one way to formulate
a query using different mentions; for example, if a question refers to the U.S. President, a query
might use the Wikidata entity President of the United States (Q11696), or might use United States
(Q30) and from there traverse the property head of state (P35) or indeed the property office held
by head of state (P1906). Mentions are thus intended to be exhaustive: overlapping entities are
included, and some entities may be annotated with several alternatives.</p>
          <p>Property mentions are considerably more complex than entity mentions, and we thus
include a number of different types of mentions in our dataset:
• Direct: Indicates that the property is directly mentioned, e.g., “What is the population of</p>
          <p>Qatar?” contains the direct mention “population” of population (P1082) on Wikidata.
• Inverse: Indicates that the property is inversely mentioned, e.g., “Who played Eleven in</p>
          <p>Stranger Things?” contains the inverse mention “played” of performer (P175) on Wikidata.
• Existence: Indicates that the property has some value, e.g., “Which popes were married?”
contains the existence mention “married” of spouse (P26).
• Non-existence: Indicates that the property has no value, e.g., “Which living people have
an element named after them?” contains the non-existence mention “living” of place of
death (P20), date of death (P570), etc.
• Specific value : Indicates that the property is implicitly mentioned via a specific value, e.g.,
“Who is the drummer of the band Battles?” contains the specific-value mention “drummer”
of occupation (P106); we also indicate the value, in this case, drummer (Q386854); other
cases include gender, nationality, etc., but we exclude references to instance of (P31) and
subclass of (P279), which we assume to be so common as to be understood.
• Superlative: Indicates that the property has a superlative value, e.g., “What is the tallest
mountain in the world outside of Asia?” contains the superlative mention “tallest” of
elevation above sea level (P2044); we indicate if the superlative is a maximum or minimum.</p>
          <p>We further capture discontinuous mentions, where the mention is split across the sentence;
for example, “Where was tellurium discovered?” involves the discontinuous mention “Where [...]
discovered" of location of discovery (P189), whereby “Where” diferentiates the property
from others that indicate who discovered something, when something was discovered, etc.</p>
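<p>For illustration, mention annotations like those above could be represented in memory as follows. This is a hypothetical record layout for this sketch (field names and the span-list encoding of discontinuous mentions are assumptions, not QAWiki's actual schema):</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class PropertyMention:
    """Illustrative record for a QAWiki property mention (hypothetical layout)."""
    text: str          # surface form, e.g. "Where [...] discovered"
    pid: str           # Wikidata property, e.g. "P189"
    kind: str          # direct | inverse | existence | non-existence | specific value | superlative
    spans: List[Tuple[int, int]] = field(default_factory=list)  # >1 span if discontinuous
    value_qid: Optional[str] = None  # e.g. "Q386854" for specific-value mentions

question = "Where was tellurium discovered?"
m = PropertyMention(text="Where [...] discovered", pid="P189", kind="direct",
                    spans=[(0, 5), (20, 30)])
parts = [question[a:b] for (a, b) in m.spans]  # ["Where", "discovered"]
```

<p>Here the two character spans recover the discontinuous surface form from the question string.</p>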
          <p>In total, 5,475 mentions are annotated by hand, pointing to 1,258 distinct Wikidata identifiers:
1,038 distinct items (Q-IDs) and 219 distinct properties (P-IDs). All QAWiki questions currently have
mentions for English and Spanish (if applicable). Figure 1 lists the distribution of mentions per
question. We allow questions to import mentions from another similar question to save manual
effort, which results in some questions having zero mentions.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>3.4. SPARQL Queries</title>
          <p>Each non-ambiguous QAWiki question—such that the intent is clear (518 questions in total)—is
annotated with one or more SPARQL queries that retrieve its correct answers from Wikidata.
The queries are executable, enabling both supervised training of query generation and validation
of question answering systems via their results. For example, for the question “What was the last
novel published by Harper Lee?”, we include the following query (with comments and indentation
added here for readability purposes):
<preformat>SELECT DISTINCT ?sbj
WHERE {
  ?sbj wdt:P7937 wd:Q8261 .   # form of creative work: novel
  ?sbj wdt:P50 wd:Q182658 .   # author: Harper Lee
  ?sbj wdt:P577 ?obj .        # publication date
}
ORDER BY DESC(?obj) LIMIT 1</preformat></p>
          <p>These queries are handcrafted. An important criterion is to formulate the query as generally
as possible for the question type. For example, for the question “Which university in Pakistan
has the most students?”, the phrase “in Pakistan” is captured with the SPARQL property path:
?sbj wdt:P131*/wdt:P17?/wdt:P30? wd:Q843
which captures a wide variety of cases, including being transitively located in the place
(wdt:P131*), with the place being a region (wdt:P131), a country (wdt:P17), or continent
(wdt:P30), etc. Thus the query will function if “Pakistan” is replaced by “Islamabad”, “Asia”, etc.</p>
          <p>Compared to many other KGQA benchmarks, QAWiki exhibits a rich set of query operators,
as shown in Table 1. The average number of predicates per query is 2.95 (std. 1.84), higher than
in QALD-10 (1.84) or WikiWebQuestions (1.58), but below SPINACH (4.67). Indeed, SPINACH
has more complex queries overall than QAWiki, due to how it was generated: by taking existing
SPARQL queries for Wikidata and generating questions from them. However, this leads to
rather unnatural questions, such as “Which properties are used in claims related to items that are
public elections? Include a count of the number of times it was used.”, which are more akin to an
explanation or verbalization of a query than a question a person would naturally ask.</p>
        </sec>
        <sec id="sec-3-1-5">
          <title>3.5. Quality Tags</title>
          <p>QAWiki incorporates optional metadata in the form of quality tags, which identify issues that
may complicate question answering. They are intended to annotate issues in questions that users
are likely to ask. For example:
• ambiguous: The question has more than one plausible interpretation or scope; for
example, the question “What is the largest country in Africa?” does not clarify if this is by
area, or by population.
• subjective criteria: The question contains non-crisp criteria; for example, the
question “Which conferences focus on Machine Learning?” requires a subjective interpretation
of “focus on”.</p>
          <p>Tags are also used on queries to indicate potential issues relating to the results returned:
• incomplete: The query should be expected to return incomplete answers.
• no ties: The query uses LIMIT 1 on a superlative question, and thus will not return
ties (if any).
• controversial or unconfirmed data: The query returns answers based on
unconfirmed information.</p>
          <p>SPARQL queries in these cases aim to provide best-effort answers to the question.
The inclusion of such questions is a deliberate design choice considering that Wikidata will
not always be able to provide sound and complete results for all users’ questions in practice.
Hence we foresee that such tags can be used to flag to the user that the answers provided are
best-effort, and may exhibit the aforementioned issues.</p>
        </sec>
        <sec id="sec-3-1-6">
          <title>3.6. Question relations</title>
          <p>QAWiki further captures semantic relations between questions. Originally, this began as a way
to disambiguate questions, but evolved over time to capture more complex relations, including:
• disambiguates: A question may represent one way to disambiguate another more
ambiguous question; for example: “What is the largest country in Africa by area?”
disambiguates “What is the largest country in Africa?”.
• broader/narrower: The results for a narrow question may be contained in those of a
broader question; for example: “Which official languages of the European Union are not
Indo-European?” is narrower than “What are the official languages of the European Union?”.
• contingent: One question may assert an assumption underlying another; for example,
“Did Ian Curtis commit suicide?” verifies the assumption of “When did Ian Curtis commit
suicide?” (the latter is contingent on the former).
• count of/boolean of, etc.: One question might count the answers of another question,
or ask if another question has any answers; for example: “How many cities are twinned
with Port-au-Prince?” counts “What cities are twinned with Port-au-Prince?”.</p>
          <p>We believe that these relations could be useful for evaluating the consistency of answers
generated by a system (e.g., to see if the results of a broader question are indeed contained in
the narrower question, or to see if the count question actually returns the number of results
in the base question), to train systems to detect and resolve ambiguity interactively with the
user (e.g., to suggest questions that disambiguate an ambiguous one), and to also avoid issues
relating to false premises in questions.</p>
        </sec>
        <sec id="sec-3-1-7">
          <title>3.7. Implementation, Availability &amp; Quality Control</title>
          <p>QAWiki is hosted on a public Wikibase instance available at https://qawiki.org/, enabling users
to collaboratively extend and refine the dataset, and thus for the dataset to evolve over time.</p>
          <p>Wikibase offers various desirable features for such a dataset, including multilingual support,
a flexible schema, identifier schemes with autocompletion for editing, etc. It is also accompanied
by a SPARQL endpoint for querying and validating annotations, which proved to be very useful
for checking and resolving errors. In preparing this version of QAWiki, we used 14 quality-control
SPARQL queries1 to find, for example, mentions not contained in a question, Wikidata
item links that do not start with ‘Q’, Wikidata property links that do not start with ‘P’, etc.
We also employed LLMs to verify and suggest corrections for question phrasing, which were
manually reviewed and applied. We foresee that similar processes can be employed in future to
provide high-quality snapshots of QAWiki incorporating community contributions.</p>
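<p>Two of these checks can equally be phrased outside SPARQL; the following is a small Python sketch (the mention record layout is hypothetical, and contiguous mentions are assumed for the substring check):</p>

```python
import re

def quality_issues(question, mentions):
    """Flag mentions that are not substrings of their question, and item/
    property identifiers that do not follow Wikidata's Q-/P- scheme.
    Mention records ({"mention": ..., "kind": ..., "id": ...}) are an
    illustrative layout, not QAWiki's actual export format."""
    issues = []
    for m in mentions:
        if m["mention"] not in question:
            issues.append(("mention-not-in-question", m["mention"]))
        pattern = r"Q\d+" if m["kind"] == "item" else r"P\d+"
        if not re.fullmatch(pattern, m["id"]):
            issues.append(("bad-identifier", m["id"]))
    return issues
```

<p>Running such checks over every annotated question gives a checklist of records needing manual repair, mirroring how the SPARQL endpoint was used for validation.</p>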
          <p>The dataset is already integrated into systems like Templet [30], which enables template-based
question answering over Wikidata powered by QAWiki (and its explicit entity mentions).</p>
          <p>The entire dataset is licensed under CC0, ensuring free and unrestricted use. We provide a
snapshot (v1) of QAWiki on Zenodo corresponding to this paper [31].</p>
        </sec>
        <sec id="sec-3-1-8">
          <title>3.8. Usage</title>
          <p>QAWiki is primarily intended to support the evaluation of knowledge graph question answering
&amp; SPARQL query generation approaches. The queries provided can be used to generate answer
sets from Wikidata for evaluating question answering. QAWiki may also be useful for training
or fine-tuning models, potentially in combination with methodologies that use QAWiki as a
seed dataset from which to synthetically generate more instances (e.g., by replacing the entity
mentions provided with parameters to create question–query templates, or by using machine
translation to test further natural languages, or by using LLMs to paraphrase questions, etc.).
It may also support few-shot learning or prompting.</p>
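<p>As one concrete sketch of the templating idea, the following snippet replaces annotated entity mentions with placeholders in both the question and the query; the mention record layout ({"mention": ..., "qid": ...}) is illustrative, not QAWiki's actual export format:</p>

```python
def make_template(question, query, entity_mentions):
    """Derive a question-query template from a question, its SPARQL query,
    and its entity mentions (hypothetical record layout)."""
    for i, m in enumerate(entity_mentions):
        slot = "{e%d}" % i
        question = question.replace(m["mention"], slot)
        query = query.replace("wd:" + m["qid"], slot)
    return question, query

question = "What was the last novel published by Harper Lee?"
query = ("SELECT DISTINCT ?sbj WHERE { ?sbj wdt:P7937 wd:Q8261 . "
         "?sbj wdt:P50 wd:Q182658 . ?sbj wdt:P577 ?obj . } "
         "ORDER BY DESC(?obj) LIMIT 1")
tq, tsparql = make_template(question, query,
                            [{"mention": "Harper Lee", "qid": "Q182658"}])
# tq is now "What was the last novel published by {e0}?"
```

<p>Instantiating the template with another author's Q-ID then yields a new synthetic question–query pair.</p>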
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>To illustrate the utility of the QAWiki resource, we evaluate the ability of a large language model
(GPT-4o) to generate accurate SPARQL queries from natural language questions in the QAWiki
and SPINACH datasets. We further evaluate how performance improves when relevant entity
mentions from QAWiki are passed to the model.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Measures</title>
        <p>To quantify the quality of the SPARQL queries generated by a large language model
(GPT-4o), we might first consider comparing the predicted query against the gold standard query.
However, there is no canonical way to write a query, and the problem of deciding if two SPARQL
queries are equivalent—i.e., if they give the same results over any dataset—is undecidable [32].
In fact, even if we had an oracle for SPARQL query equivalence, it would still not suffice, as
Wikidata contains redundancy, meaning there might be several ways to achieve valid answers
via non-equivalent queries (per the U.S. President example presented in Section 3.3).</p>
        <p>We thus rather follow the same approach proposed for SPINACH [27]: comparing the table
of results for the predicted query with that of the gold standard query. This is perhaps more
challenging than it first appears: result rows may appear in any order, queries may project a
different number of variables in a different order, etc. Furthermore, in QAWiki, unless otherwise
required by the question, queries return Wikidata identifiers without additional information
(which is trivial to retrieve in a later step). If the predicted query returns additional information
via projected variables about the entities (e.g., their labels), this should not affect the measure.</p>
        <sec id="sec-4-1-1">
          <title>1. See https://qawiki.org/wiki/QAWiki:Curation/Quality_Control_Queries</title>
          <p>These issues are addressed as follows. Each
query yields a table of results, where each row is treated as a set of values (representing entity
IDs, literals, or labels). To compare two such tables—one for the predicted query and one for the
gold standard query—we compute a matrix of recall values between all row pairs. For a given
pair of rows (g, p), where g is a gold row and p is a predicted row, we define recall as [27]:
Recall(g, p) = |g ∩ p| / |g|.</p>
          <p>Thus, extra variables with auxiliary information in predicted rows do not affect the measure.</p>
          <p>We then compute a bipartite matching between gold and predicted rows that maximizes the
total recall using the Hungarian algorithm. The overall precision and recall are aggregated over
all matched pairs, and the final F1 score is computed as [27]:</p>
          <p>F1 = (2 · TP) / (2 · TP + FP + FN),
where TP is the sum of per-row recalls, FP is the number of unmatched predicted rows, and
FN accounts for unmatched gold rows and partial mismatches. We also report the Exact Match
(EM) metric, defined as 1 if the F1 score is exactly 1.0, and 0 otherwise.</p>
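<p>The row matching and F1 computation just described can be sketched as follows. This is a minimal brute-force stand-in for the Hungarian algorithm, adequate only for small result tables; rows are plain Python sets, and FN is read as the gold mass not recalled, which is one plausible reading of the definition above:</p>

```python
from itertools import permutations

def row_recall(gold_row, pred_row):
    """Recall of a predicted row against a gold row (rows are sets)."""
    return len(gold_row & pred_row) / len(gold_row) if gold_row else 0.0

def table_f1(gold_rows, pred_rows):
    """Match rows to maximize total recall, then F1 = 2*TP / (2*TP + FP + FN)."""
    n_gold, n_pred = len(gold_rows), len(pred_rows)
    n = max(n_gold, n_pred)
    # Pad the shorter table with empty rows so every assignment is a
    # permutation; brute force stands in for the Hungarian algorithm here.
    gold = list(gold_rows) + [set()] * (n - n_gold)
    pred = list(pred_rows) + [set()] * (n - n_pred)
    tp = max(sum(row_recall(gold[i], pred[p[i]]) for i in range(n))
             for p in permutations(range(n))) if n else 0.0
    fp = max(0, n_pred - n_gold)  # unmatched predicted rows
    fn = n_gold - tp              # unmatched gold rows + partial mismatches
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return f1, int(f1 == 1.0)     # (F1, Exact Match)
```

<p>For example, gold rows {Q1}, {Q2} against predicted rows {Q2}, {Q1} yield F1 = 1 (row order does not matter), while a spurious extra predicted row lowers precision and thus F1.</p>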
          <p>A limitation of this approach is that we consider rows as sets, not tuples, which leaves open
the possibility that we consider a predicted row as “correct” if it contains the correct values but
in incorrect columns. We believe such a situation to be rare in practice (the vast majority of
QAWiki queries project one column, for example). As mentioned above, this formulation
allows us to compare result sets even when they differ in column projections, column order
or row order, or where outputs may contain additional attributes. The evaluation is based on
the execution outputs of the queries, rather than their surface form, and is thus agnostic to
syntactic or structural differences in the queries themselves.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Setup</title>
        <p>We evaluate the quality of SPARQL queries produced by GPT-4o on two datasets:
• SPINACH: A collection of expert-annotated complex queries.</p>
        <p>• QAWiki: The handcrafted set of queries presented herein.</p>
        <p>We evaluate GPT-4o under two configurations. In the Base configuration, the model is
provided only with the natural language question. In the +Linked Entities configuration (only
applicable to QAWiki), the model is also given a list of entity mentions and their corresponding
Wikidata IRIs to help formulate the query. The predicted and gold-standard queries are then
evaluated over the Wikidata Query Service. To minimize potential differences stemming from
changes on Wikidata, we evaluate all queries together in the same time frame.</p>
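        <p>As an illustration of this execution step, the following sketch retrieves the result rows of a query from the public Wikidata Query Service endpoint as a list of variable-to-value dicts, which can then be compared row-wise as described above. It is not the authors' actual harness, and the User-Agent string is an arbitrary placeholder.</p>

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

def bindings_to_rows(response: dict) -> list:
    """Flatten a SPARQL JSON results document into rows of {variable: value}."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in response["results"]["bindings"]
    ]

def run_query(sparql: str, timeout: float = 60.0) -> list:
    """Execute a SPARQL query over the Wikidata Query Service."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})
    # WDQS requires a descriptive User-Agent; this value is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "qawiki-eval-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return bindings_to_rows(json.load(resp))
```

        <p>Executing both the gold and the predicted query through such a helper, within the same time frame, yields the two result-row lists over which F1 and EM are computed.</p>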
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Limitations</title>
        <p>A number of limitations of this evaluation have already been discussed, namely that (1) our
measure may count as correct a row with the correct values in incorrect columns; and (2)
evaluating queries on the live Wikidata instance may lead to results changing between query
executions. We add to this two other important limitations: (3) basing the evaluation only on
results does not detect cases where an incorrect query happens to return the correct results, which
can be particularly problematic in the case of ASK queries; and (4) both datasets are available on the
Web, and thus may have formed part of the training data for the model (though GPT-4o only
has knowledge up to October 2023). Addressing these limitations is left for future work.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>QAWiki is an evolving, handcrafted dataset for knowledge graph question answering &amp; SPARQL
query generation over Wikidata. It currently contains 526 questions, of which 518 questions
have associated SPARQL queries and 8 are disambiguation questions. The dataset contains rich
supporting metadata for these questions and queries, including questions in both Spanish and
English, 3,000 paraphrased question forms in multiple languages, and 5,443 annotated mentions
of entities and relations. The dataset is hosted on a Wikibase instance at https://qawiki.org/,
with a snapshot also available on Zenodo [31].</p>
      <p>Our preliminary evaluation shows that QAWiki presents a challenging dataset, especially
when no additional context is provided. We further demonstrate that providing linked entity
information significantly improves performance, highlighting the importance of entity
disambiguation in query generation tasks. In future work, it would be of interest to evaluate
more models, and also to explore the trade-offs (in terms of precision, runtime, etc.) between
using a given LLM for direct question answering vs. generating a query to derive answers.</p>
      <p>We welcome contributions from the community—by adding questions, defining queries,
refining the dataset, adding more natural languages, etc.—and hope that QAWiki can become a
collaborative project wherein the community develops a high-quality, consensus-driven dataset
for knowledge graph question answering &amp; SPARQL query generation. This in turn will benefit
research on these topics by enabling the training of better models, and more robust evaluation
of such models. Eventually, we hope that this will result in better natural language interfaces
that unlock the true power of Wikidata for a much broader class of users.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded in part by ANID – Millennium Science Initiative Program – Code
ICN17_002. We thank Daniel Diomedi and all who provided questions/queries for QAWiki. We
also thank the anonymous reviewers whose feedback helped to improve the paper.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>Generative AI was used in three aspects of the research: (1) the DeepSeek API was used to
classify questions, with modifications manually applied to address reviewer feedback; (2) GPT-4o
was used to identify candidate corrections for the phrasing of the natural language questions in
the QAWiki dataset, which were manually reviewed and implemented (if applicable); (3) GPT-4o
was evaluated as part of the experiments on SPARQL query generation. In the preparation of
the paper: (4) GPT Web was used to check the grammar and improve the overall writing; the
authors carefully revised the output and take responsibility for the final content.</p>
          <p>
[4] S. Ji, S. Pan, E. Cambria, P. Marttinen, P. S. Yu, A Survey on Knowledge Graphs:
Representation, Acquisition, and Applications, IEEE Trans. Neural Networks Learn. Syst. 33 (2022)
494–514. doi:10.1109/TNNLS.2021.3070843.
[5] R. Usbeck, R. H. Gusmita, A. N. Ngomo, M. Saleem, 9th Challenge on Question Answering
over Linked Data (QALD-9) (invited paper), in: Joint proceedings of the 4th Workshop
on Semantic Deep Learning (SemDeep-4) and NLIWoD4: Natural Language Interfaces for
the Web of Data (NLIWOD-4) and 9th Question Answering over Linked Data challenge
(QALD-9) co-located with 17th International Semantic Web Conference (ISWC 2018),
Monterey, California, United States of America, October 8th - 9th, 2018, volume 2241 of
CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 58–64.
[6] T. Soru, E. Marx, D. Moussallem, G. Publio, A. Valdestilhas, D. Esteves, C. B. Neto, SPARQL
as a Foreign Language, in: Proceedings of the Posters and Demos Track of the 13th
International Conference on Semantic Systems - SEMANTiCS2017 co-located with the
13th International Conference on Semantic Systems (SEMANTiCS 2017), Amsterdam,
The Netherlands, September 11-14, 2017, volume 2044 of CEUR Workshop Proceedings,
CEUR-WS.org, 2017.
[7] M. Dubey, D. Banerjee, A. Abdelkawi, J. Lehmann, LC-QuAD 2.0: A Large Dataset for
Complex Question Answering over Wikidata and DBpedia, in: The Semantic Web - ISWC
2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October
26-30, 2019, Proceedings, Part II, volume 11779 of Lecture Notes in Computer Science, Springer,
2019, pp. 69–78. doi:10.1007/978-3-030-30796-7_5.
[8] B. Fu, Y. Qiu, C. Tang, Y. Li, H. Yu, J. Sun, A Survey on Complex Question Answering
over Knowledge Base: Recent Advances and Challenges, CoRR abs/2007.13069 (2020).
arXiv:2007.13069.
[9] J. Berant, A. Chou, R. Frostig, P. Liang, Semantic Parsing on Freebase from Question-Answer
Pairs, in: Proceedings of the 2013 Conference on Empirical Methods in Natural
Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle,
Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, ACL, 2013,
pp. 1533–1544. doi:10.18653/V1/D13-1160.
[10] J. Bao, N. Duan, Z. Yan, M. Zhou, T. Zhao, Constraint-Based Question Answering with
Knowledge Graph, in: COLING 2016, 26th International Conference on Computational
Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka,
Japan, ACL, 2016, pp. 2503–2514.
[11] S. Mitra, R. R. Ramnani, S. Sengupta, Constraint-based Multi-hop Question Answering with
Knowledge Graph, in: Proceedings of the 2022 Conference of the North American Chapter
of the ACL: Human Language Technologies: Industry Track, NAACL 2022, Hybrid: Seattle,
Washington, USA + Online, July 10-15, 2022, ACL, 2022, pp. 280–288. doi:10.18653/V1/
2022.NAACL-INDUSTRY.31.
[12] L. Logeswaran, M. Chang, K. Lee, K. Toutanova, J. Devlin, H. Lee, Zero-Shot Entity Linking
by Reading Entity Descriptions, in: Proceedings of the 57th Conference of the Association
for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume
1: Long Papers, ACL, 2019, pp. 3449–3460. doi:10.18653/V1/P19-1335.
[23] I. Rybin, V. Korablinov, P. Efimov, P. Braslavski, RuBQ 2.0: An Innovated Russian Question
Answering Dataset, in: The Semantic Web - 18th International Conference, ESWC 2021,
Virtual Event, June 6-10, 2021, Proceedings, volume 12731 of Lecture Notes in Computer
Science, Springer, 2021, pp. 532–547. doi:10.1007/978-3-030-77385-4_32.
[24] R. Cui, R. Aralikatte, H. C. Lent, D. Hershcovich, Compositional Generalization in
Multilingual Semantic Parsing over Wikidata, Trans. Assoc. Comput. Linguistics 10 (2022)
937–955. doi:10.1162/TACL_A_00499.
[25] S. Xu, S. Liu, T. Culhane, E. Pertseva, M. Wu, S. J. Semnani, M. S. Lam, Fine-tuned LLMs
Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing
over Wikidata, in: Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, ACL, 2023, pp. 5778–
5791. doi:10.18653/V1/2023.EMNLP-MAIN.353.
[26] D. Diomedi, A. Hogan, Entity Linking and Filling for Question Answering over Knowledge
Graphs, in: Proceedings of the 7th Natural Language Interfaces for the Web of Data
(NLIWoD) co-located with the 19th European Semantic Web Conference (ESWC 2022),
Hersonissos, Greece, May 29th, 2022, volume 3196 of CEUR Workshop Proceedings,
CEUR-WS.org, 2022, pp. 9–24.
[27] S. Liu, S. J. Semnani, H. Triedman, J. Xu, I. D. Zhao, M. S. Lam, SPINACH: SPARQL-Based
Information Navigation for Challenging Real-World Questions, in: Findings of the ACL:
EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, ACL, 2024, pp. 15977–16001.
doi:10.18653/V1/2024.FINDINGS-EMNLP.938.
[28] S. Cao, J. Shi, L. Pan, L. Nie, Y. Xiang, L. Hou, J. Li, B. He, H. Zhang, KQA Pro: A Dataset
with Explicit Compositional Programs for Complex Question Answering over Knowledge
Base, in: Proceedings of the 60th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, ACL,
2022, pp. 6101–6119. doi:10.18653/V1/2022.ACL-LONG.422.
[29] D. Keysers, N. Schärli, N. Scales, H. Buisman, D. Furrer, S. Kashubin, N. Momchev,
D. Sinopalnikov, L. Stafiniak, T. Tihon, D. Tsarkov, X. Wang, M. van Zee, O. Bousquet,
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data,
in: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa,
Ethiopia, April 26-30, 2020, OpenReview.net, 2020.
[30] F. Suárez, A. Hogan, Templet: A Collaborative System for Knowledge Graph Question
Answering over Wikidata, in: Companion Proceedings of the ACM Web Conference
2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, ACM, 2023, pp. 152–155.
doi:10.1145/3543873.3587335.
[31] A. Moya Loustaunau, A. Hogan, QAWiki v1: Knowledge Graph Question Answering
(KGQA) / SPARQL Query Generation Dataset for Wikidata (Version v1), Zenodo, 2025.
doi:10.5281/zenodo.16787599.
[32] J. Salas, A. Hogan, Semantics and canonicalisation of SPARQL 1.1, Semantic Web 13 (2022)
829–893. URL: https://doi.org/10.3233/SW-212871. doi:10.3233/SW-212871.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Hogan, E. Blomqvist, M. Cochez, C. d'Amato, G. de Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, A. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen, J. F. Sequeda, S. Staab, A. Zimmermann, Knowledge Graphs, ACM Comput. Surv. 54 (2022) 71:1–71:37. doi:10.1145/3447772.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] D. Vrandecic, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Commun. ACM 57 (2014) 78–85. doi:10.1145/2629489.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. G. Ives, DBpedia: A Nucleus for a Web of Open Data, in: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, volume 4825 of Lecture Notes in Computer Science, Springer, 2007, pp. 722–735. doi:10.1007/978-3-540-76298-0_52.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[13] L. Jiang, R. Usbeck, Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research?, in: SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11-15, 2022, ACM, 2022, pp. 3209–3218. doi:10.1145/3477495.3531751.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[14] L. Dong, M. Lapata, Language to Logical Form with Neural Attention, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, The Association for Computer Linguistics, 2016. doi:10.18653/V1/P16-1004.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[15] M. Yu, W. Yin, K. S. Hasan, C. N. dos Santos, B. Xiang, B. Zhou, Improved Neural Relation Detection for Knowledge Base Question Answering, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, ACL, 2017, pp. 571–581. doi:10.18653/V1/P17-1053.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[16] A. Talmor, J. Berant, The Web as a Knowledge-Base for Answering Complex Questions, in: Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), ACL, 2018, pp. 641–651. doi:10.18653/V1/N18-1059.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[17] W. Yih, M. Richardson, C. Meek, M. Chang, J. Suh, The Value of Semantic Parse Labeling for Knowledge Base Question Answering, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers, The Association for Computer Linguistics, 2016. doi:10.18653/V1/P16-2033.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[18] A. Bordes, N. Usunier, S. Chopra, J. Weston, Large-scale Simple Question Answering with Memory Networks, CoRR abs/1506.02075 (2015). arXiv:1506.02075.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[19] P. Trivedi, G. Maheshwari, M. Dubey, J. Lehmann, LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs, in: The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II, volume 10588 of Lecture Notes in Computer Science, Springer, 2017, pp. 210–218. doi:10.1007/978-3-319-68204-4_22.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[20] R. Usbeck, A. N. Ngomo, B. Haarmann, A. Krithara, M. Röder, G. Napolitano, 7th Open Challenge on Question Answering over Linked Data (QALD-7), in: Semantic Web Challenges - 4th SemWebEval Challenge at ESWC 2017, Portoroz, Slovenia, May 28 - June 1, 2017, Revised Selected Papers, volume 769 of Communications in Computer and Information Science, Springer, 2017, pp. 59–69. doi:10.1007/978-3-319-69146-6_6.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[21] R. Usbeck, X. Yan, A. Perevalov, L. Jiang, J. Schulz, A. Kraft, C. Möller, J. Huang, J. Reineke, A.-C. N. Ngomo, M. Saleem, A. Both, QALD-10 - The 10th challenge on question answering over linked data: Shifting from DBpedia to Wikidata as a KG for KGQA, Semantic Web 15 (2024) 2193–2207. doi:10.3233/SW-233471.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[22] V. Korablinov, P. Braslavski, RuBQ: A Russian Dataset for Question Answering over Wikidata, in: The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part II, volume 12507 of Lecture Notes in Computer Science, Springer, 2020, pp. 97–110. doi:10.1007/978-3-030-62466-8_7.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>