<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Natural Language Querying for Humanities Knowledge Graphs: A Case Study on the GOLEM Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jose Maldonado-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arianna Graciotti</string-name>
          <email>arianna.graciotti@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentina Presutti</string-name>
          <email>valentina.presutti@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Pianzola</string-name>
          <email>f.pianzola@rug.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LILEC, University of Bologna</institution>
          ,
          <addr-line>Via Cartoleria 5, 40124 Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Oude Kijk in Het Jatstraat 26, 9712 EK Groningen</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>02</volume>
      <issue>2025</issue>
      <abstract>
        <p>Large-scale Knowledge Graphs (KGs) are increasingly relevant for humanities research, yet querying them via SPARQL poses challenges for non-technical users. While Text-to-SPARQL studies predominantly target popular KGs such as Wikidata or DBpedia, domain-specific KGs remain underexplored. This paper introduces a bilingual (English-Spanish) dataset designed for evaluating automatic text-to-SPARQL translation on GOLEM, a humanities KG containing metadata and extracted features from fanfiction stories hosted on Archive of Our Own (AO3). The dataset includes 477 manually crafted natural language questions paired with gold SPARQL queries, augmented to 1,895 questions through automatic paraphrasing. We benchmark several Large Language Models (LLMs) with prompt-based approaches, particularly examining in-context learning methods that select prompt examples based on semantic similarity, which yield the best results. Error analysis highlights entity linking as essential for improving query generation. This work provides practical insights and opens pathways for future research on natural language interfaces for querying domain-specific KGs in Digital Humanities. The dataset and output of our experiments are available at: https://github.com/GOLEM-lab/GOLEM_Text-to-SPARQL.</p>
      </abstract>
      <kwd-group>
        <kwd>Humanities Knowledge Graphs</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Text-to-SPARQL</kwd>
        <kwd>In-context learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our main contributions are as follows:</p>
      <p>• A novel bilingual (English and Spanish) dataset containing 477 NL questions, each paired with
manually curated SPARQL queries, that can be used to interrogate the GOLEM KG SPARQL
endpoint. The dataset is augmented to 1,895 NL questions through automatically generated paraphrases.</p>
      <p>We find that few-shot learning methods improve performance by selecting prompt examples based
on semantic similarity to the input question. We provide a granular error analysis to identify current
method limitations, laying the basis for future improvements.</p>
      <p>The paper is structured as follows. In Section 2, we review existing text-to-SPARQL datasets,
summarise recent approaches leveraging LLM-based prompting methods for NL-to-SPARQL
translation, and position our work relative to these studies. Section 3 introduces GOLEM KG, describes
the methodology for dataset construction, and provides a qualitative and quantitative
characterisation of the resulting bilingual dataset. We then detail our experimental settings, including
prompting strategies, LLMs tested, computational infrastructure, and evaluation metrics. In Section 4, we
present experimental results along with an in-depth error analysis. Finally, we discuss conclusions
and directions for future work. Our dataset and the output of our experiments are available at:
https://github.com/GOLEM-lab/GOLEM_Text-to-SPARQL.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The transformation of NL questions into SPARQL queries (Text-to-SPARQL) is widely studied in
Knowledge Graph Question Answering (KGQA). However, existing Text-to-SPARQL methods do not
generalise well, as most research efforts rely on training or fine-tuning models [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] primarily on
mainstream KGs (e.g., Wikidata, DBpedia), restricting their transferability to domain-specific KGs.
Existing benchmarks focus on widely used general KGs, with limited coverage of custom KGs: QALD-9-plus [6], QALD-10 [7], KQA Pro [8], and LC-QuAD 2.0 [9] are based on Wikidata; LC-QuAD [10] and QALD-9
on DBpedia; WebQuestionsSP [11], GraphQuestions [12], GrailQA [13], and CFQ [14] on Freebase; and
DBLP-QuAD [15] on scholarly data.
      </p>
      <p>
        With the emergence of LLMs, Text-to-SPARQL became intensively explored as an NL interface
for KGs [16]. Prompt-based techniques fostered the development of Text-to-SPARQL resources and
experiments beyond mainstream KGs toward domain-specific use cases in healthcare, scholarly data, and
cultural heritage. Sivasubramaniam et al. [17] experimented in the medical sector with electronic health
records (EHRs) and showed that SPARQL is underrepresented in LLM pre-training data, highlighting its
complexity and demonstrating SPARQL as the most complex language for LLM-based query generation
across different settings (zero-shot, few-shot, presence or absence of KG schema information in prompts).
Sequeda et al. [
        <xref ref-type="bibr" rid="ref6">18</xref>
        ] focused on enterprise SQL schemas in the insurance sector. They identified an
advantage in deriving KG representations from relational databases. Their zero-shot approach involves
prompting LLMs with the OWL ontology describing the KG schema alongside instructions to generate
corresponding SPARQL queries for given NL inputs. They report improved LLM performance as
compared to direct database querying. The lack of KGQA benchmark datasets applicable beyond
mainstream KGs is also addressed in [
        <xref ref-type="bibr" rid="ref7">19</xref>
        ]. The authors introduce Spider4SPARQL, a benchmark dataset
comprising 10,181 manually curated NL questions paired with 5,693 distinct SPARQL queries at varying
levels of complexity. Mountantonakis et al. [
        <xref ref-type="bibr" rid="ref8">20</xref>
        ] propose an LLM-based method to translate NL questions
into SPARQL queries targeting cultural heritage KGs aligned with the ISO standard CIDOC-CRM
ontology. They construct a benchmark comprising 100 NL questions paired with corresponding SPARQL
queries applied to two real-world KGs representing artworks in the cultural heritage domain.
      </p>
      <p>
        Having discussed available datasets, we now turn to recent prompt-based Text-to-SPARQL approaches
leveraging LLMs. Zahera et al. [
        <xref ref-type="bibr" rid="ref9">21</xref>
        ] employed Chain-of-Thought prompting with in-context learning
and semantically similar examples, enhancing query precision and syntactic flexibility. D'Abramo et
al. [
        <xref ref-type="bibr" rid="ref10">22</xref>
        ] proposed Dynamic Few-Shot Learning, combining semantic similarity with in-context learning,
achieving robust results across KGQA benchmarks. Meyer et al. [
        <xref ref-type="bibr" rid="ref11">23</xref>
        ] presented LLM-KG-Bench, a
framework evaluating the baseline SPARQL SELECT capabilities of LLMs on standard KGQA datasets.
Avila et al. [
        <xref ref-type="bibr" rid="ref12 ref13">24, 25</xref>
        ] introduced Auto-KGQA and Auto-KGQAGPT, frameworks autonomously selecting
smaller KG fragments to reduce token usage in prompts without performance loss, validated through
experiments utilizing GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo.
      </p>
      <p>
        In this work, we test LLMs’ capability on the Text-to-SPARQL task. In particular, we evaluate
the flexibility of in-context learning with semantic search, avoiding costly fine-tuning and extensive
collection of NL-SPARQL examples. With this objective, we adapt Dynamic Few-Shot Learning [
        <xref ref-type="bibr" rid="ref10">22</xref>
        ] to
automatically generate SPARQL queries from NL questions that can be answered by interrogating a
custom KG in the Digital Humanities realm, the GOLEM KG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We partially address the limitations
of [
        <xref ref-type="bibr" rid="ref10">22</xref>
        ], for example, their focus on English-only input, by testing on a bilingual dataset (English and
Spanish) and experimenting with smaller LLMs. Our approach aims to reduce technical barriers in
KGQA, facilitate NL interfaces, and broaden KGs’ accessibility for humanities researchers beyond their
knowledge of the SPARQL querying language.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        The dataset was manually curated by a Language and Communication Technologies
master's student (an author of this paper) as part of a curricular research internship. The annotator is
proficient in English and a native Spanish speaker. This section describes our approach, introducing
GOLEM KG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is the core of our case study. We cover how we constructed the dataset, the
prompting strategies adopted, and our evaluation framework.
      </p>
      <sec id="sec-3-1">
        <title>3.1. The GOLEM Knowledge Graph</title>
        <p>
          The GOLEM KG contains metadata and extracted features of fanfiction stories in various languages
from the popular online platform Archive of Our Own (AO3) [
          <xref ref-type="bibr" rid="ref14">26</xref>
          ]. Metadata about works are provided
by fanfiction authors and include common metadata such as author, title, publication date, as well as a
wide array of additional information, such as content tags, characters appearing in the story, and their
relationships. In addition, information on narrative and stylistic elements, as well as reader response
data (such as syntactic complexity and lexical richness), has been added. The formal ontology used to
model the data combines existing ontologies with new classes and properties specific to the domain
of narrative and fiction [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For example, the lrm:F1_Work class is defined by LRMoo, an extension of
the ISO standard for cultural heritage, CIDOC-CRM (https://cidoc-crm.org//extensions/lrmoo/html/LRMoo_v1.0.html); there are DCMI Metadata Terms, e.g. dct:title or
dct:creator (https://www.dublincore.org/specifications/dublin-core/dcmi-terms/); and new classes like gc:G1_Character (a crm:E89_Propositional_Object). A description of
other predicates used in the KG can be found in [
          <xref ref-type="bibr" rid="ref15">27</xref>
          ]. GOLEM KG is available via a search interface (http://search.golemlab.eu:3006/en/)
and a public SPARQL endpoint (http://graph.golemlab.eu:8890/sparql).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset construction</title>
        <p>Since the data in the GOLEM KG is multilingual, and potential end-users may want to query the
KG in their native language, we aim to develop a multilingual dataset. We constructed NL questions
in English and Spanish and paired them with their corresponding SPARQL query. We formulated the
questions based on typical queries that end users interested in fanfiction might ask of the KG. Our
methodology can be extended to additional languages. Dataset statistics are reported in Table 1. The
following paragraphs describe how the dataset was constructed based on the included question types.
</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Template questions</title>
          <p>The first steps towards creating a multilingual dataset of NL question-SPARQL query pairs in English
and Spanish involved producing 39 generic template questions per language. Spanish questions are
translations of English questions. These templates use placeholders instead of specific KG entities.
There are several types of placeholders, each referring to one GOLEM predicate:
• [[fandom]], corresponding to https://golemlab.eu/graph/fandom;
• [[story]], corresponding to https://golemlab.eu/graph/title;
• [[character]], corresponding to https://golemlab.eu/graph/character;
• [[keyword]], corresponding to https://golemlab.eu/graph/keyword.</p>
          <p>Below is one example of a template question per language:
• How many [[fandom]] stories are there?
• ¿Cuántas historias de [[fandom]] hay publicadas?</p>
          <p>Among the template questions, some do not contain placeholders because they do not require the
inclusion of any KG entities. Below is one example per language:
• How many stories are tagged as explicit?
• ¿Cuántas historias están marcadas como explícitas?</p>
          <p>Before execution on the KG’s SPARQL endpoint, these generic template questions require an
instantiation step, described in the following paragraph.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Instantiated questions</title>
          <p>We instantiate the template questions by replacing placeholders with actual entities from the KG when
required. These entities are obtained through a series of simple queries on the KG, performing one
query per placeholder type. Below is an example of a query built for this purpose:</p>
          <p>prefix golem: &lt;https://golemlab.eu/graph/&gt;
SELECT DISTINCT ?fandom
WHERE {
  ?story golem:fandom ?fandom .
}</p>
          <p>The resulting lists of instances are then used at random to replace the placeholders in the dataset,
both in the NL question and its corresponding SPARQL query.</p>
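          <p>For illustration, the instantiation step can be sketched in Python as follows. This is a minimal sketch with toy entity lists and hypothetical function names; in the real pipeline the entity lists come from the SPARQL queries described above.</p>

```python
import random

# Toy entity lists standing in for the results of the per-placeholder
# SPARQL queries against the GOLEM KG (hypothetical values).
ENTITIES = {
    "[[fandom]]": ["1984 - George Orwell", "Valley of Tears (TV)"],
    "[[character]]": ["Winston Smith"],
}

def instantiate(template_nl, template_sparql, rng=random):
    """Replace each placeholder in an NL/SPARQL template pair with the
    same randomly chosen KG entity."""
    nl, sparql = template_nl, template_sparql
    for placeholder, candidates in ENTITIES.items():
        if placeholder in nl:
            entity = rng.choice(candidates)
            nl = nl.replace(placeholder, entity)
            sparql = sparql.replace(placeholder, f'"{entity}"')
    return nl, sparql

nl, q = instantiate(
    "How many [[fandom]] stories are there?",
    "SELECT (COUNT(DISTINCT ?story) as ?uploads) WHERE { ?story golem:fandom [[fandom]] . }",
)
```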
          <p>We refer to the resulting questions as instantiated questions, which can directly query the KG via the
SPARQL endpoint. We created 10 instantiated questions for each template question. Each instantiated
question has been paired with a manually crafted SPARQL query. If the corresponding SPARQL query
generated an empty answer, the sample was removed. Below, we report two examples:
• How many 1984 - George Orwell stories are there?</p>
          <p>prefix golem: &lt;https://golemlab.eu/graph/&gt;
prefix dc: &lt;http://purl.org/dc/terms/&gt;
prefix gc: &lt;https://ontology.golemlab.eu/&gt;
SELECT (COUNT(DISTINCT ?story) as ?uploads)
WHERE {
  ?story golem:fandom "1984 - George Orwell" .
}</p>
          <p>• ¿Cuántas historias de Valley of Tears (TV) hay publicadas?</p>
          <p>prefix golem: &lt;https://golemlab.eu/graph/&gt;
prefix dc: &lt;http://purl.org/dc/terms/&gt;
prefix gc: &lt;https://ontology.golemlab.eu/&gt;
SELECT (COUNT(DISTINCT ?story) as ?uploads)
WHERE {
  ?story golem:fandom "Valley of Tears (TV)" .
}</p>
          <p>The template questions that do not contain any placeholders do not need to undergo any instantiation
process and are directly associated with the corresponding SPARQL queries, as in the example below:
• How many stories are tagged as explicit?
• ¿Cuántas historias están marcadas como explícitas?</p>
          <p>prefix golem: &lt;https://golemlab.eu/graph/&gt;
prefix dc: &lt;http://purl.org/dc/terms/&gt;
prefix gc: &lt;https://ontology.golemlab.eu/&gt;
SELECT (COUNT(?story) AS ?explicit_stories)
WHERE {
  ?story golem:rating "Explicit" .
}</p>
          <p>The two questions in English and Spanish in this last example are semantically equivalent
(the Spanish one is a translation of the English one) and are therefore paired with the same SPARQL
query.</p>
          <p>The SPARQL queries corresponding to the instantiated questions are executed against the GOLEM
KG SPARQL endpoint, and their responses are collected and evaluated.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Paraphrased questions</title>
          <p>Instantiated questions are paraphrased via data augmentation using deepseek-r1-7b. However,
Spanish questions are frequently paraphrased into English by mistake. We address this by verifying the
language of the paraphrased questions with a cross-check that uses the language-detection
library lingua (https://github.com/pemistahl/lingua-py). Paraphrases detected by lingua in languages differing from the manually
annotated source language are discarded (457 paraphrases in total: 25 expected in English but detected otherwise; 450 expected in Spanish but detected otherwise). Below, we list some examples of paraphrased questions:
• What is the number of George Orwell 1984 works available?
• How many George Orwell’s 1984 pieces exist?
• ¿Cuántas narrativas de *Valley of Tears* (TV) existen?</p>
          <p>The prompt used in the paraphrasing step can be seen in A.1.1.</p>
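          <p>The language cross-check can be sketched as a simple filter. This is an illustrative sketch only: the actual pipeline uses the lingua detector, abstracted here as a detect callable with a toy stand-in so the snippet stays self-contained.</p>

```python
def filter_paraphrases(paraphrases, expected_lang, detect):
    """Keep only paraphrases whose detected language matches the language
    of the manually annotated source question; discard the rest."""
    kept, discarded = [], []
    for text in paraphrases:
        (kept if detect(text) == expected_lang else discarded).append(text)
    return kept, discarded

# Toy stand-in for the lingua detector, for illustration only.
def toy_detect(text):
    return "es" if text.lstrip().startswith("¿") else "en"

kept, discarded = filter_paraphrases(
    ["¿Cuántas historias hay?", "How many stories are there?"],
    expected_lang="es",
    detect=toy_detect,
)
```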
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experimental Setting</title>
        <p>
          In this section, we first introduce the prompting approaches tested (Zero-shot, Naive Few-shot, and
an adaptation of Dynamic Few-shot Learning [
          <xref ref-type="bibr" rid="ref10">22</xref>
          ]). Then, we introduce the models used. Finally, we
describe the evaluation metrics chosen.
        </p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Prompting approaches</title>
          <p>Zero-shot. In the Zero-shot (ZS) prompting approach, we include in the prompt only the task
instructions and the list of predicates from the KG. We report an example of a ZS prompt per language in
Appendix A.1.2.</p>
          <p>Naive Few-shot. In the Naive Few-shot (NFS) prompting approach, we include in the prompt the
task instructions and the list of predicates from the KG. In addition, we include three random examples in the
prompt, taken from the same language as the question being processed. We report an example of an
NFS prompt per language in Appendix A.1.3.</p>
          <p>
            Dynamic Few-shot Learning (adapted). In the adapted Dynamic Few-shot Learning (a-DFSL)
prompting approach, we include in the prompt the task instructions and the list of predicates from the
KG. Additionally, adapting the approach described in [
            <xref ref-type="bibr" rid="ref10">22</xref>
            ], we also include examples from our dataset. To select
the examples, we encode all dataset questions using paraphrase-multilingual-mpnet-base-v2
as a sentence encoder model. The input question is then encoded and compared against the dataset
using cosine similarity. The 3 most similar questions are selected as examples. To simulate real-world
complexity, questions sharing the same SPARQL query as the input are excluded. Example selection is
constrained to samples in the same language as the input question. Contrary to [
            <xref ref-type="bibr" rid="ref10">22</xref>
            ], we do not provide
in the prompt the gold relation and the gold entity to be used in the target query. We report prompt
examples for an English and a Spanish question in Appendix A.1.4.
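As an illustration, the example-selection step can be sketched as follows (toy two-dimensional vectors stand in for the sentence embeddings produced by paraphrase-multilingual-mpnet-base-v2; all names are ours, not from the original implementation):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_examples(query_vec, query_lang, query_gold, pool, k=3):
    """Return the k pool questions most similar to the input question,
    excluding other-language items and items sharing the input's gold query."""
    candidates = [
        ex for ex in pool
        if ex["lang"] == query_lang and ex["gold"] != query_gold
    ]
    candidates.sort(key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return candidates[:k]

pool = [
    {"question": "q1", "lang": "en", "gold": "A", "vec": [1.0, 0.0]},
    {"question": "q2", "lang": "en", "gold": "B", "vec": [0.9, 0.1]},
    {"question": "q3", "lang": "es", "gold": "B", "vec": [1.0, 0.0]},
]
examples = select_examples([1.0, 0.0], "en", query_gold="A", pool=pool)
```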
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Models</title>
          <p>paraphrase-multilingual-mpnet-base-v2. This sentence embedding model (available at
https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) is used in the scope
of this work to perform the semantic search required to select the examples most similar to the
input sentence for inclusion in the a-DFSL prompt. We chose this model because of its multilingual
specialisation.</p>
          <p>deepseek-coder-v2. This open-weight Mixture-of-Experts (MoE) LLM specialises in coding and
mathematical tasks. We pull it from its Ollama repository (https://ollama.com/library/deepseek-coder-v2)
in its 16B-parameter version. In our work, it is used to transform the NL questions into SPARQL queries
via different prompting approaches.</p>
          <p>deepseek-r1:7b and :70b. We use these reasoning open-weight LLMs to transform NL questions into
SPARQL queries via different prompting approaches. We pull them from the related Ollama repository
(https://ollama.com/library/deepseek-r1) in their 7B- and 70B-parameter versions.</p>
          <p>llama3.1:70b. We use this open-weight LLM to transform NL questions into SPARQL queries using
different prompting approaches. We pull it from its Ollama repository (https://ollama.com/library/llama3.1:70b).</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. System</title>
          <p>Experiments were conducted on a server equipped with an Intel i9-11900KF CPU, 128GB RAM, and
NVIDIA GeForce RTX 3090 GPUs (24GB VRAM).</p>
        </sec>
        <sec id="sec-3-3-4">
          <title>3.3.4. Metrics</title>
          <p>
We evaluate performance by comparing the results obtained by executing the gold SPARQL queries
in our dataset against the GOLEM KG with those obtained by executing the automatically generated
SPARQL queries on the same KG on a per-sample basis. For each sample, an exact string match between
the two responses yields a true positive (TP). If the generated response is non-empty yet differs from
the ground truth, a false positive (FP) and a false negative (FN) are recorded; if the generated response
is empty, only FN is incremented.
          </p>
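          <p>As an illustration, the per-sample counting described above can be sketched as follows (our own rendering of the stated rules, not the authors' evaluation code):</p>

```python
def score(gold_results, generated_results):
    """Per-sample counting: an exact match yields a TP; a non-empty
    mismatch yields both an FP and an FN; an empty response yields an FN."""
    tp = fp = fn = 0
    for gold, gen in zip(gold_results, generated_results):
        if gen == gold:
            tp += 1
        elif gen:  # non-empty but different from the ground truth
            fp += 1
            fn += 1
        else:      # empty generated response
            fn += 1
    return tp, fp, fn

tp, fp, fn = score(["42", "7", "9"], ["42", "8", ""])
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```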
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>In this section, we present the results of our experiments, and we perform a detailed analysis of the
errors.</p>
      <sec id="sec-4-1">
        <title>4.1. Results</title>
        <p>Precision consistently exceeds recall across all models and prompting approaches, indicating more
frequent query failures or empty results rather than incorrect answers. For all the models tested across
all prompting approaches (except the zero-shot approach, which is less relevant due to its extremely
low performance), performance on the English-language subset is better than on the Spanish-language
subset.</p>
        <p>
          Thus, we select deepseek-coder-v2 (a-DFSL) for further experiments on the augmented dataset
(Table 3). Performance notably declines on augmented data due to entity alterations introduced by
paraphrasing. Entity mismatches between paraphrases and KG canonical forms cause the SPARQL
query to obtain erroneous results. Such results closely align with the ablation studies performed in the
original DFSL study [
          <xref ref-type="bibr" rid="ref10">22</xref>
          ], which reported lower accuracy on QALD-9 DB (49.59% accuracy) when the
prompt did not include the gold entities and relations. Future work should include entity linking to
explicitly incorporate entities’ canonical forms in prompts.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Error Analysis</title>
        <p>We conducted a systematic error analysis on all 70 errors produced by our best-performing model
(deepseek-coder-v2), which optimises the accuracy-runtime tradeoff. Notably, our analysis did not
find any instance of the "triple-flip" error, a well-documented phenomenon in text-to-SPARQL conversion where
subject and object positions are reversed in generated triples, resulting in empty or incorrect query
results. The identified errors were instead categorised into two primary classes: (i) Failed execution,
covering queries unable to execute successfully against the SPARQL endpoint (e.g., QueryBadFormed,
timeouts), and (ii) Incorrect results, referring to queries executed successfully but returning answers
differing from the gold annotations. The second macro-category was further divided into specific error
types (see Table 4), namely Failed KG Entity Recognition, where the model incorrectly recognised entities
from the NL question; Wrong Predicate, where the model selected an incorrect predicate among those
provided in the prompt; Incomplete Query, where generated queries lacked sufficient complexity to
retrieve all expected results; and SPARQL Syntax Error, involving syntactic mistakes in the generated
query that led to incorrect results. Additionally, we identified errors attributable to the evaluation
approach itself (Evaluation Method Errors), where queries returned correct results but were mistakenly
flagged as wrong due to minor variations in variable naming or result grouping. Future work will
address these evaluation inaccuracies.</p>
        <p>Table 5 illustrates each error type among queries executed successfully but returning incorrect results.
In the first example, comparing the gold SPARQL query with the generated one reveals that both queries
are structurally identical, difering only in the KG entity used as the object of predicate golem:title.
The generated query incorrectly uses the entity "Wolfstar prompts" instead of the correct entity
"(fanart) Wolfstar prompts", causing an empty (incorrect) result.</p>
        <p>Another common issue is represented by queries of the error type Incomplete Query, exemplified
by subjective NL questions such as "What is Forbidden Like The Forest about?", reported as the second
example in the table. The annotator associated this question with a highly articulated SPARQL query,
which retrieves multiple detailed elements (e.g., keywords, romantic categories, content warnings,
collections, series, and summaries). In contrast, the model produced a minimal query, interpreting the
ambiguous phrase "about" simply as retrieving the story’s summary. This discrepancy highlights a
modelling issue where subjective interpretations by annotators may lead to overly complex gold standard
queries compared to the minimalistic outputs generated by the model. In future work, annotation
guidelines will be refined to address this kind of subjectivity.</p>
        <p>A final error type illustrated in the table is Wrong Predicate, exemplified by the fourth and last
query pair. The gold SPARQL query correctly uses the predicate golem:author, whereas the
model-generated query mistakenly selects the predicate dc:creator. In such cases, the model erroneously
selects a predicate from among those provided in the prompt or hallucinates predicates that do not exist
in the KG, leading to incorrect or incomplete results.</p>
        <p>To analyse the errors of the run on the paraphrased question dataset, reported in Table 3, we selected a
sample of 30 erroneous cases. In this sample, we do not consider those errors due to the failed execution
of the query. Questions were chosen to maximise semantic diversity. Semantic embeddings of questions
were computed using the same model used for selecting a-DFSL examples. A greedy selection method
then iteratively picked the most semantically distinct questions, measured by cosine distance, ensuring
coverage of the broadest possible range of error cases. This approach allowed us to efficiently maximise
the variety of errors considered while minimising redundancy in our analysis.</p>
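        <p>The greedy selection can be sketched as follows (an illustrative max-min cosine-distance loop over toy vectors; the real embeddings come from the sentence encoder mentioned above, and all names are ours):</p>

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def greedy_diverse_sample(vectors, k):
    """Start from the first question, then repeatedly add the question whose
    minimum cosine distance to the already-selected set is largest."""
    selected = [0]
    while len(selected) < min(k, len(vectors)):
        best_i, best_d = None, -1.0
        for i in range(len(vectors)):
            if i in selected:
                continue
            # distance to the closest already-selected item
            d = min(cosine_distance(vectors[i], vectors[j]) for j in selected)
            if d > best_d:
                best_i, best_d = i, d
        selected.append(best_i)
    return selected

idx = greedy_diverse_sample([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], k=2)
```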
        <p>As we report in Table 4.2, unlike the analysis of instantiated questions, no instances of Incomplete
Query errors emerged, possibly due to the smaller sample size. However, we identified a new error
category, Misleading Paraphrase, grouping queries arising from flawed or unclear paraphrased questions.
Typical examples include nonsensical, mixed-language, or malformed paraphrases such as “It never
entered my mind’s key points or topics”, “¿Cómo many kudos se le otorgan a If only?”, or “¿Qué pájaros
aparecen en If You Could Be Anywhere?”. Future analyses will focus on addressing this error type through
improved paraphrase quality control.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Limitations and Future Work</title>
      <p>One limitation of our dataset construction is that the questions were not directly informed by domain
experts or end-users from computational literature or fanfiction communities. Instead, they were
primarily inspired by information already present in the KG, which could potentially introduce bias and
limit method effectiveness. Future work should involve gathering explicit requirements from target
end-users to create more representative queries. Additionally, the dataset augmentation process produced
paraphrases with varying linguistic quality, including language inconsistencies such as mixed-language
outputs (e.g., English instead of expected Spanish). To address this, future efforts should integrate
automatic quality control mechanisms, such as leveraging LLMs as evaluators, or conduct manual
assessments to enhance dataset quality. Some collected NL questions exhibit subjective interpretations,
for example, queries about the content of stories, which lead to subjective gold SPARQL annotations.
Future work will refine the annotation guidelines to minimise ambiguity.</p>
      <p>Another limitation concerns the systemâĂŹs generalizability beyond the semi-templated scope of the
constructed dataset. While the tested methodology demonstrates efectiveness within the instantiated
questions, its performance degrades on paraphrased variants, indicating limited robustness to linguistic
variation. Furthermore, its behaviour in more open-ended or less structured query scenarios remains
underexplored. Future work will involve evaluating system performance on more naturalistic and
user-authored queries.</p>
      <p>We acknowledge the explicit predicate listing as a simplification. In deployed applications, these
would be stored as pre-defined variables hidden from end-users. Future work should examine how
prompt complexity affects performance and explore more compact schema representations.</p>
      <p>
        The best-performing prompting approach (a-DFSL) is a simplified version of the original DFSL [
        <xref ref-type="bibr" rid="ref10">22</xref>
        ]
implementation. Future improvements include a more mature, sophisticated prompting strategy,
beginning with automatic entity linking. Entity linking, performed by either an LLM or a specialised
entity linker, can identify and inject KG entities in their canonicalised form directly into the prompts,
reducing recognition errors. More advanced example retrieval methods will also be explored.
      </p>
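      <p>As a purely illustrative sketch of such canonicalisation, fuzzy string matching can map a mention to the closest canonical KG label. This stands in for, and is much weaker than, a trained entity linker; the entity names reuse examples from the error analysis.</p>

```python
import difflib

# Toy canonical labels, reusing entities mentioned in the error analysis.
CANONICAL = ["(fanart) Wolfstar prompts", "1984 - George Orwell", "Valley of Tears (TV)"]

def canonicalise(mention, candidates=CANONICAL, cutoff=0.6):
    """Map a free-text mention to the closest canonical KG label, or
    return None when nothing is similar enough."""
    matches = difflib.get_close_matches(mention, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

best = canonicalise("Wolfstar prompts")
```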
      <p>The current evaluation strategy compares answers from generated and gold SPARQL queries, but has
limitations. Queries returning identical results with different groupings or slight variable differences
are incorrectly marked as errors. Future work will refine the evaluation method for greater flexibility
and accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        In this work, we introduced a bilingual dataset of NL questions in English and Spanish paired with
corresponding SPARQL queries. The dataset targets the GOLEM KG, containing metadata and features
extracted from fanfiction hosted on the Archive of Our Own (AO3) platform. The dataset comprises
instantiated questions, manually crafted and automatically populated with KG entities, and an augmented
version generated via automatic paraphrasing. We used the dataset to benchmark various LLMs on
the text-to-SPARQL task, exploring several prompting strategies. An adapted, simplified version of
DFSL [
        <xref ref-type="bibr" rid="ref10">22</xref>
        ], which selects prompt examples via semantic similarity to input questions, demonstrated
superior performance. Error analysis revealed that integrating entity linking is critical to improving
query generation quality. This case study in Digital Humanities provides practical insights and suggests
pathways for future research on NL interfaces for querying knowledge graphs through text-to-SPARQL
methods.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work is part of the Graphs and Ontologies for Literary Evolution Models (GOLEM) project funded by
the European Commission. Jose Maldonado-Rodríguez is supported by the Erasmus Mundus Masters
Program in Language and Communication Technologies (LCT), EU grant no. 2019-1508. Arianna
Graciotti is supported by the European Union's Horizon 2020 research and innovation programme
under grant agreement No 101004746.
[6] A. Perevalov, D. Diefenbach, R. Usbeck, A. Both, Qald-9-plus: A multilingual dataset for question answering over dbpedia and wikidata translated by native speakers, in: 2022 IEEE 16th International Conference on Semantic Computing (ICSC), 2022, pp. 229–234. doi:10.1109/ICSC52841.2022.00045.
[7] L.-A. Kaffee, S. Razniewski, P. Vougiouklis, R. Usbeck, X. Yan, A. Perevalov, L. Jiang, J. Schulz, A. Kraft, C. Möller, J. Huang, J. Reineke, A.-C. N. Ngomo, M. Saleem, A. Both, Qald-10 – the 10th challenge on question answering over linked data: Shifting from dbpedia to wikidata as a kg for kgqa, Semantic Web 15 (2024) 2193–2207. URL: https://doi.org/10.3233/SW-233471. doi:10.3233/SW-233471.
[8] S. Cao, J. Shi, L. Pan, L. Nie, Y. Xiang, L. Hou, J. Li, B. He, H. Zhang, KQA pro: A dataset with explicit compositional programs for complex question answering over knowledge base, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 6101–6119. URL: https://aclanthology.org/2022.acl-long.422/. doi:10.18653/v1/2022.acl-long.422.
[9] M. Dubey, D. Banerjee, A. Abdelkawi, J. Lehmann, Lc-quad 2.0: A large dataset for complex question answering over wikidata and dbpedia, in: C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.), The Semantic Web – ISWC 2019, Springer International Publishing, Cham, 2019, pp. 69–78.
[10] P. Trivedi, G. Maheshwari, M. Dubey, J. Lehmann, Lc-quad: A corpus for complex question answering over knowledge graphs, in: C. d'Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, J. Heflin (Eds.), The Semantic Web – ISWC 2017, Springer International Publishing, Cham, 2017, pp. 210–218.
[11] J. Berant, A. Chou, R. Frostig, P. Liang, Semantic parsing on Freebase from question-answer pairs, in: D. Yarowsky, T. Baldwin, A. Korhonen, K. Livescu, S. Bethard (Eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, Washington, USA, 2013, pp. 1533–1544. URL: https://aclanthology.org/D13-1160/.
[12] Y. Su, H. Sun, B. Sadler, M. Srivatsa, I. Gür, Z. Yan, X. Yan, On generating characteristic-rich question sets for QA evaluation, in: J. Su, K. Duh, X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, Texas, 2016, pp. 562–572. URL: https://aclanthology.org/D16-1054/. doi:10.18653/v1/D16-1054.
[13] Y. Gu, S. Kase, M. Vanni, B. Sadler, P. Liang, X. Yan, Y. Su, Beyond i.i.d.: Three levels of generalization for question answering on knowledge bases, in: Proceedings of the Web Conference 2021, WWW '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 3477–3488. URL: https://doi.org/10.1145/3442381.3449992. doi:10.1145/3442381.3449992.
[14] D. Keysers, N. Schärli, N. Scales, H. Buisman, D. Furrer, S. Kashubin, N. Momchev, D. Sinopalnikov, L. Stafiniak, T. Tihon, D. Tsarkov, X. Wang, M. van Zee, O. Bousquet, Measuring compositional generalization: A comprehensive method on realistic data, 2020. URL: https://arxiv.org/abs/1912.09713. arXiv:1912.09713.
[15] D. Banerjee, S. Awale, R. Usbeck, C. Biemann, DBLP-QuAD: A question answering dataset over the DBLP scholarly knowledge graph, in: BIR 2023: 13th International Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2023, 2023.
[16] H. Khorashadizadeh, F. Z. Amara, M. Ezzabady, F. Ieng, S. Tiwari, N. Mihindukulasooriya, J. Groppe, S. Sahri, F. Benamara, S. Groppe, Research trends for the interplay between large language models and knowledge graphs, 2024. URL: https://arxiv.org/abs/2406.08223. arXiv:2406.08223.
[17] S. Sivasubramaniam, C. E. Osei-Akoto, Y. Zhang, K. Stockinger, J. Fuerst, Sm3-text-to-query: Synthetic multi-model medical text-to-query benchmark, in: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang (Eds.), Advances in Neural Information Processing Systems, volume 37, Curran Associates, Inc., 2024, pp. 88627–88663. URL: https://proceedings.neurips.cc/paper_files/paper/2024/file/a182a8e6ebc91728b6e6b6382c9f7b1e-Paper-Datasets_and_Benchmarks_Track.pdf.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix</title>
      <sec id="sec-9-1">
        <title>A.1. Prompting approaches</title>
        <p>In the sections below, we report the prompts used in the different experimental settings tested in this
paper.</p>
        <sec id="sec-9-1-1">
          <title>A.1.1. Data augmentation prompt examples ENGLISH</title>
          <p>Generate 10 paraphrases for the following text, making the responses short and
query-like. Please do not say anything after &lt;/think&gt;, only the paraphrases. If there are
any words within double square brackets (such as [[story]] or [[character]] ),
please do not modify them.:

How many 1984 - George Orwell stories are there?</p>
          <p>Listing 1: English question data augmentation prompt example</p>
          <p>SPANISH
Generate 10 paraphrases for the following text, making the responses short and
query-like. Please do not say anything after &lt;/think&gt;, only the paraphrases. If there are
any words within double square brackets (such as [[story]] or [[character]] ),
please do not modify them.:

¿Cuántas historias de Valley of Tears (TV) hay publicadas?</p>
          <p>Listing 2: Spanish question data augmentation prompt example</p>
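          <p>The augmentation prompt requires that double-bracketed placeholders such as [[story]] or [[character]] survive paraphrasing unchanged. A minimal automated check for this (our assumption, not the authors' pipeline) compares the placeholder tokens before and after paraphrasing:</p>

```python
# Minimal check that an LLM-generated paraphrase preserved every
# double-bracketed placeholder verbatim, as the augmentation prompt requires.
import re

PLACEHOLDER = re.compile(r"\[\[[^\]]+\]\]")

def placeholders_preserved(source: str, paraphrase: str) -> bool:
    """True if the paraphrase keeps exactly the source's [[...]] tokens."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(paraphrase))

src = "How many [[fandom]] stories are there?"
print(placeholders_preserved(src, "What is the number of [[fandom]] stories?"))  # True
print(placeholders_preserved(src, "What is the number of fandom stories?"))      # False
```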
          <p>Listing 3: English question ZS Prompt Example</p>
          <p>Listing 4: Spanish question NFS Prompt Example</p>
          <p>Question: How many chapters does Guardian of Hogwarts have?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT DISTINCT ?chapters WHERE { ?story golem:title "Guardian of Hogwarts" . ?story golem:numberOfChapters ?chapters . }
&lt;/SPARQL&gt;
Question: What is the average number of comments for stories from Glory of the Special Forces (TV)?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT AVG(?comments) AS ?avg_number_of_comments WHERE { ?story golem:fandom "Glory of the Special Forces (TV)" . ?story golem:numberOfComments ?comments . }
&lt;/SPARQL&gt;
Question: Is the story Luna Lovegood and the Chamber of Innocence completed?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT DISTINCT ?status WHERE { ?story golem:title "Luna Lovegood and the Chamber of Innocence" . ?story golem:publicationStatus ?status . }
&lt;/SPARQL&gt;

###

Question: How many 1984 - George Orwell stories are there?

Query:</p>
          <p>Listing 5: English question NFS Prompt Example</p>
        </sec>
        <sec id="sec-9-1-2">
          <title>SPANISH</title>
          <p>Your task is to translate a question in natural language into a SPARQL query for the GOLEM knowledge graph.
The query must follow these guidelines:
1. SPARQL queries must include the following prefix:
prefix golem: &lt;https://golemlab.eu/graph/&gt;
2. Enclose SPARQL queries within &lt;SPARQL&gt; &lt;/SPARQL&gt; tags.</p>
          <p>Question: ¿Cuántos capítulos tiene this title is a wip of which i hate, like this fic and my life?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT DISTINCT ?chapters WHERE { ?story golem:title "this title is a wip of which i hate, like this fic and my life" . ?story golem:numberOfChapters ?chapters . }
&lt;/SPARQL&gt;
Question: ¿Está completada la historia A Small Steep Valley?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT DISTINCT ?status WHERE { ?story golem:title "A Small Steep Valley" . ?story golem:publicationStatus ?status . }
&lt;/SPARQL&gt;

###

Question: ¿Cuántas historias de Valley of Tears (TV) hay publicadas?

Query:</p>
          <p>Listing 6: Spanish question NFS Prompt Example</p>
        </sec>
        <sec id="sec-9-1-3">
          <title>A.1.4. a-DFSL Prompt Examples</title>
          <p>Below are two examples of prompts for transforming natural language questions into SPARQL queries,
one in English and one in Spanish.</p>
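          <p>The defining step of a-DFSL is selecting the few-shot examples most similar to the input question. The sketch below is illustrative only: real implementations use sentence embeddings, whereas here a toy bag-of-words cosine similarity stands in for the embedding model:</p>

```python
# Illustrative sketch of similarity-based few-shot example selection in the
# spirit of a-DFSL. Toy bag-of-words cosine similarity stands in for the
# sentence embeddings a real system would use.
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words token counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def select_examples(question, pool, k=2):
    """Return the k pool items whose questions are most similar to the input."""
    return sorted(pool, key=lambda ex: cosine(question, ex["question"]), reverse=True)[:k]

pool = [
    {"question": "How many chapters does X have?", "query": "..."},
    {"question": "Is the story Y completed?", "query": "..."},
    {"question": "How many kudos do Z stories get on average?", "query": "..."},
]
best = select_examples("How many chapters does Guardian of Hogwarts have?", pool, k=1)
print(best[0]["question"])  # How many chapters does X have?
```

The selected question/query pairs are then concatenated into the prompt ahead of the input question, as in the listings below.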
        </sec>
        <sec id="sec-9-1-4">
          <title>ENGLISH</title>
          <p>Your task is to translate a question in natural language into a SPARQL query for the GOLEM knowledge graph.
The query must follow specific guidelines to ensure accuracy and correctness:
1. SPARQL queries must include the following prefix:
prefix golem: &lt;https://golemlab.eu/graph/&gt;
2. Enclose SPARQL queries within &lt;SPARQL&gt; &lt;/SPARQL&gt; tags.
3. You must generate 1 query(ies).
4. It is very important that you use only the predicates provided below.
5. Examples are provided below for guidance.

###
Predicates:
https://golemlab.eu/graph/numberOfComments
https://golemlab.eu/graph/numberOfKudos
https://golemlab.eu/graph/publicationStatus
https://golemlab.eu/graph/dateModified
https://golemlab.eu/graph/characters</p>
          <p>Question: How many stories are there on Archive of Our Own?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT COUNT(DISTINCT ?story) as ?stories WHERE { ?story golem:story_id ?id }
&lt;/SPARQL&gt;
###
Question: How many åĳĂåŸŐ stories are there?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT (COUNT(DISTINCT ?story) as ?uploads) WHERE { ?story golem:fandom "åĳĂåŸŐ" . }
&lt;/SPARQL&gt;
###
Question: How many Kudos do Mr.Vampire (1985) stories get on average?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT AVG(?kudos) AS ?average_number_of_kudos WHERE { ?story golem:fandom "Mr.Vampire (1985)" . ?story golem:numberOfKudos ?kudos . }
&lt;/SPARQL&gt;</p>
          <p>Listing 7: English question a-DFSL Prompt Example</p>
        </sec>
        <sec id="sec-9-1-5">
          <title>SPANISH</title>
          <p>Your task is to translate a question in natural language into a SPARQL query for the GOLEM knowledge graph.
The query must follow specific guidelines to ensure accuracy and correctness:
1. SPARQL queries must include the following prefix:
prefix golem: &lt;https://golemlab.eu/graph/&gt;
2. Enclose SPARQL queries within &lt;SPARQL&gt; &lt;/SPARQL&gt; tags.
3. You must generate 1 query(ies).
4. It is very important that you use only the predicates provided below.
5. Examples are provided below for guidance.

###
Predicates:
https://golemlab.eu/graph/numberOfComments
https://golemlab.eu/graph/numberOfKudos
https://golemlab.eu/graph/publicationStatus
https://golemlab.eu/graph/dateModified
https://golemlab.eu/graph/characters
https://golemlab.eu/graph/collections
https://golemlab.eu/graph/fandom
https://golemlab.eu/graph/publisher
https://golemlab.eu/graph/rating
https://golemlab.eu/graph/series
https://golemlab.eu/graph/story_id
https://golemlab.eu/graph/summary
https://golemlab.eu/graph/numberOfChapters
https://golemlab.eu/graph/datePublished
https://golemlab.eu/graph/keyword
https://golemlab.eu/graph/contentWarning
https://golemlab.eu/graph/numberOfWords
https://golemlab.eu/graph/socialRelationships
https://golemlab.eu/graph/datePackaged
https://golemlab.eu/graph/romanticCategory
https://golemlab.eu/graph/noOfPairings
https://golemlab.eu/graph/topPartner
https://golemlab.eu/graph/topPartnerPairings
https://golemlab.eu/graph/averageWordLength
https://golemlab.eu/graph/MSTTR
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT (COUNT(DISTINCT ?story) as ?uploads) WHERE { ?story golem:fandom "[[fandom]]" . }
&lt;/SPARQL&gt;
Question: ¿Cuántas historias se publican al año?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT (COUNT(DISTINCT ?story) as ?uploads) WHERE { ?story golem:story_id ?id }
&lt;/SPARQL&gt;
Question: ¿Cuántos autores han publicado alguna historia?
Query:
&lt;SPARQL&gt;
prefix golem: &lt;https://golemlab.eu/graph/&gt; prefix dc: &lt;http://purl.org/dc/terms/&gt; prefix gc: &lt;https://ontology.golemlab.eu/&gt; SELECT (COUNT(DISTINCT ?author) as ?uploads) WHERE { ?story golem:author ?author . }
&lt;/SPARQL&gt;</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. D'Amato, G. D.
          <string-name>
            <surname>Melo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>J. E. L.</given-names>
          </string-name>
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>A.-C. N.</given-names>
          </string-name>
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Rashid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Schmelzeisen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3447772. doi:
          <volume>10</volume>
          .1145/3447772.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F. P.</given-names>
            <surname>Franziska</surname>
          </string-name>
          <string-name>
            <surname>Pannach</surname>
          </string-name>
          , Luotong Cheng,
          <article-title>The golem knowledge graph: Exploring fanfiction narratives through structured data</article-title>
          , in: W. Haverals,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          , L. Thompson (Eds.),
          <source>Proceedings of the Computational Humanities Research Conference</source>
          <year>2024</year>
          , volume
          <volume>3834</volume>
          <source>of CEUR Workshop Proceedings, CEUR Workshop Proceedings (CEUR-WS.org)</source>
          , Aarhus, Denmark,
          <year>2024</year>
          , pp.
          <fpage>462</fpage>
          -
          <lpage>471</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3834</volume>
          /paper80.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , J. Ma,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sui</surname>
          </string-name>
          ,
          <article-title>A survey on in-context learning</article-title>
          , in: Y.
          <string-name>
            <surname>Al-Onaizan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>Y.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>1107</fpage>
          -
          <lpage>1128</lpage>
          . URL: https://aclanthology. org/
          <year>2024</year>
          .emnlp-main.
          <volume>64</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2024</year>
          .emnlp-main.
          <volume>64</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>Modern baselines for sparql semantic parsing</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '22,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>2260–2265</fpage>
          . URL: https://doi.org/10.1145/3477495.3531841. doi:
          <volume>10</volume>
          .1145/ 3477495.3531841.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Enhancing sparql query generation for knowledge base question answering systems by learning to correct triplets</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>14</volume>
          (
          <year>2024</year>
          ). URL: https://www.mdpi.com/2076-3417/14/4/1521. doi:
          <volume>10</volume>
          .3390/app14041521.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <article-title>A benchmark to understand the role of knowledge graphs on large language model's accuracy for question answering on enterprise sql databases</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2311.07509. arXiv:
          <volume>2311</volume>
          .
          <fpage>07509</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kosten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stockinger</surname>
          </string-name>
          ,
          <article-title>Spider4sparql: A complex benchmark for evaluating knowledge graph question answering systems</article-title>
          ,
          <source>in: 2023 IEEE International Conference on Big Data (BigData)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>5272</fpage>
          -
          <lpage>5281</lpage>
          . doi:
          <volume>10</volume>
          .1109/BigData59044.
          <year>2023</year>
          .
          <volume>10386182</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>Generating sparql queries over cidoc-crm using a two-stage ontology path patterns method in llm prompts</article-title>
          ,
          <source>J. Comput. Cult. Herit</source>
          .
          <volume>18</volume>
          (
          <year>2025</year>
          ). URL: https: //doi.org/10.1145/3708326. doi:
          <volume>10</volume>
          .1145/3708326.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [21]
          <string-name>
            <surname>H. M. Zahera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Sherif</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Moussallem</surname>
            ,
            <given-names>A.-C. N.</given-names>
          </string-name>
          <string-name>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <article-title>Generating sparql from natural language using chain-of-thoughts prompting</article-title>
          ,
          <source>in: SEMANTICS</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>368</lpage>
          . URL: https://doi.org/10.3233/SSW240028.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [22]
          <string-name>
            <surname>J. D'Abramo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Zugarini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Torroni</surname>
          </string-name>
          ,
          <article-title>Dynamic few-shot learning for knowledge graph question answering</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.01409. arXiv:
          <volume>2407</volume>
          .
          <fpage>01409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          , J.
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Brei</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Arndt</surname>
          </string-name>
          ,
          <source>Assessing SPARQL Capabilities of Large Language Models, in: NLP4KGC: 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation in conjunction with SEMANTiCS 2024 Conference</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C. V. S.</given-names>
            <surname>Avila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M. P.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>A framework for question answering on knowledge graphs using large language models</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Celino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Revenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Raad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web: ESWC 2024 Satellite Events</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>168</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C. V. S.</given-names>
            <surname>Avila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Franco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <article-title>Experiments with text-to-SPARQL based on ChatGPT</article-title>
          ,
          <source>in: 2024 IEEE 18th International Conference on Semantic Computing (ICSC)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>284</lpage>
          . doi:10.1109/ICSC59802.2024.00050.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fiesler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Morrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Bruckman</surname>
          </string-name>
          ,
          <article-title>An archive of their own: A case study of feminist HCI and values in design</article-title>
          ,
          <source>in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , pp.
          <fpage>2574</fpage>
          -
          <lpage>2585</lpage>
          . doi:10.1145/2858036.2858409.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Solissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>van Cranenburgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van der Ree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pianzola</surname>
          </string-name>
          ,
          <article-title>The GOLEM triple store: A graph-based representation of narrative and fiction</article-title>
          , in:
          <string-name>
            <given-names>B.</given-names>
            <surname>Sartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Raad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Blin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>de Berardinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gottschalk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ilievski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kümpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tiddi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Töberg</surname>
          </string-name>
          (Eds.),
          <source>ESWC 2024 Workshops and Tutorials Joint Proceedings</source>
          , volume
          <volume>3749</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          , Hersonissos, Greece,
          <year>2024</year>
          . URL: https://hdl.handle.net/11370/f5b70d22-cc55-4dac-a5a4-155c1e515b4c.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>