<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Boosting Knowledge Graph Question Answering with Open Source Lightweight Large Language Models and RAG techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Caruso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgia Lodi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Macis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Persiani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentina Presutti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BUP srl</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CNR - Institute of Cognitive Sciences and Technologies</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bologna</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Linked Open Data (LOD) ecosystem provides a vast potential for structuring and sharing knowledge. However, accessing and querying this data remains challenging, particularly in contexts like the public sector, where technical capabilities are often limited. To bridge this gap, we introduce a Knowledge Graph Question Answering (KGQA) system that leverages open-source and lightweight Large Language Models (LLMs), along with Retrieval-Augmented Generation (RAG) techniques, to automatically produce SPARQL queries. Our experiments demonstrate the effectiveness of RAG-enhanced few-shot learning and Low-Rank Adaptation (LoRA) fine-tuning for generating precise, ontology-specific SPARQL queries. The presented results are obtained by employing our system within the cultural heritage domain, using ArCo, the Italian Ministry of Culture's open knowledge graph of cultural properties.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Retrieval Augmented Generation</kwd>
        <kwd>Linked Open Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our experiments, conducted with the Notus 7B-v1 model, show that even lightweight LLMs can support reliable access to cultural heritage
data for general users.</p>
      <p>The rest of this paper is structured as follows. Section 2 presents the state of the art. Section 3
details our methodology for the development of the proposed KGQA system. Section 4 presents the
preliminary results obtained using the open ArCo knowledge graph. Finally, Section 5 concludes the
paper by outlining future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Research on KGQA has evolved significantly, spanning from early rule-based approaches to LLM
methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Prominent datasets like QALD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], LC-QuAD [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and Mintaka [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been foundational
for evaluating these systems, with the latter emphasizing natural and multilingual queries for complex
SPARQL generation. Neural KGQA systems have increasingly incorporated pre-trained language models
to improve generalization and linguistic flexibility [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, most solutions rely on heavyweight
LLMs (e.g., GPT-3, T5), which limit accessibility due to computational costs. Our work employs LoRA [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
to enable fine-tuning of smaller models, like Notus 7B-v1<sup>1</sup>, without full retraining. Retrieval-Augmented
Generation (RAG) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has also proven effective for enhancing few-shot prompting [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and it is gaining
traction in semantic QA to provide context from KGs rather than from unstructured text [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Our
system combines this with prompt engineering and lightweight adaptation for a resource-conscious
deployment scenario.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our methodology consists of four steps: (1) analysis of the domain ontology of
reference. This step serves as the basis for defining relevant user questions and corresponding SPARQL
queries; (2) analysis of open source LLMs. This analysis is meant to select models that are capable
and lightweight enough to allow us to carry out the necessary fine-tuning operations with limited
computational resources, while still obtaining acceptable results; (3) construction of the training and
test dataset; (4) fine-tuning of the selected LLMs based on the dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. Methodology application to the Italian LOD on cultural heritage</title>
        <p>
          As mentioned earlier, we have applied the methodology in the real-world context of the Italian cultural
heritage domain. In this regard, a network of ontologies has been developed over the years;
namely, Cultural-ON<sup>2</sup> on cultural institutes and events [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and ArCo<sup>3</sup> on single cultural properties [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
These ontologies have been used to build a knowledge graph of cultural properties, both movable and
immovable.
        </p>
        <p>Question Design In this context, we have started from the Cultural-ON ontology, selecting two
main classes of it: CulturalInstituteOrSite and CulturalHeritageObject. These classes are richly
represented in the data and semantically interconnected with other entities. In fact, from their definition,
we have chosen a set of linked entities such as locations, ticket pricing, opening conditions and visual
depiction. These ontology elements allow us to model user questions and SPARQL queries.
Testing and LLM selection Open-source LLMs vary in size and capabilities. At the time of this
study, models in the 7B parameter range have been considered the smallest that could be reliably used
in real-world application scenarios. These models typically support input contexts of 2048 to 4096
tokens, sufficient for few-shot learning but not for full ontology ingestion.</p>
        <sec id="sec-3-1-1">
          <title>1 https://huggingface.co/argilla/notus-7b-v1 2 https://dati.cultura.gov.it/cultural-ON/ENG.html 3 http://wit.istc.cnr.it/arco/primer-guide-v2.0-en.html</title>
          <p>For the sake of conciseness, we do not report the comparative analysis of the models (also employing RAG techniques) that we conducted. Based on it, however, we selected Notus 7B-v1, which outperformed the alternatives under similar conditions (i.e., LLaMA 2 7B and 13B, Falcon 7B, Zephyr 7B, Mistral 7B).</p>
          <p>
            Dataset Construction To construct our training and test dataset, we have been inspired by the
Mintaka dataset [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], which contains 20,000 natural language question-answer pairs. These pairs are
divided into eight complexity categories; among them, we have selected the following: counting,
superlative, comparative, multi-hop, yes/no, difference, and a generic one. However, building a dataset
of natural language questions and the corresponding SPARQL code can be a daunting challenge. To this end,
we leveraged GPT-3 to create SPARQL code in response to user competency questions, the latter
formulated when developing the target ontology. The result is an initial set of 30 Italian NLQs targeting
the Cultural-ON ontology and data, each tagged with relevant entities and complexity categories. The
following is an example of a manually crafted triple (translated into English for clarity) where count is
the complexity category.
          </p>
          <p>Tags: [count, sites, city]</p>
          <p>NL Question: How many cultural institutions are there in Florence?</p>
          <p>SPARQL: SELECT (COUNT ...</p>
          <p>Several combinations of these tags were automatically created by a script and used as input to GPT-3
to generate more complex NLQs and corresponding NLQ-SPARQL pairs, thereby expanding the dataset.
Overall, we guided the generation process by supplying GPT-3 with the initial 30 examples, selecting
target categories, combining up to five other tags per instance and then letting GPT-3 generate the
question-SPARQL pairs based on this input.</p>
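The combinatorial tag-pairing step described above can be sketched as follows; the tag vocabulary, function name, and combination range are illustrative assumptions, not the authors' actual script.

```python
from itertools import combinations

# Hypothetical tag vocabulary (illustrative; the paper's real tag set is larger).
COMPLEXITY_CATEGORIES = ["count", "superlative", "comparative", "yes/no"]
ENTITY_TAGS = ["sites", "city", "ticket_pricing", "opening_conditions", "visual_depiction"]

def make_tag_combinations(max_extra_tags=5):
    """Pair each complexity category with 1..max_extra_tags entity tags,
    mirroring the combinatorial input fed to GPT-3 for pair generation."""
    combos = []
    for category in COMPLEXITY_CATEGORIES:
        for k in range(1, max_extra_tags + 1):
            for extra in combinations(ENTITY_TAGS, k):
                combos.append([category, *extra])
    return combos

combos = make_tag_combinations()
# Each combo, e.g. ['count', 'sites', 'city'], would seed one GPT-3 request
# that returns a new NLQ-SPARQL pair for the dataset.
```

Each generated combination plays the role of the manually crafted tag line shown in the example above.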
          <p>The resulting dataset, consisting of 383 instances, was manually validated and corrected; nonetheless,
during evaluation (Section 4), we observed that, in some cases, the expected query (ground truth) still
contains mistakes (see Evaluation Code 3 in Table 1), revealing residual imperfections despite the initial
validation. To further augment the dataset, each question was paraphrased either automatically via
GPT-3 or manually, effectively doubling its size. The final dataset contains 766 triplets of tags, NLQs,
and SPARQL queries. The complexity of the questions varies by design, with most examples combining
three to five tags (excluding the complexity category).</p>
          <p>The dataset was partitioned into training and test subsets using an 80/20 random split. To ensure
robust model evaluation and prevent data leakage, each example and its corresponding paraphrased
version were constrained to reside within the same subset. Both subsets were stored as a pair of JSONL
text files to be later ingested by the LLM finetuning script, while only the training set was converted
into a collection of 768-dimensional vector embeddings that were stored inside a Qdrant<sup>4</sup> DB instance.
Specifically, to support a vanilla RAG approach, each NLQ was embedded using a BERT-based model
named nickprock/sentence-bert-base-italian-uncased<sup>5</sup>. The latter was chosen both because of
its compatibility with the sentence-transformers<sup>6</sup> Python library and because its author fine-tuned it on
an Italian sentence similarity dataset.</p>
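A minimal sketch of such a paraphrase-aware 80/20 split, assuming each example carries a hypothetical `pair_id` shared with its paraphrase (the authors' actual partitioning code is not shown in the paper):

```python
import random

def paraphrase_safe_split(examples, test_ratio=0.2, seed=42):
    """80/20 split in which an example and its paraphrase (same pair_id)
    always land in the same subset, preventing data leakage."""
    pair_ids = sorted({ex["pair_id"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(pair_ids)
    n_test = round(len(pair_ids) * test_ratio)
    test_ids = set(pair_ids[:n_test])
    train = [ex for ex in examples if ex["pair_id"] not in test_ids]
    test = [ex for ex in examples if ex["pair_id"] in test_ids]
    return train, test

# 383 original questions, each with one paraphrase -> 766 examples.
data = [{"pair_id": i // 2, "nlq": f"q{i}"} for i in range(766)]
train, test = paraphrase_safe_split(data)
```

Splitting by pair identifier rather than by example is what yields the 154-question evaluation set (77 originals plus their 77 paraphrases) reported in Section 4.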
          <p>
            LLM fine-tuning with LoRA To keep our setup lightweight and reproducible, we fine-tuned Notus
7B-v1 using the quantized variant of LoRA [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], Quantized Low-Rank Adapters (QLoRA) [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. This approach freezes the original
model weights and trains only a small number of additional weight matrices inserted into specific
transformer layers. Although LoRA is commonly applied to large models, recent research [15] shows
that it performs well even with smaller architectures. With just a tiny fraction of the original parameters
involved in the training, LoRA significantly reduces resource requirements; this allows us to successfully
run the fine-tuning on a single consumer 8 GB GPU.
          </p>
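To see why this fits on a single 8 GB consumer GPU, it helps to compare parameter counts; the matrix shape and rank below are illustrative assumptions for a 7B-class model, not the configuration reported by the authors.

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA adds two low-rank factors per adapted matrix:
    B (d_out x rank) and A (rank x d_in); only these are trained."""
    return d_out * rank + rank * d_in

# Illustrative numbers for a single 4096x4096 projection matrix
# (typical of 7B-class transformers; exact shapes are an assumption):
full = 4096 * 4096                                    # ~16.8M frozen weights
adapter = lora_trainable_params(4096, 4096, rank=16)  # 131,072 trainable
fraction = adapter / full                             # well under 1%
```

The same ratio applies to every adapted layer, which is what shrinks optimizer state and gradient memory enough for consumer hardware.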
        </sec>
        <sec id="sec-3-1-2">
          <title>4 https://github.com/qdrant/qdrant 5 https://huggingface.co/nickprock/sentence-bert-base-italian-uncased 6 https://github.com/UKPLab/sentence-transformers</title>
          <p>[Figure: system pipeline. A user question entered via the GUI is combined with examples retrieved from the vector DB into a few-shot learning prompt; the LLM generates a SPARQL query, which is run against the Virtuoso endpoint to obtain CSV data; an informed question prompt then drives the answering LLM to produce the NL answer for the user.]</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Demo development</title>
        <p>To showcase the capabilities of our KGQA system in practice, including to staff of the Italian Ministry of
Culture, we have developed ASK ArCo, a GUI-based demo built as a Python Gradio<sup>7</sup> app with a focus
on the Cultural-ON ontology. Users input questions in natural language, assisted by example prompts
matching the training data style. Each question is wrapped in a RAG-based few-shot prompt and passed
to the LLM to generate a SPARQL query. If the query is valid, it is executed on a SPARQL endpoint and
returns a CSV-formatted result. A second LLM prompt then synthesizes a final answer from the CSV.
The interface progressively displays, as they get generated, the SPARQL code, a table showing the raw
CSV response data, and the final answer (see Figure 1). Failures are diagnosed and flagged depending
on whether they occurred during query generation, execution, or interpretation.</p>
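The stage-wise failure flagging described above can be sketched as a small wrapper; the function and argument names are placeholders, not the demo's actual code.

```python
def answer_question(question, generate_sparql, run_query, summarize):
    """Run the three pipeline stages and flag the one where a failure
    occurs: query generation, execution, or interpretation."""
    try:
        sparql = generate_sparql(question)  # RAG few-shot prompt -> LLM
    except Exception:
        return {"ok": False, "stage": "generation"}
    try:
        rows = run_query(sparql)  # CSV rows from the SPARQL endpoint
    except Exception:
        return {"ok": False, "stage": "execution", "sparql": sparql}
    try:
        answer = summarize(question, rows)  # second LLM prompt over the CSV
    except Exception:
        return {"ok": False, "stage": "interpretation", "sparql": sparql}
    return {"ok": True, "stage": "done", "sparql": sparql, "answer": answer}

# Stubs stand in for the two LLM prompts and the Virtuoso endpoint.
ok_result = answer_question(
    "How many cultural institutions are there in Florence?",
    lambda q: "SELECT (COUNT(?s) AS ?n) WHERE { ... }",
    lambda s: ["n", "42"],
    lambda q, rows: "There are 42 cultural institutions in Florence.",
)

def _down(sparql):
    raise RuntimeError("endpoint unreachable")

failed_result = answer_question("any question", lambda q: "SELECT ...", _down, lambda q, r: "")
```

Returning the stage alongside the partial artifacts (e.g. the generated SPARQL) is what lets the GUI display intermediate results progressively and diagnose where a failure occurred.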
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Evaluation and Discussion</title>
      <p>To assess the SPARQL query generation performance of our system, we conducted four distinct
evaluation tests, comparing both untrained and fine-tuned models, each with and without the integration of
the RAG technique.</p>
      <p>In our RAG pipeline, the retrieval step does not directly extract ontology fragments (e.g., classes
or properties). Instead, it retrieves validated natural language question-SPARQL pairs from a vector
database. During both runtime and evaluation, the SPARQL generation prompt is dynamically
constructed as follows. Using the same model cited in Section 3.1, the user’s input question is first
embedded as a 768-dimensional vector, which is then used to query the vector database for the top
eight most similar questions based on cosine similarity. These retrieved questions, together with the
corresponding ground-truth SPARQL queries, are subsequently incorporated into the prompt as
contextual demonstrations. This approach ensures that the LLM is consistently exposed to
ontology-aligned query patterns, avoiding the need to ingest large ontology fragments that could easily
exceed the model’s context window.</p>
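A minimal, dependency-free sketch of this retrieval-then-prompt step, assuming a store of NLQ-SPARQL pairs with precomputed vectors; toy 2-dimensional vectors stand in for the 768-dimensional embeddings, and the prompt wording is illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_few_shot_prompt(question_vec, question_text, store, k=8):
    """Rank stored NLQ-SPARQL pairs by cosine similarity to the user's
    question embedding and splice the top k into the prompt as
    demonstrations. `store` holds dicts with `nlq`, `sparql`, `vec`."""
    ranked = sorted(store, key=lambda item: cosine(question_vec, item["vec"]), reverse=True)
    parts = ["Translate the question into a SPARQL query.\n"]
    for demo in ranked[:k]:
        parts.append(f"Q: {demo['nlq']}\nSPARQL: {demo['sparql']}\n")
    parts.append(f"Q: {question_text}\nSPARQL:")
    return "\n".join(parts)

# Toy store with placeholder vectors and truncated SPARQL strings.
store = [
    {"nlq": "How many sites are in Florence?", "sparql": "SELECT (COUNT(?s) AS ?n) ...", "vec": [1.0, 0.0]},
    {"nlq": "What does a ticket cost?", "sparql": "SELECT ?price ...", "vec": [0.0, 1.0]},
    {"nlq": "How many museums are in Rome?", "sparql": "SELECT (COUNT(?m) AS ?n) ...", "vec": [0.9, 0.1]},
]
prompt = build_few_shot_prompt([1.0, 0.0], "How many sites are in Siena?", store, k=2)
```

In the actual system, the vector database performs the similarity search server-side over the stored training-set embeddings; the principle is the same.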
      <p>The evaluation dataset consists of 154 unseen questions: 77 original user questions and their
corresponding paraphrases. These were not included in the training data and were never used in any
RAG context. Each generated SPARQL query and its output was manually categorized into one of ten
evaluation result types, as detailed in Table 1. These categories support multiple analysis perspectives
depending on whether the focus is on query validity or the correctness of the answer.</p>
      <p>• Generated query is functionally identical to the expected one.
• Generated query shows potential improvements over the expected one (e.g. it adds a DISTINCT
clause that we instead forgot to include).
• Generated query is correct, whereas the expected one contains mistakes.
• Generated query only answers the user’s question indirectly, as the final result must be inferred or
computed (e.g. when asked to make a comparison, it provides a list of values to be compared
instead of selecting the best one right away).
• Lexical error (e.g. issues in the natural language strings used for regex or exact matching).
• Logical or ontological error (e.g. wrong manipulation of the data to be retrieved; non-pertinent,
wrongly named or made-up properties and classes).
• Minor syntax error, an easily reparable mistake (e.g. unbalanced parentheses, keyword typos).
• Major syntax error, an irreparable mistake (e.g. usage of undefined IRI prefixes, wrong punctuation,
invalid keyword positioning).
• Total hallucination, gibberish or empty LLM response.</p>
      <p>To systematically evaluate our results, we organized the evaluation codes into two groupings that
assess different aspects of query generation performance.</p>
      <p>• Grouping 1 in Table 2 distinguishes between formally valid and invalid SPARQL code. This
dichotomy is fundamental to assess the model’s reliability in code production, and to find out what
might cause bad SPARQL generation, regardless of the query semantics.
• Grouping 2 in Table 3 distinguishes fully acceptable results from those that could potentially
be improved using post-processing techniques, and from those that are considered unusable
for answering the user’s question. The second group includes outputs or code that could be
repaired, for instance, by prompting the LLM to correct its own errors or by applying automated
SPARQL parsing and correction. While these techniques may incur some runtime cost, they
require minimal implementation effort, and we therefore track these cases as potentially positive
outcomes. The third group, on the other hand, comprises outputs that we do not expect to easily
recover.</p>
      <p>In the following, we briefly analyze each of the four tests we conducted on the test dataset.
Test 1 analysis (no training, no RAG) The first test yielded a 0% success rate across all groupings, an
expected outcome given the model’s lack of SPARQL and domain knowledge. Notably, all outputs were
classified as evaluation code 0 (hallucinations), confirming our hypothesis that the untuned model lacks
the necessary grounding to generate meaningful and domain-aware queries.</p>
      <p>Test 2 analysis (no finetuning, with RAG) The second test evaluates Notus 7B-v1’s capability
to perform well when relevant examples are presented inside its context window. Results are summarized
in Table 4. The data shows good results in producing functioning SPARQL code (58.2%), but most of these
successes contain logical flaws or ontology-specific errors (36.6%). Thus, results are usable only in
∼17% of the cases, with a maximum of ∼30% if actions to fix queries or data were applied at the end of
the pipeline. Even with limited adherence to our query style (∼15%), the results are encouraging.
Test 3 analysis (finetuned model, no RAG) With the trained model we noticed progress in the
provision of SPARQL code (∼65% vs. ∼58% of working SPARQL queries) but no significant progress with
their results (still ∼20%) (see Table 4). We notice a significant improvement in terms of hallucinations
compared to the RAG test (∼−6%), seemingly offset by a similar increase in major code errors (∼+7%).
Adherence to our style shows little improvement (+1%). Surprisingly, only ∼2% of the queries ended
up being affected by minor code errors. This also means that there is little room for improvement in
recovering slightly wrong queries or results with subsequent techniques. In other words, fine-tuning
the model proved more decisive than the few-shot learning technique, but only in terms of the SPARQL
syntax. The model now either fails badly in writing a functioning query, or does not fail at all, with
little in-between. Even so, knowledge of the ontology is still insufficient, as errors of type 6 remain largely
invariant, as does the quality of the retrieved data.</p>
      <p>Test 4 analysis (finetuned model, with RAG) This test highlights the power of combining fine-tuning
with RAG for our purposes (see Figure 2). Working SPARQL queries are produced in ∼88% of
the test cases, and their results are usable in ∼69% of the cases, with ∼14% cases possibly recoverable for
a potential total success rate of ∼83% (see Table 4). Specifically, both hallucinations and major code
errors are significantly reduced, as well as all other error categories, while adherence to our writing
style shows a massive boost (∼58% compared to ∼15.5% of previous experiments). While the observed
improvements are promising, a careful reflection on the scope and potential limitations of these results
is necessary before reaching final conclusions.</p>
      <p>[Figure 2: bar chart comparing Tests 2-4 on valid vs. not valid SPARQL (Grouping 1) and admissible vs. fixable vs. not admissible results (Grouping 2).]</p>
      <p>
Dataset Limitations Our artificially generated questions, created through the combinatorial pairing
of tags, sometimes lack a natural feel. While this may limit the system’s ability to handle diverse user
queries, it provides significant advantages for internal knowledge graph testing and development. A
uniform SPARQL coding style, while limiting generalizability, is essential for maintaining the consistency
needed for our lightweight model’s performance. Additionally, we have carefully designed our queries
to handle variations in the knowledge graph’s data quality, such as inconsistent date formats and special
characters.</p>
      <p>Production Feasibility Finally, our findings indicate that modest scaling may enable production-ready
deployment. The lightweight approach using 7B models and LoRA fine-tuning offers cost-effective
deployment for public administrations, researchers, and businesses.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Concluding remarks and future work</title>
      <p>In this paper we presented a KGQA system to automatically produce SPARQL queries. This study shows
that fine-tuned lightweight LLMs, when combined with RAG techniques, can effectively bridge the gap
between general users and LOD. Specifically, our system achieved a 69% success rate in generating
correct SPARQL queries, supporting accessibility to KGs in the cultural domain for non-technical users.
Although dataset limitations exist, the system provides a foundation for developing production-level
systems that can democratize access to LOD across domains.</p>
      <p>The combination of LoRA fine-tuning and RAG-enhanced prompting offers a scalable solution for
organisations, including those in the public sector, seeking to make their semantic data more accessible while
maintaining technical and economic feasibility. The system’s modularity enables adaptation to other
knowledge graphs with similar entity-relationship structures, potentially broadening access to various
LOD sources.</p>
      <p>Future work includes improving dataset realism through user-sourced questions and structured input
interfaces, such as drop-down-based builders or controlled natural languages like SQUALL [16]. We
plan to evaluate our approach on established datasets and benchmarks such as QALD, LC-QuAD, and
Mintaka to better assess its generalizability and comparability. Potential technical enhancements include
incorporating SPARQL-specialized language models, leveraging automatic query repair, extending the
RAG method to include retrieval of ontology snippets (such as class or property definitions) to further
refine query generation and minimize logical errors, and improving data access through structured
graph exploration.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <p>The authors have not employed any Generative AI tools.</p>
        <p>[14] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: efficient finetuning of quantized
llms, CoRR abs/2305.14314 (2023). URL: https://doi.org/10.48550/arXiv.2305.14314. doi:10.48550/
ARXIV.2305.14314. arXiv:2305.14314.
[15] S. Sun, D. Gupta, M. Iyyer, Exploring the impact of low-rank adaptation on the performance,
efficiency, and regularization of RLHF, CoRR abs/2309.09055 (2023). URL: https://doi.org/10.48550/
arXiv.2309.09055. doi:10.48550/ARXIV.2309.09055. arXiv:2309.09055.
[16] S. Ferré, SQUALL: A controlled natural language for querying and updating RDF graphs, in:
T. Kuhn, N. E. Fuchs (Eds.), Controlled Natural Language - Third International Workshop, CNL
2012, Zurich, Switzerland, August 29-31, 2012, Proceedings, volume 7427 of Lecture Notes in
Computer Science, Springer, 2012, pp. 11-25. URL: https://doi.org/10.1007/978-3-642-32612-7_2.
doi:10.1007/978-3-642-32612-7_2.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Quarati, R. Albertoni, Linked open government data: Still a viable option for sharing and integrating public data?, Future Internet 16 (2024) 99. URL: https://doi.org/10.3390/fi16030099. doi:10.3390/FI16030099.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] T. Soru, E. Marx, D. Moussallem, G. Publio, A. Valdestilhas, D. Esteves, C. B. Neto, SPARQL as a foreign language, CoRR abs/1708.07624 (2017). URL: http://arxiv.org/abs/1708.07624. arXiv:1708.07624.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] C. Ma, Y. Chen, T. Wu, A. Khan, H. Wang, Large language models meet knowledge graphs for question answering: Synthesis and opportunities, CoRR abs/2505.20099 (2025). URL: https://doi.org/10.48550/arXiv.2505.20099. doi:10.48550/ARXIV.2505.20099. arXiv:2505.20099.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. Usbeck, M. Röder, M. Hoffmann, F. Conrads, J. Huthmann, A. N. Ngomo, C. Demmler, C. Unger, Benchmarking question answering systems, Semantic Web 10 (2019) 293-304. URL: https://doi.org/10.3233/SW-180312. doi:10.3233/SW-180312.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] P. Trivedi, G. Maheshwari, M. Dubey, J. Lehmann, LC-QuAD: A corpus for complex question answering over knowledge graphs, in: The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II, volume 10588 of Lecture Notes in Computer Science, Springer, 2017, pp. 210-218. URL: https://doi.org/10.1007/978-3-319-68204-4_22. doi:10.1007/978-3-319-68204-4_22.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] P. Sen, A. F. Aji, A. Saffari, Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering, in: Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, International Committee on Computational Linguistics, 2022, pp. 1604-1619. URL: https://aclanthology.org/2022.coling-1.138.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>QA-GNN: Reasoning with language models and knowledge graphs for question answering</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021</source>
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>535</fpage>
          -
          <lpage>546</lpage>
          . URL: https://doi.org/10.18653/v1/2021.naacl-main.45. doi:10.18653/v1/2021.naacl-main.45.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          ,
          <source>CoRR abs/2106.09685</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual</source>
          ,
          <year>2020</year>
          . URL: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lomeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <article-title>Few-shot learning with retrieval augmented language models</article-title>
          ,
          <source>CoRR abs/2208.03299</source>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.48550/arXiv.2208.03299. doi:10.48550/arXiv.2208.03299. arXiv:2208.03299.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mavromatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          ,
          <article-title>GNN-RAG: Graph neural retrieval for large language model reasoning</article-title>
          ,
          <source>CoRR abs/2405.20139</source>
          (
          <year>2024</year>
          ). URL: https://doi.org/10.48550/arXiv.2405.20139. doi:10.48550/arXiv.2405.20139. arXiv:2405.20139.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Asprino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Veninata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orsini</surname>
          </string-name>
          ,
          <article-title>Semantic Web for Cultural Heritage Valorisation</article-title>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>37</lpage>
          . URL: https://doi.org/10.1007/978-3-319-54499-1_1. doi:10.1007/978-3-319-54499-1_1.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Mancinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Veninata</surname>
          </string-name>
          , et al.,
          <article-title>Pattern-based design applied to cultural heritage knowledge graphs</article-title>
          ,
          <source>Semantic Web 12</source>
          (
          <year>2021</year>
          )
          <fpage>313</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>QLoRA: Efficient finetuning of quantized LLMs</article-title>
          ,
          <source>CoRR abs/2305.14314</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2305.14314. arXiv:2305.14314.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>