<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Improving Subgraph Extraction Algorithms for One-Shot SPARQL Query Generation with Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitrii Pliukhin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniil Radyush</string-name>
          <email>daniil.radyush@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liubov Kovriguina</string-name>
          <email>lkovriguina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Mouromtsev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Scholarly Knowledge Graphs, Knowledge Graphs Question Answering, SPARQL query generation,</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>30167 Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>Kronverksky Pr. 49, bldg. A, St. Petersburg, 197101</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Independent Researcher</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>6</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Question answering over scholarly knowledge graphs involves many challenges: complex graph patterns, long-tail distributed data, revision and evolution of scholarly ontologies, and knowledge graph incompleteness due to constant research dynamics. In this work, we present an LLM-based approach to SPARQL query generation over the Open Research Knowledge Graph (ORKG) for the ISWC SciQA Challenge. Our approach proposes two improvements to the recently published SPARQLGEN approach, which performs one-shot SPARQL query generation by augmenting Large Language Models (LLMs) with the relevant context within a single prompt. Similar to SPARQLGEN, we include heterogeneous data sources in the SPARQL generation prompt: the question itself, an RDF subgraph required to answer the question, and an example of a correct SPARQL query. In the current work, we focused on designing subgraph extraction algorithms that are close to real-life scenarios of generative KGQA, and replaced the random choice of the example question-query pair with similarity scoring.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Scholarly knowledge graphs have become a recent trend and have inspired the adaptation of KGQA
systems to new complex domains that are constantly evolving, bringing new facts and concepts
to the knowledge graph. Currently, there are a number of scholarly knowledge graphs that differ
in metadata and coverage, being built on top of various ontologies and data sources [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
For knowledge graph question answering (KGQA) systems, such a landscape creates many
challenges due to varying graph patterns, ambiguity, and complex user questions. Previous
KGQA benchmarks (e.g., the QALD series and the LC-QuAD datasets) were built upon DBpedia and
Wikidata and allowed cumulative improvements of KGQA systems, especially for template-based
approaches. With the diversity of scholarly KGs, template-based and pre-trained approaches may
not work as well as before due to adaptation costs. We suggest employing generative approaches
to KGQA, namely within the augmented large language models paradigm, in which the LLM gets all
the context required to generate the SPARQL query in a prompt. LLM augmentation approaches
try to mitigate the fundamental defect of LLMs, purely statistical language modeling over a limited
context size, by providing LLMs with information from relevant data sources. Several augmentation
strategies are known: retrieval-augmented language models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], knowledge
injection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], reasoning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], etc. According to the classification proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], providing LLMs
with extra context via prompting belongs to augmentation by eliciting reasoning.
      </p>
      <p>
        In the current paper, we propose further improvements to SPARQLGEN, a recent one-shot
approach to generating SPARQL queries by prompting LLMs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The SPARQLGEN approach
consists in augmenting LLMs with a knowledge graph fragment required to construct the query
and a question-subgraph-query example within a single prompt (see Sec. 2). This knowledge
graph fragment, hereafter referred to as the subgraph, contains all the triples that are required to build
the correct SPARQL query (that is, to answer the question correctly), but no irrelevant triples,
and extracting this subgraph is quite challenging. Since the motivation behind SPARQLGEN
was to evaluate whether an LLM can make use of the provided subgraph and infer graph
patterns during SPARQL query generation, its subgraph extraction algorithm is based on
the target SPARQL queries of the QALD-9 and QALD-10 datasets<sup>1</sup>. The disadvantage of this approach
is that it does not work at inference time, since it requires the ground truth data. For the SciQA
challenge, we have designed a subgraph extraction algorithm that combines similarity search with
the Hierarchical Navigable Small World (HNSW) method; it can be used as a replacement for the
subgraph extraction algorithm in SPARQLGEN and better matches real-life KGQA
scenarios (see Sec. 5). The random selection of the guiding example for one-shot prompting is
replaced with selection based on the Levenshtein distance.
      </p>
      <p>The main contributions of the paper are the following:
        <list list-type="bullet">
          <list-item><p>an improvement of SPARQLGEN, a one-shot method for SPARQL query generation by prompting LLMs, which can be quickly adapted to other datasets and knowledge graphs,</p></list-item>
          <list-item><p>a subgraph extraction algorithm that can be used in LLM-augmented SPARQL query generation scenarios,</p></list-item>
          <list-item><p>an evaluation of the improved SPARQLGEN method on the ORKG benchmark.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Fine-tuning LLMs to generate SPARQL queries already has a proven track record, with
top leaderboard results on QALD-9 with SGPT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and on LC-QuAD with GETT-QA [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Besides that,
there are also no-SPARQL KGQA approaches that load the knowledge graph into the
question answering LLM pipeline in a retrieval fashion (e.g., see<sup>2</sup>), but we excluded them for now,
although we consider them promising.
      </p>
      <p>
        Rony et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] propose SGPT, an approach using a stack of Transformer encoders to embed
linguistic features from natural language questions, as well as entity and relation information,
into the GPT-2 model. While entity and relation representations are fed to the model in SGPT,
their connections in the underlying KG are not provided. Thus, generating correct triple
sequences in the final SPARQL queries is error-prone due to unknown graph structures. Another
strong approach was introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where the authors improve on the state of the art in KGQA<sup>3</sup>
by training the T5 model to generate skeleton SPARQL queries and truncated KG embeddings
that are used to fetch candidate entities for the skeleton query. However, all these approaches
assume training or fine-tuning an existing model.
      </p>
      <p><sup>1</sup> See algorithm description in the repository: https://github.com/danrd/sparqlgen</p>
      <p><sup>2</sup> https://github.com/mommi84/rdf-qa</p>
      <p>
        The recent approach that we extend is SPARQLGEN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which doesn’t require
any training and instructs LLMs to generate SPARQL queries by providing them with the underlying
knowledge graph, guiding examples and information about the SPARQL grammar within a single
prompt. All the context required to generate a query is assembled in a single prompt
from loosely coupled heterogeneous structured information snippets, hereafter referred to as prompt
elements. A prompt element is represented as a structure that has a description, a source and a
set of pre-processing methods (e.g., for sampling, serializing to string, ranking, linearizing) that
are specific to the prompt element and make it possible to combine heterogeneous data sources within
a single prompt. The description field describes the source data, e.g. “The RDF knowledge graph”,
and the source contains the data itself, e.g. triples. Altogether, this makes it possible to quickly and flexibly
build custom prompt templates with an arbitrary order and number of elements.
      </p>
      <p>The findings of the SPARQLGEN approach show that the model struggles to deal with an unknown
knowledge graph. Namespace errors, incomplete triples and errors caused by ignoring the KG structure
occur more frequently for the unseen dataset, and it seems beneficial to introduce the model
to the KG already in pre-training (see the supplementary material in the repository:
https://github.com/danrd/sparqlgen).</p>
    </sec>
    <sec id="sec-4">
      <title>3. SciQA Challenge and Datasets</title>
      <p>This section presents an overview of the SciQA Challenge and the datasets used in this study.</p>
      <sec id="sec-4-1">
        <title>3.1. SciQA Challenge</title>
        <p>The primary objective of this research was to participate in the Scholarly
Question Answering over Linked Data (Scholarly QALD) challenge. This challenge is centered around
Knowledge Graph Question Answering over the ORKG Scholarly Knowledge Graph. The
challenge was structured into two distinct stages.</p>
        <p>The first stage aimed to develop a model and conduct validation using a held-out subset with
known true labels, thereby establishing an initial leaderboard. The second stage focused on
assessing the model’s quality and generating final results for comparison among the various
solutions developed. The central challenge task involved transforming a user’s natural language
question into a SPARQL query and executing this query on the provided knowledge graph to
produce a response. The efficacy of the proposed solutions was evaluated on a set of 200
questions, using the F1-score metric in both stages.</p>
        <sec id="sec-4-1-1">
          <title><sup>3</sup> https://github.com/KGQA/leaderboard</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Datasets</title>
        <p>To train and evaluate the developed models, the SciQA dataset was designated as the target data
source. This dataset is publicly available in the repository: https://zenodo.org/record/7744048.
The SciQA dataset encompasses a rich assortment of records. Each record comprises a question
expressed in natural language, the corresponding SPARQL query, and a list of knowledge graph
triples that represent the answer to that question.</p>
        <p>Notably, the SciQA dataset is characterized by a significant diversity of questions, spanning
various domains, response types, and sizes. The questions also exhibit structural and
computational complexity and a degree of ambiguity. In addition to the SciQA dataset, the repository
contains a link to the dump of the knowledge graph designed for question answering. This
knowledge graph is primarily structured around scholarly data, including papers, contributions,
and connections between them, accompanied by pertinent metadata.</p>
        <p>The knowledge graph contains a total of 1,133,217 triples and 21,243 contributions, with
an overall dump size of 152 megabytes. This knowledge graph is inherently heterogeneous,
encompassing diverse data pertaining to papers from an array of research fields. Its structure
is notably intricate, rendering the challenge particularly demanding. Specifically, the graph
incorporates both discrete and continuous data entries, encompassing a wide range of primitive
types such as numbers in various formats, dates, strings, and binary labels.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Architecture Description</title>
      <p>Our approach is implemented as a modular architecture with the following components: (1)
subgraph retrieval and (2) example selection, which together provide the context to populate prompt elements, (3) prompt
building, (4) prompt execution, (5) removing hallucinations and validating the query, (6) query
execution.</p>
      <p>During steps (1) and (2), each data point in the SciQA test set was augmented with
context, represented as (i) the subgraph required to execute the query (see Sec. 5), and (ii) a
question-query pair similar to the question (see below). Then the prompt builder (3) constructs
a prompt from the augmented SciQA data point and a guiding example, with the order of the
prompt elements defined in the experiment config (as shown in Fig. 1), and the serialized prompt
is sent to the GPT-3.5 completions endpoint. The resulting query is validated (5) and executed
(6). Validation includes removing hallucinated symbols (e.g., generated text preceding the query, like
System:, Query:, randomly inserted newlines, etc.). The pipeline architecture is shown in Fig. 1.
Subgraph extraction is described in more detail in Sec. 5.</p>
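      <p>The clean-up part of validation step (5) can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function name <monospace>clean_generated_query</monospace> and the keyword heuristic are assumptions made for illustration only. The sketch drops any text generated before the first SPARQL keyword and collapses stray newlines.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumption: a query starts at the first of these SPARQL keywords.
QUERY_START = re.compile(r"\b(PREFIX|SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def clean_generated_query(raw: str) -> str:
    """Strip text generated before the query and normalise whitespace."""
    match = QUERY_START.search(raw)
    # Drop everything before the first SPARQL keyword, if one is found.
    text = raw[match.start():] if match else raw
    # Collapse randomly inserted newlines and repeated spaces.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_generated_query("Query:\nSELECT ?x\n\nWHERE { ?x ?p ?o }")
```
]]></preformat>
      <p>A heuristic of this kind would misfire if the text preceding the query itself contained a SPARQL keyword, so a real pipeline would likely combine it with a SPARQL parser check.</p>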
      <p>Following the SPARQLGEN approach, to combine heterogeneous data sources in the prompt,
we used an abstract structure, called a prompt element, that is instantiated during the experiment.
Each prompt element has the fields description and source, as well as methods for pre-processing
the source data (see the prompt elements Example, Instruction, Question and Knowledge Graph in
Fig. 1). A prompt in SPARQLGEN is a serialized sequence of prompt elements. The implemented
structure makes it possible to configure experiments with minimal changes in the code structure and
to quickly design custom prompt templates.</p>
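      <p>A prompt element of this kind could be sketched as follows. This is an illustrative assumption, not the SPARQLGEN code: the class <monospace>PromptElement</monospace>, the helpers <monospace>build_prompt</monospace> and <monospace>triples_to_text</monospace>, the sample question and the identifier <monospace>orkg:R1</monospace> are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptElement:
    """A description plus source data, with a pluggable serializer."""
    description: str
    source: object
    # Pre-processing hook; str() is only a placeholder default (assumption).
    serialize: Callable[[object], str] = str

    def render(self) -> str:
        return f"{self.description}:\n{self.serialize(self.source)}"


def build_prompt(elements: List[PromptElement]) -> str:
    """Serialize the elements in the configured order."""
    return "\n\n".join(element.render() for element in elements)


def triples_to_text(triples) -> str:
    """Linearize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(" ".join(triple) + " ." for triple in triples)


prompt = build_prompt([
    PromptElement("Question", "Which papers use the BERT model?"),
    PromptElement("The RDF knowledge graph",
                  [("orkg:R1", "rdfs:label", '"BERT"')],  # hypothetical triple
                  serialize=triples_to_text),
])
```
]]></preformat>
      <p>Reordering the list passed to <monospace>build_prompt</monospace> changes the prompt template without touching the elements themselves, which is the kind of flexibility described above.</p>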
      <p><bold>Example Selection.</bold> For one-shot prompting, the example was selected from the train subset
of the proposed dataset by computing the Levenshtein distance between the input question and every
question from the train subset. The list of questions was then sorted by Levenshtein
distance, and the record with the lowest distance was selected for prompt generation. The approach
based on Levenshtein distance was chosen over semantic similarity because of the low
diversity of the dataset: many instances in the test subset have at least one
training sample with an identical structure except for a few property values.</p>
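      <p>The selection rule can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the code used in the experiments: the function names and the toy training pairs are hypothetical, and taking the minimum is used in place of sorting the full list, which yields the same record.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def select_example(question, train_pairs):
    """Return the (question, query) pair whose question is closest to the input."""
    return min(train_pairs, key=lambda pair: levenshtein(question, pair[0]))


# Hypothetical training pairs; the queries are placeholders.
train = [
    ("What is the best model on ImageNet?", "QUERY_1"),
    ("List papers about BERT", "QUERY_2"),
]
example = select_example("What is the best model on CIFAR-10?", train)
```
]]></preformat>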
    </sec>
    <sec id="sec-6">
      <title>5. Subgraph Extraction Algorithms</title>
      <p>
        Our approach is based on finding, for each question, semantically related objects in ORKG, namely
resources, properties, papers and contributions. For this purpose, the ’all-mpnet-base-v2’ sentence
transformer model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] was used to measure the similarity between the embeddings of the question and of the object
labels. Furthermore, the Hierarchical Navigable Small World (HNSW) method [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
was used to index ORKG objects and facilitate vector search, which constitutes the preliminary step (0)
of the subgraph extraction approaches.
      </p>
      <p>Essentially, the process of subgraph retrieval comprises the following steps: (1) question
processing: filtering out all parts of speech except nouns, adjectives and verbs, followed by bigram generation
and encoding; (2) relevant object extraction: identifying the ORKG objects most relevant to the question,
based on cosine similarity between the embeddings of its bigrams and the embeddings of ORKG
object labels, using the previously constructed HNSW indexes; (3) subgraph retrieval:
deriving 2-hop paths that contain the similar objects found; (4) subgraph merging and conversion
into a string, with postprocessing for prompting.</p>
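      <p>Steps (1) and (2) can be illustrated with a self-contained sketch. It makes two loud simplifications that are assumptions of this example and not part of the described system: a toy character-trigram embedding stands in for the ’all-mpnet-base-v2’ sentence encoder, and an exhaustive scan over labels stands in for the HNSW index. The part-of-speech filtering is assumed to have been applied already, and all names and labels are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent token pairs, joined into strings."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def embed(text):
    """Toy stand-in for a sentence encoder: character-trigram counts."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def top_objects(tokens, labels, k=2):
    """Rank labels by their best cosine similarity to any question bigram."""
    queries = [embed(bigram) for bigram in bigrams(tokens)]
    scored = {
        label: max((cosine(query, embed(label)) for query in queries), default=0.0)
        for label in labels
    }
    # Exhaustive scan; an approximate index such as HNSW would replace this.
    return sorted(scored, key=scored.get, reverse=True)[:k]


tokens = ["benchmark", "dataset", "accuracy"]  # content words kept after filtering
labels = ["Benchmark dataset", "has research problem", "Accuracy score"]
best = top_objects(tokens, labels)
```
]]></preformat>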
      <p>In our experiments, we implemented two approaches to including the derived triples in the subgraph
construction process as part of step (3). The first approach considers papers and contributions
as the main source of structured information for LLMs. Therefore, it relies on retrieving a number
of papers and contributions with related titles from ORKG, excluding triples with irrelevant
predicates. However, in some cases titles are not specific enough and do not contain the needed
keywords. To address this issue, the second approach directly extracts triples containing
resources and properties with high similarity to the question, though in some cases it requires
particular triple patterns. A more detailed description of the implemented approaches is provided
in Figure 2, whereas the overall process of subgraph retrieval is presented in Fig. 3.</p>
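      <p>The 2-hop retrieval of step (3) can be sketched under a simplifying assumption: the sketch below follows only outgoing edges from the seed objects, whereas the described algorithm derives 2-hop paths that contain the similar objects and applies approach-specific filtering. The function name and the sample triples are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def two_hop_triples(triples, seeds):
    """Collect triples on outgoing paths of length at most 2 from each seed."""
    outgoing = defaultdict(list)
    for subject, predicate, obj in triples:
        outgoing[subject].append((subject, predicate, obj))
    selected = []
    for seed in seeds:
        for first in outgoing.get(seed, []):
            selected.append(first)
            # Second hop: triples whose subject is the object of the first hop.
            selected.extend(outgoing.get(first[2], []))
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(selected))


# Hypothetical triples for illustration only.
graph = [
    ("paper1", "hasContribution", "c1"),
    ("c1", "usesModel", "BERT"),
    ("BERT", "label", "x"),
    ("paper2", "hasContribution", "c2"),
]
subgraph = two_hop_triples(graph, ["paper1"])
```
]]></preformat>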
    </sec>
    <sec id="sec-7">
      <title>6. Experiments and Results</title>
      <p>During the experiments, we evaluated how the subgraph extraction algorithm influences the
SPARQL generation quality. In the baseline version, no subgraph was provided, only an example.
The results are summarized in Table 1.</p>
      <p>According to the experimental results, even without subgraph extraction algorithms
the proposed architecture is capable of obtaining a reasonably good result, an F1 score of 0.922.
Moreover, the table shows that the second approach to subgraph extraction leads to a minor
decrease in F1 score. This probably means that the structured information extracted from ORKG
predominantly contributes additional noise to the prompts. Consequently, this approach needs
further revision or adaptation to the given KG. On the other hand, the first approach yields a
slight increase in F1 score, which suggests that it is useful in general, although this requires investigation in
future work.</p>
      <p>The hyperparameters for the experiments were selected, on the one hand, to
comply with the 4096-token limit of GPT-3.5 and, on the other hand, to measure the effect of
changing the balance between different parameters. Table 2 indicates that the results are mostly
robust to variation in hyperparameters, which could be at least partly caused by the relatively low
impact of the subgraph extraction algorithms on system performance.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Error analysis</title>
      <p>Using the optimal configuration of the subgraph extraction algorithm, all generated queries
(100%) contained one or more missing prefixes, necessitating their manual addition at the
beginning of each query to obtain meaningful results from the database. Two generated queries
(1%) were entirely incorrect, specifically AQ1806 and AQ1787. These queries consisted solely of
lists of nodes from the graph database. This could be attributed to the lack of examples in the
training set that exhibit semantic similarity to the desired questions, with a similarity score high
enough to facilitate the reuse of provided examples for new query generation. Consequently,
the model might have attempted to deduce the values that the target query should have returned
or replicated elements of the provided subgraph.</p>
      <p>However, upon a more detailed investigation, the evidence suggests that the training subset
contains relevant examples with Levenshtein similarity scores of 0.944 and 0.8947, indicating
very high similarity. The significantly high similarity and structural homogeneity were also
confirmed through manual evaluation. Therefore, the model’s inability to utilize these examples
and the provided subgraph to generate a correct SPARQL query should be attributed to its
stochastic nature.</p>
      <p>Furthermore, regarding other configurations of the subgraph retrieval algorithm, the following
types of inconsistencies were observed:</p>
      <p>1. Incorrect namespace prefixes (such as myontology, ex, foaf) appeared in 2 instances,
possibly due to the model’s bias;</p>
      <p>2. Instances in which the model requested additional data using natural language for query
generation occurred once. Notably, in this case, the structure of the provided example closely
resembled the format of the input question, and thus, the model should have been capable of
handling this case correctly, given that it processed most of the dataset without errors. Such
behavior can also be attributed to the inherent stochasticity of the model, and an
additional request should be sufficient to resolve the observed issue in most cases;</p>
      <p>3. Hallucinated node identifiers occurred in one instance, likely due to the complexity of
the provided query. In this case, the question was particularly challenging to handle without
extensive reference to database content, as the node identifiers could not be straightforwardly
derived from text labels. The model essentially had to guess, as the required instances arguably
were absent from the provided subgraph.</p>
      <p>For more details regarding the classification system used for categorizing errors, refer to the
supplementary material for the SPARQLGEN paper<sup>4</sup>.</p>
      <p>As a result, the majority of the model’s errors were a consequence of its random nature, with
only a negligible fraction of cases (0.25% of the total number of handled questions) in which
the model completely failed to respond correctly. The residual error should be studied
with greater attention. Evidently, in these cases, the model failed to reproduce the query
structure, translate the necessary parameter values to the corresponding query fields, or efficiently
utilize the provided subgraph due to incompleteness of the train corpus, insufficient architecture
complexity and other constraints.</p>
      <sec id="sec-8-1">
        <title><sup>4</sup> https://github.com/danrd/sparqlgen/blob/main/errors.png</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>8. Conclusion and Future Work</title>
      <p>
        The presented work shares results on one-shot SPARQL query generation with large language
models for the fraction of the ORKG proposed at the SciQA challenge. The results obtained
require further assessment. First of all, the performance of all participating
systems, evaluated with F1-score, looks very high for such a complex domain as scholarly
knowledge graphs. Our approach was ranked last with an F1-score of 0.935, while the winner achieved
an F1-score of 0.99; this gap is not large for a one-shot approach without pre-training or
fine-tuning. For other benchmarks (e.g., the QALD series), the leaderboard scores are much lower<sup>5</sup>. This
might be the result of the lack of structural diversity in the test set. Second, adding the
subgraph to the prompt doesn’t result in as considerable a performance increase as was
shown for SPARQLGEN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. There, adding the subgraph to the prompt led to a 2.5x increase in
accuracy on QALD-9, whereas in the ORKG experiments the best-performing subgraph extraction
algorithm achieved only a 0.013 F1-score improvement over the baseline without a subgraph.
The performance of the baseline, implemented as one-shot prompting, is also unexpectedly
high, 0.922. A plausible explanation is the high structural similarity between the
questions in the train and test sets of the SciQA challenge.
      </p>
      <p>Given the ongoing research activity on enhancing LLMs with KGs, the scope for future work is quite
extensive. First of all, we would like to proceed with fine-tuning open source LLMs on multiple
knowledge graphs and compare their performance with that of connecting knowledge graphs to the
LLM in a retrieval fashion, as well as evaluate the generalization of the fine-tuned LLM on more
datasets.</p>
      <p>Developing robust subgraph extraction algorithms that produce relevant knowledge snippets
to augment the LLMs across different tasks remains another promising research direction.
The requirement to have only relevant triples in the subgraph is also quite challenging to
achieve with a single algorithm. Thus, other subgraph reduction procedures can be added,
e.g. instructing the LLM to prune some of the irrelevant triples<sup>6</sup>, using a KG embedding-based
classifier to estimate the relevance of a triple, or an actor-critic approach to iteratively
refine the subgraph content.</p>
      <sec id="sec-9-1">
        <title><sup>5</sup> https://github.com/KGQA/leaderboard</title>
        <p><sup>6</sup> This was suggested by one of the reviewers.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <article-title>Semopenalex: The scientific landscape in 26 billion rdf triples</article-title>
          ,
          <source>arXiv preprint arXiv:2308.03671</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , E. Motta,
          <article-title>Cs-kg: A large-scale knowledge graph of research entities and claims in computer science</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2022</year>
          , pp.
          <fpage>678</fpage>
          -
          <lpage>696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vahdati</surname>
          </string-name>
          , G. Palma,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Nath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , M.-E. Vidal,
          <article-title>Unveiling scholarly communities over knowledge graphs</article-title>
          ,
          <source>in: International Conference on Theory and Practice of Digital Libraries</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lomeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          , E. Grave, Atlas:
          <article-title>Few-shot learning with retrieval augmented language models</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2208</volume>
          .
          <fpage>03299</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Emelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bonadiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alqahtani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Mansour,
          <article-title>Injecting domain knowledge in language models for task-oriented dialogue systems</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2212</volume>
          .
          <fpage>08120</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2201.11903.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mialon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lomeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nalmpantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pasunuru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raileanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celikyilmaz</surname>
          </string-name>
          , et al.,
          <article-title>Augmented language models: a survey</article-title>
          ,
          <source>arXiv preprint arXiv:2302.07842</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radyush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <article-title>SPARQLGEN: One-shot prompt-based approach for SPARQL query generation</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. R. A. H.</given-names>
            <surname>Rony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>SGPT: A generative approach for SPARQL query generation from natural language questions</article-title>
          ,
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>70712</fpage>
          -
          <lpage>70723</lpage>
          . doi:10.1109/ACCESS.2022.3188714.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>Modern baselines for SPARQL semantic parsing</article-title>
          , in:
          <source>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2260</fpage>
          -
          <lpage>2265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , in:
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <year>2019</year>
          . URL: http://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Malkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yashunin</surname>
          </string-name>
          ,
          <article-title>Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          PP (
          <year>2016</year>
          ). doi:10.1109/TPAMI.2018.2889473.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>