<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A system for translating natural language questions into SPARQL queries with neural networks: Preliminary results</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuel Alejandro Borroto</string-name>
          <email>manuel.borroto@unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Ricca</string-name>
          <email>francesco.ricca@unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuteri</string-name>
          <email>bernardo.cuteri@unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Calabria</institution>
          ,
          <addr-line>Via Pietro Bucci, Rende, Cosenza, 87036</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Knowledge bases nowadays gather large volumes of information concerning multiple domains. Unfortunately, accessing this information is complicated for users unfamiliar with the SPARQL query language and with the structure of the knowledge base. In this paper, we present preliminary results on a system for the automatic translation of natural language questions into SPARQL queries. Our method combines Neural Machine Translation and Named Entity Recognition tasks that complement each other to obtain a final query ready to be executed. We demonstrate the potential of our approach by presenting its results on the Monument dataset, a recently released dataset for Question Answering on the well-known DBpedia knowledge base.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge base</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Today we live in what is known as the Digital Age. The way knowledge is generated and shared
has changed dramatically: digital formats and the internet have made information far more
accessible than traditional non-digital media. As evidence of this, we now have vast and complex
knowledge bases that gather large volumes of information through the interconnection of
thousands of datasets covering various domains, in what is known as Linked Data.
People thus have access to an amount of information never seen before; a prominent example is the
DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] project, one of the most popular knowledge bases available today.
      </p>
      <p>
        The problem is that searching and retrieving information stored in this way can be a
hard task for lay users, because it requires knowing the structure of the knowledge base and
the appropriate query languages, such as SPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As a result, natural language Question
Answering (QA) has taken a central role in the Semantic Web area to address such issues.
A group of QA approaches, especially the most recent ones, have begun to take advantage of the
great progress achieved by Deep Learning and use deep neural networks to tackle
the problem, proposing systems for the automatic translation of natural language questions
into SPARQL queries and thus hiding all technical complexity from the final users.
      </p>
      <p>
        In this context, we propose a system for the automatic translation of natural language
questions into SPARQL queries. More specifically, we employ LSTM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] neural networks, given
the effectiveness they have demonstrated for Natural Language Processing. The
system consists of two parts. The first translates a question in natural language into
a SPARQL template using an LSTM encoder-decoder model, the state of the art for
these types of tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The second part is a model for Named Entity Recognition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
also based on LSTM networks, responsible for extracting the entities from the question;
the two results are finally combined to create a SPARQL query ready to be executed. Besides, we
introduce a formal definition of a dataset format that greatly reduces the output space, is
essential for the proper functioning of the system, and allows us to tackle the problem of
out-of-vocabulary (OOV) words of the training set, a major weakness of the majority of
related approaches today.
      </p>
      <p>We demonstrate the potential of our approach by presenting its results on the Monument
dataset, which is a recently released dataset for Question Answering on the well-known DBpedia
knowledge base.</p>
      <p>The remainder of the paper is structured as follows. In Section 2, we discuss related work. In
Section 3, we go into the details of our approach. Section 4 focuses on the discussion
of experiments and results. Finally, we provide some conclusions and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Pattern-based. The idea of employing query patterns for mapping questions to SPARQL
queries was already exploited in the literature [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. The approach presented by Pradel
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] also adopts named entity recognition but applies a set of predefined rules to obtain
all the query elements and their relationships. The approach by Steinmetz et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] has four phases:
first, the question is parsed and the main focus is extracted; then, general queries are generated
from the phrases in natural language according to predefined patterns; finally, a
subject-predicate-object mapping of the general question to RDF triples is performed. Although both of the
above-mentioned approaches performed well in selected benchmarks, they rely on patterns
and rules defined manually for all existing types of questions, a limitation that is not present in our
proposal.
      </p>
      <p>
        Deep Learning-based. In the Seq2SQL approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an LSTM Seq2Seq model is used to
translate from natural language to SQL queries. The interesting aspect of this approach is
that it uses Reinforcement Learning to guide the learning. The use of an Encoder-Decoder model
based on LSTMs with an attention mechanism to associate a vocabulary mapping between natural
language and SPARQL was also proposed in the literature [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], obtaining good results.
      </p>
      <p>
        The Neural SPARQL Machines (NSpM) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] approach is based on the idea of modifying
SPARQL queries to treat them as a foreign language. To achieve this, the authors encoded the brackets,
URIs, operators, and other symbols, making the tokenization process easier. The resulting
dataset was fed into a Seq2Seq model responsible for performing the question-query
mapping. The same authors created the DBNQA dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and their model was tested on a
subdomain referring to monuments and evaluated using the purely syntactic BLEU score [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
As a consequence, it performs well in reproducing the syntax of the gold query but is less able
to generalize to unseen natural language questions and OOV words when compared with our
approach.
      </p>
      <p>
        The query-building approach by Chen et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] features two stages. The first stage
predicts the query structure of the question and leverages the structure to constrain the
generation of the candidate queries. The second stage performs candidate query ranking. As in
our approach, Chen et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] use BiLSTM networks, but their query representation is based on
abstract query graphs.
      </p>
      <p>
        We also report that eight different models based on RNNs and CNNs were compared by Yin
and colleagues [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In this large experiment, the ConvS2S [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] model proved to be the best.
      </p>
      <p>
        For completeness, we also studied a related line of work that aims to translate natural
language questions into SQL queries. The work proposed by Yu et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] introduces a large-scale,
complex, and cross-domain semantic parsing and text-to-SQL dataset. To validate the
contribution, the authors used the proposed dataset to train different models to convert text to SQL
queries. Most of the models were based on a Seq2Seq architecture with attention, demonstrating
adequate performance. Another interesting case study is the editing-based approach for
text-to-SQL generation introduced by Zhang et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. They implement a Seq2Seq model
with Luong’s attention, using BiLSTMs and BERT embeddings. The approach performs
well on the SParC and Spider datasets, outperforming related work in some cases.
      </p>
      <p>Our architecture addresses the issues connected with the translation by resorting to specific
tools, an aspect that is not present in the mentioned works. Moreover, existing approaches based
on NMT do nothing special to deal with OOV words.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Translating Natural Language Questions to SPARQL</title>
      <p>Knowledge bases (KB) are a rich source of information related to a great variety of domains,
which can be accessed by experts in formal query languages. The potential of exploiting
knowledge bases can be greatly increased by allowing any user to query the ontology by posing
questions in natural language.</p>
      <p>In this paper, this problem is seen as the following Natural Language Processing task: Given
an RDF knowledge base KB and a question Q in natural language (to be answered using
KB), translate Q into a SPARQL query q such that the answer to Q can be obtained by
running q on the underlying ontology KB.</p>
      <p>The starting point is a training set containing a number of pairs ⟨Q, Q_gold⟩, where Q
is a natural language question and Q_gold is a SPARQL query, called the gold query. The
gold query is a SPARQL query that models (i.e., allows to retrieve from KB) the answers to
Q. The training set has to be used to learn how to answer questions posed in natural
language using KB, so that, given a question Q in natural language, the QA system can
generate a query Q' that is equivalent to the gold query Q_gold for Q, i.e., such that
ans(Q') = ans(Q_gold). In particular, we approach this problem as a machine
translation task, that is, we compute Q' as Q' = T(Q), where T is the translation
function implemented by our QA system, called sparql-qa.</p>
      <p>In the remainder, we first present an intermediate format conceived to speed up the training
of the entire process and to attenuate the impact of words that reference individuals not
mentioned in the training set; then, we describe our translation modules, which take as input
the dataset in the new format.</p>
      <sec id="sec-3-1">
        <title>3.1. A new data set format</title>
        <p>In general, NL-to-SPARQL datasets are composed of a set of ⟨Question, Query⟩ pairs. In such a
common representation, the named entities found in the question are typically represented
directly by their URIs in the SPARQL query, but this transformation is hard to learn from mere
examples, and the trained system will fail whenever the transformation cannot be captured by simple
rules. This is an issue especially in large ontologies, where there is a huge number of resources.</p>
        <p>A dataset in QQT format is composed of a set of triples of the form ⟨Question, Tagging,
QueryTemplate⟩, where Question is a natural language question, Tagging marks which parts
of Question are entities, and QueryTemplate is a SPARQL query template with the following
modifications: (i) the KB resources are replaced by one or more variables; (ii) a new triple of the
form "?var rdfs:label placeholder" is added for each variable. Placeholders are meant to be
replaced by substrings of Question depending on Tagging.</p>
        <p>In Table 1 we show an example of a ⟨Question, Query⟩ pair for the question Who painted the
Mona Lisa?, while Table 2 shows the corresponding ⟨Question, Tagging, QueryTemplate⟩
triple in the QQT format.</p>
        <p>In Table 2, the term $1 denotes a placeholder, where 1 means that it has to be replaced by the
first entity occurring in the question, that is, Mona Lisa, as marked by the B and I tags in the Tagging.
Note that, in the QQT format, the query template does not contain any DBpedia resource; thus,
the learning model (which is the neural network in our case) does not need to understand that
Mona Lisa stands for the dbr:Mona_Lisa resource, and the QueryTemplate is exactly the same
for all questions asking the author of a given artwork. Note also that we are interested in computing
the answers and not in syntactically reproducing the gold query.</p>
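        <p>To make the transformation concrete, the following sketch builds a QQT triple from a ⟨Question, Query⟩ pair. It is a hypothetical helper, not the actual sparql-qa code: it assumes a single entity per question, and the function name, the ?v1 variable, and the dbo:author example property are illustrative.</p>
        <preformat>
```python
def to_qqt(question, query, entity_text, entity_uri):
    """Build a QQT triple from a (Question, Query) pair: the KB
    resource is replaced by a variable, a "?var rdfs:label $1"
    triple is added, and the entity tokens are tagged B/I."""
    tokens = question.rstrip("?").split()
    ent_tokens = entity_text.split()
    tagging = []
    i = 0
    while i != len(tokens):
        if tokens[i:i + len(ent_tokens)] == ent_tokens:
            # first entity token gets B, the rest get I
            tagging.append("B")
            tagging.extend("I" for _ in ent_tokens[1:])
            i += len(ent_tokens)
        else:
            tagging.append("O")
            i += 1
    # naive single-entity rewrite: swap the URI for a variable and
    # attach the placeholder via rdfs:label
    template = query.replace(entity_uri, "?v1")
    template = template.replace(" }", " . ?v1 rdfs:label $1 }")
    return question, " ".join(tagging), template
```
        </preformat>
        <p>On the Mona Lisa example, this yields the tagging O O O B I and a template in which dbr:Mona_Lisa no longer appears.</p>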
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The translation modules</title>
        <p>
          Our approach consists of two deep neural networks: the first is specialized in Neural Machine
Translation (NMT) and based on the well-known Seq2Seq [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] model, while the second is used for
extracting the entities from the question by means of the Named Entity Recognition (NER) technique.
        </p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Neural Machine Translation</title>
          <p>
            The network focused on NMT is used to translate the question into a SPARQL QueryTemplate.
The network is based on an Encoder-Decoder model with Luong’s attention [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], in which the
Encoder extracts semantic content from the question in natural language and encodes it into
a fixed-dimensional vector representation V, while the Decoder tries to decode V into a
sequence in the output language (the QueryTemplate).
          </p>
          <p>
            The Encoder is composed of an input layer that receives a question in natural language
converted into a sequence of word-embeddings obtained by means of fastText [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], in the form
{x1, x2, ..., xn}, where xi is the vector representation of the i-th word in the sentence. Next, we
use a Bidirectional LSTM (BiLSTM) to summarize {x1, x2, ..., xn} into V, in forward and reverse
order. V is formed by concatenating the last hidden states of the two directions.
          </p>
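          <p>The reason fastText helps with OOV words is that it represents a word through its character n-grams, so unseen words still get a vector. The following is a toy illustration of that idea only; the hashing, dimensions, and pseudo-embeddings are invented for the sketch and are not fastText's actual implementation.</p>
          <preformat>
```python
import hashlib

DIM = 8        # toy embedding size (fastText typically uses 300)
BUCKETS = 1000 # toy number of n-gram hash buckets

def ngram_vector(ngram):
    # deterministic pseudo-embedding for one character n-gram
    h = int(hashlib.md5(ngram.encode()).hexdigest(), 16) % BUCKETS
    return [((h * (i + 1)) % 97) / 97.0 for i in range(DIM)]

def subword_embedding(word, n_min=3, n_max=6):
    """fastText-style subword trick: average the vectors of the
    word's character n-grams (with boundary markers), so even an
    out-of-vocabulary word receives a meaningful vector."""
    w = "^" + word.lower() + "$"  # boundary markers (fastText uses angle brackets)
    grams = [w]  # the whole word is also kept as one "gram"
    for n in range(n_min, n_max + 1):
        for i in range(max(0, len(w) - n + 1)):
            grams.append(w[i:i + n])
    vecs = [ngram_vector(g) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```
          </preformat>
          <p>Because the embedding is built from shared n-grams, morphologically related words (e.g. a monument name never seen during training) land close to known vocabulary rather than being mapped to a single unknown token.</p>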
          <p>During the training process, the Decoder is responsible for calculating the
word-embeddings of the output language tokens (SPARQL), which are used, together with the
vector V provided by the Encoder, as input to a Luong-Decoder layer. This layer is responsible
for decoding the sentence supported by the attention mechanism. Finally, the values are fed
to a Fully Connected network with a Softmax activation function that predicts the output
sequence by calculating the conditional probability over the output vocabulary. Figure 1 shows
the described network architecture.</p>
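          <p>As a rough illustration of the attention step, the sketch below computes Luong-style (dot-product) attention over a set of encoder states. It uses toy dimensions and plain NumPy; the actual system uses Keras/TensorFlow layers, so this is a conceptual sketch, not the implementation.</p>
          <preformat>
```python
import numpy as np

def luong_attention(decoder_state, encoder_states):
    """Luong-style (dot-product) attention: score every encoder
    hidden state against the current decoder state, normalize the
    scores with a softmax, and return the context vector."""
    scores = encoder_states @ decoder_state  # one score per input position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights = weights / weights.sum()
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights
```
          </preformat>
          <p>At each decoding step, the context vector focuses the prediction on the input positions most relevant to the current output token.</p>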
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Named Entity Recognition</title>
          <p>
            To perform the entity recognition, we created a BiLSTM-CRF [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] network, which constitutes
the state of the art for this type of task. In this case, we again used fastText to obtain the
word-embeddings and deal with OOV words. The model is composed of an input layer that receives
the sequences of embeddings, followed by a BiLSTM connected to a Fully Connected layer.
Finally, the information flows through a CRF layer that predicts the final sequence of tags.
Figure 2 shows the described network architecture.
          </p>
          <p>Finally, we combine the results of both networks to obtain the final query Q'. Here, the
placeholders in the QueryTemplate are replaced by the corresponding entities obtained with
the NER network.</p>
          <p>To better understand how sparql-qa works, consider, for example, the question: Where
is Washington Monument located? First, the question is cleaned and split into tokens, and then
converted into a sequence of fastText word-embeddings. Then, the sequence is processed by
the two networks used by our system. The NMT network translates the question into the
corresponding QueryTemplate:
SELECT DISTINCT ?a WHERE { ?w dbo:location ?a . ?w rdfs:label $1 }
Subsequently, the NER network calculates the tagging sequence O O B I O, which indicates
that the entity to be considered is Washington Monument (positions 2 and 3 of the tagging
sequence, respectively). Finally, in the composition phase, the results are combined, obtaining the
final query:
SELECT DISTINCT ?a WHERE { ?w dbo:location ?a . ?w rdfs:label "Washington Monument"@en }</p>
          <p>It is important to note that the previous example describes the translation operation in general
terms and does not go into the more complicated details of the process.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments on Monument dataset</title>
      <p>
        Setup. The Monument dataset was proposed as part of the Neural SPARQL Machines (NSpM)
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] research. It contains 14,778 question-query pairs about the instances of type monument
present in DBpedia.
      </p>
      <p>
        To compare our system with the state of the art, we have trained the Learner Module
of NSpM as it was done in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where the authors proposed two instances of the Monument
dataset, which we denote by Monument300 and Monument600, containing 8,544 and 14,788
pairs, respectively. In both cases, the dataset split fixes 100 pairs for both the validation and test sets
and keeps the rest for the training set. All the data is publicly available in the NSpM GitHub
project (https://github.com/LiberAI/NSpM/tree/master/data).
      </p>
      <p>We have implemented our system, called sparql-qa, using Keras, a well-known framework
for machine learning, on top of TensorFlow. We trained the networks using Google
Colaboratory, a virtual machine environment hosted in the cloud and based on Jupyter
Notebooks. The environment provides 12GB of RAM and connects to Google Drive. To train our
system, we first performed tuning focused on three hyperparameters: the embedding size of
the target language, the batch size, and the number of LSTM hidden units. The task was performed
using a grid search method. We set the number of epochs to 5, shuffling the dataset at the end of each
one. After tuning, we set the hyperparameters of the two networks as follows: the embedding size
is set to 300, the LSTM hidden units are set to 96, and the batch size is set to 64.</p>
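      <p>The tuning procedure described above amounts to an exhaustive grid search over the three hyperparameters. A minimal sketch is shown below; the train_and_score function and the toy scoring used in the example are hypothetical stand-ins for an actual train-and-validate run.</p>
      <preformat>
```python
from itertools import product

def grid_search(train_and_score, grid):
    """Exhaustive hyperparameter search: try every combination in
    the grid and keep the best-scoring one."""
    best, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)  # e.g. validation F1
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```
      </preformat>
      <p>With grids over embedding size, batch size, and hidden units, the search simply trains one model per combination and reports the configuration with the best validation score.</p>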
      <p>For comparing performance, we adopted the macro precision, recall, and F1-score measures,
which are the most commonly used ones to assess this kind of system.</p>
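      <p>For reference, the macro-averaged measures compute precision, recall, and F1 per question first and then average over all questions. The sketch below assumes the common set-based reading of these metrics for QA, where each question's gold and predicted answers are compared as sets; it is an illustration, not the exact evaluation script.</p>
      <preformat>
```python
def macro_scores(gold_answers, predicted_answers):
    """Macro precision/recall/F1: compute per-question scores on
    the answer sets first, then average them over all questions."""
    ps, rs, fs = [], [], []
    for gold, pred in zip(gold_answers, predicted_answers):
        tp = len(gold.intersection(pred))          # correct answers
        p = tp / len(pred) if pred else 0.0        # precision
        r = tp / len(gold) if gold else 0.0        # recall
        f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(ps)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```
      </preformat>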
      <p>Results. Results of the execution reported in Table 3 show that sparql-qa performs reasonably
well, reaching F1-score values greater than 0.7. On the other hand, NSpM achieves better results.</p>
      <p>We have investigated why our system could not provide an optimal answer for some
questions. This analysis showed that the performance of our approach is mainly affected by
problems in the dataset. Indeed, there is a set of questions that lack the context needed to determine
the specific expected URIs. For example, for the question “What is Washington Monument related
to?” our system uses “Washington Monument”, but the gold query uses the specific URI
Washington_Monument_(Baltimore). Note that there is no reference to Baltimore in the question
text, and there are Washington Monuments also in Milwaukee and Philadelphia, according to
DBpedia. Surprisingly, the compared system can often use the specific URI of the gold query
even without context. Thus, we ran another experiment to better outline the issue. We created a
new test set of 200 pairs by using the templates provided by NSpM and a randomly selected set
of unseen monument entities extracted from DBpedia. Table 4 shows that our approach maintains the
same good performance (F1-score greater than 0.78) and performs much better than NSpM, which
is not able to generalize to deal with OOV words (F1-score of 0.11).</p>
      <p>
        Another factor that affects our approach is the correctness with which the named entities are
written. Sometimes the entities mentioned in the question do not match the rdfs:label property
value, and sometimes they are referenced using acronyms. In these cases, our system will not
give the expected answers because it cannot reference the right DBpedia resources. To address
these issues, we plan to use Named Entity Linking (NEL) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which allows us to determine
accurately which DBpedia resources are present in the question.
      </p>
      <p>Finally, for completeness, we report that the intermediate format allows us to save 40% of the
training time.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>The paper presents preliminary results on an approach for querying SPARQL knowledge bases
using natural language. Our system combines neural machine translation and
named entity recognition modules and focuses on attenuating the impact of OOV words, an
important issue that is not well addressed by existing approaches. Our system showed good
preliminary results on the Monument dataset and demonstrated a more general and robust
behavior than state-of-the-art approaches.</p>
      <p>In future work, we plan to extend our system to improve translation performance by
integrating other NLP tools, such as Named Entity Linking and BERT contextual word embeddings.
We also plan to extend our experiments by considering other well-known QA benchmarks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the Italian Ministry of Economic Development (MISE)
under the project “MAP4ID - Multipurpose Analytics Platform 4 Industrial Data”, N.
F/190138/0103/X44.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          , et al.,
          <article-title>Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia</article-title>
          ,
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>W3C</collab>
          , Semantic web standards,
          <year>2014</year>
          . URL: https://www.w3.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Sequence to sequence learning with neural networks</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Bidirectional LSTM-CRF models for sequence tagging</article-title>
          ,
          <source>CoRR abs/1508.01991</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pradel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Haemmerlé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <article-title>Natural language query interpretation into sparql using patterns</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Steinmetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sattler</surname>
          </string-name>
          ,
          <article-title>From natural language questions to SPARQL queries: A pattern-based approach</article-title>
          , in: BTW, volume P-289 of LNI, Gesellschaft für Informatik, Bonn,
          <year>2019</year>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <article-title>Seq2sql: Generating structured queries from natural language using reinforcement learning</article-title>
          ,
          <source>CoRR abs/1709.00103</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Luz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Finger</surname>
          </string-name>
          ,
          <article-title>Semantic parsing natural language into SPARQL: improving target language representation with neural attention</article-title>
          ,
          <source>CoRR abs/1803.04329</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Soru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moussallem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Publio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Esteves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Neto</surname>
          </string-name>
          ,
          <article-title>SPARQL as a foreign language</article-title>
          ,
          <source>SEMANTiCS 2017 - Posters and Demos</source>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1708.07624.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Soru</surname>
          </string-name>
          ,
          <article-title>Generating a large dataset for neural question answering over the DBpedia knowledge base</article-title>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Formal query building with query structure prediction for complex question answering over knowledge base</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gromann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <article-title>Neural machine translating from natural language to SPARQL</article-title>
          ,
          <source>CoRR abs/1906.09302</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yarats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <article-title>Convolutional sequence to sequence learning</article-title>
          ,
          <source>in: ICML</source>
          , volume
          <volume>70</volume>
          <source>of Proc. of ML Research</source>
          , PMLR,
          <year>2017</year>
          , pp.
          <fpage>1243</fpage>
          -
          <lpage>1252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roman</surname>
          </string-name>
          , et al.,
          <article-title>Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task</article-title>
          ,
          <source>arXiv preprint arXiv:1809.08887</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Er</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. V.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <article-title>Editing-based sql query generation for cross-domain context-dependent questions</article-title>
          ,
          <source>arXiv preprint arXiv:1909.00786</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Effective approaches to attention-based neural machine translation</article-title>
          ,
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>TACL</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Entity linking with a knowledge base: Issues, techniques, and solutions</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>27</volume>
          (
          <year>2014</year>
          )
          <fpage>443</fpage>
          -
          <lpage>460</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>