<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Translating data science queries from natural language into graph analytics queries using NLDS-QL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Genoveva Vargas-Solar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karim Dao</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier A. Espinosa-Oviedo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS</institution>
          ,
          <addr-line>Univ Lyon, UCBL, INSA Lyon, LIRIS, UMR5205, Villeurbanne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CPE Lyon</institution>
          ,
          <addr-line>Villeurbanne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University Paris Dauphine</institution>
          ,
          <addr-line>Tunis</addr-line>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces NLDS-QL, a translator of data science questions expressed in natural language (NL) into data science queries on graph databases. Our translator is based on a simplified NL described by a grammar that specifies sentences combining keywords, which refer to operations on graphs, with the vocabulary of the graph schema. This paper shows NLDS-QL in action in a scenario that explores and analyses a graph database of patient diagnoses generated with Synthea, an open-source synthetic patient generator.</p>
      </abstract>
      <kwd-group>
        <kwd>data science queries</kwd>
        <kwd>graph analytics</kwd>
        <kwd>natural language processing</kwd>
        <kwd>graph stores</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The volume of connected data, often modelled as graphs, has grown exponentially. Social networks and knowledge graphs used to explore content (e.g. scientific papers, clinical cases) have made these graph data collections widely available. This accessibility is promising, but it also creates a barrier for non-experts. They have to become familiar with the nature of the data, with the way the data are represented in the database, and with the specific query languages or user interfaces needed to access them.</p>
      <p>Besides, the emergence of data science has brought a new type of "complex" query that embodies a data analysis scenario. A data science query generally refers to a workflow of tasks including exploration, data cleaning and preparation, sampling and analysis. These workflows include visualisation and evaluation tasks that involve the calculation of scores and metrics. Implementing these workflows is a challenge even for engineers and data scientists. In most cases, users need advanced skills to query and analyse the data according to their needs and to the type of questions to be answered. Requiring formal and technical expertise from non-computer scientists is neither obvious nor reasonable.</p>
      <p>In our work, the goal is to identify important research questions formulated by non-technical experts over data collections. Allowing these questions to be expressed in natural language (NL) would improve data accessibility. However, this facility implies complex NL analysis, particularly when such questions are meant to be transformed into sophisticated workflows. To achieve our goal, we therefore address the problem with a reverse-engineering strategy. We build a simplified NL grammar and map NL data science questions to graph data science queries such as those proposed by the Neo4J DS templates. Users can then express their questions in this simplified NL, using sentences that correspond to the Neo4J DS templates. They can see and assess the results and propose new questions over an enriched vocabulary, which can extend the NL query language handled by our tool.</p>
      <p>Consider a collection of data corresponding to medical follow-ups. Doctors (users) express the questions whose answers would help them elaborate a diagnosis and make decisions. Questions written or spoken in English can denote navigational, aggregation and data science queries. Expressing and answering data science queries requires centrality and clustering algorithms. Because NL is ambiguous, the translation of a question can lead to several Cypher (data science) queries. NLDS-QL is therefore used within a conversational pipeline. The user asks a question, the system generates one or several candidate queries, and the user chooses which of them to adjust or execute.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Existing work has addressed data mining using NL, but not yet the expression of data science questions. Several approaches exist for developing an NL interface for database queries, mostly for relational systems.</p>
      <p>Georgia Koutrika [<xref ref-type="bibr" rid="ref1">1</xref>] describes a three-step workflow for handling NL queries, given a relational database and assuming knowledge of the schema vocabulary: 1. analysis of the NL query expression; 2. disambiguation and interpretation, which produces a set of ranked interpretations; 3. translation into SQL and execution.</p>
      <p>Three generations of NL-query-to-SQL transformation systems can be identified [<xref ref-type="bibr" rid="ref2">2</xref>], namely:</p>
      <p>1. Keyword-based systems, which use information retrieval techniques to evaluate queries. Examples are Discover (query interpretations as subgraphs), DiscoverIR [<xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>] and Spark (ranking and fast execution) [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      <p>2. NL-processing-based systems, such as NaLIR (parser) [<xref ref-type="bibr" rid="ref6">6</xref>] and ATHENA [<xref ref-type="bibr" rid="ref7">7</xref>] (ontologies and mappings).</p>
      <p>3. Machine translation using neural networks [<xref ref-type="bibr" rid="ref8">8</xref>], which treats NL-to-SQL conversion as a language translation problem. The challenge is training a neural network on a large number of NL/SQL query pairs. Approaches such as SQLNet [<xref ref-type="bibr" rid="ref9">9</xref>] and HydraNet adopt this strategy.</p>
      <p>We also identify two families of works concerning NL graph querying:</p>
      <p>1. Approaches that translate NL into SPARQL for knowledge graph queries, such as [<xref ref-type="bibr" rid="ref8">8</xref>]. These rely on machine learning techniques (Tree or LSTM and neural networks) and on methods based on grammar and logical predicates [<xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>].</p>
      <p>2. Approaches that translate questions into structured queries using NL processing methods such as named entity recognition, binary relationship (pattern) extraction, key entity identification, and relationship mapping to graph components [<xref ref-type="bibr" rid="ref12">12</xref>].</p>
    </sec>
    <sec id="sec-3">
      <title>3. NLDS-QL</title>
      <p>Figure 1 shows the general process implemented by NLDS-QL. The first two phases of our translation approach analyse the NL query, which is expressed as text (see the NL processing and NL parsing phases in Figure 1). The text can be written directly, or recorded as a voice message and transcribed. The NL processing phases then apply classical syntactic text analysis to produce an expression tree that represents the query; a parser performs this step, as shown in Figure 1. In the query generation phase, the tree is processed to produce one or more corresponding Cypher queries (see Figure 1). Finally, the queries are evaluated on Neo4J (see the query evaluation phase in Figure 1).</p>
      <p>[Figure 1: General process of NLDS-QL. Recoverable labels: Grammar constructor (GraphQueries, Taxonomy, Building Grammar, Query rules &amp; vocabulary); Grammar building; Speech NL query; Recognition (Original folder, Conversion to wav folder, Resampling, Speech to Text, Speech to Text cleaning); NL processing; Graph database; Cypher query execution; Query evaluation.]</p>
      <p>Overview of NLDS-QL expressions. NLDS-QL questions are expressed according to the way data science operations are applied on graphs in Neo4J. Neo4J defines a general template with several commands for expressing the execution of a DS query. DS operations are generally applied on graph views created in memory from persistent graphs. Creating these views requires allocating main memory. Running algorithms on them also consumes main memory, under specific execution conditions expressed as parameters. Neo4J therefore provides commands for performing these memory estimations and then calling DS operations with given parameter values. Finally, DS operations can yield new graphs that can be named and may or may not be persisted. The creation of new graphs, and whether to persist them, is expressed as function call commands.</p>
      <p>Consequently, the definition of data science questions in NL includes expressions for specifying the commands of the Neo4J template. Assume that the graph is stored in Neo4J and that the graph schema, with its nodes and relations, is available. The simplest expression for defining a graph view and estimating the memory it requires is then the following English expression:</p>
      <p>• Create and estimate memory for the graph view &lt;subgraph-name&gt; [named as &lt;view name&gt;] with the node &lt;node name&gt; and the relationship &lt;relationship name&gt; [oriented]</p>
      <p>The data science operation task first estimates the memory cost of applying a graph data science algorithm on the graph view, then runs the algorithm on that view. NLDS-QL determines which type of algorithm to apply from specific keywords. Keywords such as most important, most popular and most influential refer to centrality algorithms such as PageRank. Keywords such as classify, communities and group refer to community detection (clustering) algorithms such as Louvain and Label Propagation. The following three question templates illustrate these expressions:</p>
      <p>• 1: Estimate the required memory for applying &lt;DS algorithm name&gt; on the graph view &lt;view name&gt;</p>
      <p>• 2: Find the most important/most popular &lt;node name&gt; with &lt;relation name&gt; [in the graph &lt;graph name&gt;] with &lt;number of iterations&gt; maximum of iterations and with a damping factor &lt;floating number&gt;</p>
      <p>• 3: Classify/Find groups/communities of &lt;node name&gt; within the view &lt;view name&gt; with relation &lt;relation name&gt; with &lt;number of iterations&gt; maximum of iterations</p>
    </sec>
    <sec id="sec-4">
      <title>4. Applying NLDS-QL for exploring a medical graph</title>
      <p>We set up an experiment to validate our approach, using the patient part of the Synthea Generic study.</p>
      <p>The Synthea Generic Graph (<ext-link ext-link-type="uri" xlink:href="https://xilinx.github.io/graphanalytics/recom-tg3/synthea-overview.html">https://xilinx.github.io/graphanalytics/recom-tg3/synthea-overview.html</ext-link>) models various disease conditions that contribute to the medical history of synthetic patients. It has 800K vertices and nearly 2000K edges.</p>
      <p>Basic graph querying relies on navigational queries, which retrieve information already "contained" in a graph. In the Synthea graph, it is possible to ask simple queries like: How many patients are there in the Synthea study? Which allergies are identified in the Synthea study patients? It is also possible to go further and ask analytical questions that involve classification tasks, such as: What are the most frequently prescribed drugs for patients in the Synthea study? Answering this type of query requires a sequence of tasks ordered in a pipeline, which we call a data science query.</p>
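      <p>The keyword-based choice of algorithm, and the ordered Neo4J commands behind such a data science query, can be sketched in Python. This is a minimal sketch under stated assumptions, not the NLDS-QL implementation. The regular expression standing in for question template 2 is hypothetical, as are the function names <monospace>algorithm_family</monospace> and <monospace>pagerank_pipeline</monospace>, the default view name <monospace>my_graph</monospace>, and the choice to project all nodes (<monospace>'*'</monospace>). The <monospace>gds.*</monospace> procedure names follow the Neo4J Graph Data Science 1.x calls used in this paper. Later versions rename <monospace>gds.graph.create</monospace> to <monospace>gds.graph.project</monospace>.</p>
      <preformat preformat-type="code">
```python
import re

# Keywords named in the NLDS-QL description; the lookup logic is an assumption.
CENTRALITY_KEYWORDS = ("most important", "most popular", "most influential")
COMMUNITY_KEYWORDS = ("classify", "group", "communities")

# Hypothetical regular-expression rendering of question template 2.
TEMPLATE_2 = re.compile(
    r"find the (?:most important|most popular) (\w+) with (\w+)"
    r"(?: in the graph \w+)?"
    r" with (\d+) maximum of iterations"
    r" and with a damping factor (\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def algorithm_family(question: str) -> str:
    """Choose an algorithm family from the keywords found in the question."""
    text = question.lower()
    if any(keyword in text for keyword in CENTRALITY_KEYWORDS):
        return "centrality"
    if any(keyword in text for keyword in COMMUNITY_KEYWORDS):
        return "community"
    return "navigational"


def pagerank_pipeline(question: str, view: str = "my_graph") -> list[str]:
    """Translate a template-2 question into ordered Cypher calls:
    create the graph view, estimate memory, then stream PageRank scores."""
    match = TEMPLATE_2.search(question)
    if match is None:
        raise ValueError("question does not follow template 2")
    label, relation, iterations, damping = match.groups()
    params = f"maxIterations: {int(iterations)}, dampingFactor: {float(damping)}"
    return [
        # Project all nodes ('*') so every `relation` edge keeps both endpoints.
        f"CALL gds.graph.create('{view}', '*', '{relation}')",
        f"CALL gds.pageRank.write.estimate('{view}', "
        f"{{writeProperty: 'pageRank', {params}}}) "
        "YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory",
        f"CALL gds.pageRank.stream('{view}', {{{params}}}) YIELD nodeId, score "
        f"WITH gds.util.asNode(nodeId) AS n, score WHERE n:{label} "
        "RETURN n, score ORDER BY score DESC LIMIT 10",
    ]


question = ("Find the most important Medications with PATIENT_HAS_MEDICATION "
            "with 25 maximum of iterations and with a damping factor 0.60")
print(algorithm_family(question))  # centrality
for step in pagerank_pipeline(question):
    print(step)
```
      </preformat>
      <p>The keyword lookup is kept separate from template matching. This reflects the two decisions a data science query involves: first choosing the algorithm family, then extracting the parameters that fill the Neo4J template.</p>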
      <p>Use case. The use case is based on the Synthea patients graph shown in Figure 2. It describes the immunisations, allergies, conditions, studies, procedures and care plans of patients. Each entity and its relations are characterised by properties that describe them. The patient graph has approximately 100 thousand nodes and 37 thousand relations stored on Neo4J.</p>
      <p>The use case of NLDS-QL on the Synthea patients graph is based on a conversational pipeline in which expert and non-expert users ask questions to start exploring the graph (see Figure 3). The use case environment initially shows the Synthea patients graph. Users can ask for details about the graph, such as its number of nodes and relations. Then, the user can ask a question, and the system generates one or several queries. The user can choose one or several of these queries to adjust, execute and then modify (see the right side of Figure 3). For every choice, the user can rate the system's performance with stars that show the degree of satisfaction.</p>
      <p>[Figure 2: Synthea patients graph. Recoverable labels: Immunizations, Allergies, Procedures, Patient_had_immunization, Patient_had_allergy, Patient_had_condition.]</p>
      <p>[Figure 3: NLDS-QL conversational interface. Recoverable prompt: NLDS-QL:&gt; Show the Synthea graph]</p>
      <p>For exploring the Synthea graph, the use case proposes a set of navigational queries of the selection, projection and aggregation types.</p>
      <p>Selection. Find the Medications for which the DESCRIPTION is Lisinopril 10 MG Oral Tablet and the REASON of the DESCRIPTION is Hypertension.</p>
      <p>Projection. Which is the birthplace of the PATIENTS in the study?</p>
      <preformat preformat-type="code">MATCH (n:Allergies)
RETURN n.DESCRIPTION

MATCH (n:Patients)
RETURN n.BIRTHPLACE</preformat>
      <p>Selection and Projection. Find the Encounters DESCRIPTION node where the DESCRIPTION of the drugs is Amlodipine 5 MG Oral Tablet.</p>
      <preformat preformat-type="code">MATCH (n:Encounters)-[*]-&gt;(m:Medications {DESCRIPTION: 'Amlodipine 5 MG Oral Tablet'})
RETURN n.DESCRIPTION, m.DESCRIPTION</preformat>
      <p>Aggregation. How many patients are caucasian?</p>
      <preformat preformat-type="code">MATCH (n:Patients {RACE: 'white'})
RETURN count(n)</preformat>
      <p>For data science queries, the use case shows NLDS-QL questions that refer to centrality operations. The translation is quite complex, as it involves:</p>
      <p>• Specifying a graph view from the patient graph, because Neo4J works with graph views stored in RAM when data science algorithms are applied.</p>
      <p>• Generating two queries that call the PageRank algorithm to process the keyword "most important". These queries offer the possibility to make the view persistent and take into account the constraints on the parameters of the PageRank algorithm.</p>
      <p>Centrality. Find the most popular Encounters for Medications in the graph.</p>
      <preformat preformat-type="code">// Degree of each Encounters node over ENCOUNTER_FOR_MEDICATION relationships
MATCH (n:Encounters)-[r:ENCOUNTER_FOR_MEDICATION]-()
WITH n, count(*) AS degree
RETURN id(n), degree
ORDER BY degree DESC</preformat>
      <p>Find the most important Drugs prescribed for the PATIENT with a maximum of 25 iterations and a damping factor of 0.60.</p>
      <preformat preformat-type="code">// Project both endpoint labels; relationships whose endpoints are not projected are not loaded
CALL gds.graph.create(
  'my_graph',
  ['Patients', 'Medications'],
  {PATIENT_HAS_MEDICATION: {orientation: 'NATURAL'}}
)

// Estimate the memory needed to run PageRank with the requested parameters
CALL gds.pageRank.write.estimate('my_graph', {
  writeProperty: 'pageRank',
  maxIterations: 25,
  dampingFactor: 0.60
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

// Run PageRank with the same parameters and return the 10 best-scored nodes
CALL gds.pageRank.stream('my_graph', {maxIterations: 25, dampingFactor: 0.60})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).DESCRIPTION AS name, score
ORDER BY score DESC LIMIT 10</preformat>
      <p>In this example, NLDS-QL generates a template that first creates the graph view "my_graph". It then estimates the memory required to run PageRank with the specified parameters on that view, reporting the number of nodes and relationships and the minimum and maximum number of bytes. Finally, it calls the algorithm and formats the result as the top 10 nodes, each associated with its score.</p>
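      <p>The computation that <monospace>gds.pageRank</monospace> performs with <monospace>maxIterations</monospace> and <monospace>dampingFactor</monospace> can be illustrated with a textbook power-iteration PageRank. This is a sketch, not the GDS implementation. GDS applies its own score scaling and convergence rules, so its absolute scores will differ. The function name and the patient-to-medication edges below are hypothetical.</p>
      <preformat preformat-type="code">
```python
def pagerank(edges, damping=0.60, max_iterations=25, tolerance=1e-9):
    """Power-iteration PageRank on a directed edge list.

    Scores are normalised to sum to 1; dangling nodes (no outgoing edges)
    spread their score uniformly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport share plus the redistributed score of dangling nodes.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if tolerance > delta:  # converged before reaching max_iterations
            break
    return rank


# Hypothetical patient-to-medication prescriptions.
prescriptions = [
    ("patient1", "aspirin"), ("patient2", "aspirin"),
    ("patient3", "aspirin"), ("patient1", "lisinopril"),
]
scores = pagerank(prescriptions, damping=0.60, max_iterations=25)
# Same shape as ORDER BY score DESC LIMIT 10 in the Cypher query.
top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:10]
print(top[0][0])  # aspirin
```
      </preformat>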
      <p>Community detection. As described in the definition of data science queries, this translation also involves several operations.</p>
      <p>Get the subgroup of Patients who have PATIENT_HAS_CAREPLAN in the graph with max iterations 20</p>
      <preformat preformat-type="code">// Assumes 'my_graph' is a view projected over the PATIENT_HAS_CAREPLAN relationships
CALL gds.labelPropagation.write.estimate('my_graph', {
  writeProperty: 'community',
  maxIterations: 20
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

// Run Label Propagation and return the five largest communities
CALL gds.labelPropagation.stream('my_graph', {maxIterations: 20})
YIELD nodeId, communityId
RETURN communityId, count(nodeId) AS size
ORDER BY size DESC LIMIT 5</preformat>
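      <p>The Label Propagation step run by <monospace>gds.labelPropagation.stream</monospace> with <monospace>maxIterations: 20</monospace> can be sketched in plain Python. This is a simplified, deterministic variant under stated assumptions: the fixed node order, the tie-breaking rule and the toy graph are hypothetical choices. GDS differs in details such as update order, tie handling and relationship weights, so its community ids will not match these.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def label_propagation(edges, max_iterations=20):
    """Asynchronous label propagation on an undirected edge list.

    Every node starts with its own label and, in a fixed (sorted) order,
    adopts the label most frequent among its neighbours. On a tie the node
    keeps its current label if it is among the winners, else it takes the
    smallest winning label, which makes the result deterministic.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    labels = {node: node for node in neighbours}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[other] for other in neighbours[node])
            best = max(counts.values())
            winners = [label for label, count in counts.items() if count == best]
            new_label = labels[node] if labels[node] in winners else min(winners)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # labels are stable: stop before max_iterations
            break
    return labels


def clique(members):
    """Return every pair of members as an undirected edge."""
    return [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]


# Hypothetical graph: two groups of four patients joined by a single edge.
edges = clique(["a", "b", "c", "d"]) + clique(["e", "f", "g", "h"]) + [("d", "h")]
labels = label_propagation(edges, max_iterations=20)
sizes = Counter(labels.values()).most_common(5)  # like ORDER BY size DESC LIMIT 5
print(sizes)  # [('b', 4), ('f', 4)]
```
      </preformat>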
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Results</title>
      <p>This paper introduced NLDS-QL and showed, through a use case, how to map NL data science questions (using an adapted vocabulary) to Neo4J data science query templates. The use case consisted of querying and analysing a graph in the medical domain. Users with medical and non-medical backgrounds can define a sequence of natural language queries, executed step by step, to explore the graph as in data science questions. Users can thereby understand the medical prescriptions proposed to patients by classifying their treatments and physiological characteristics. This in turn helps them understand how diseases are diagnosed and treated according to patients' conditions. In this way, we showed the essential aspects of a data science query template expressed in NL.</p>
      <p>The approach is flexible and can be extended to process documents with a richer NL vocabulary and more complex templates. Involving a human in the handling of natural language queries calls for an interactive strategy based on conversation. We have started to design a more advanced conversational interface that relies on human-in-the-loop and user profiling techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Koutrika</surname>
          </string-name>
          ,
          <article-title>The rise of intelligent data assistants: Democratizing data access - keynote</article-title>
          , 4th
          <source>International Workshop on Big Data Visual Exploration and Analytics - EDBT/ICDT Workshops</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Gkini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Belmpas</surname>
          </string-name>
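          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Koutrika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,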
          <article-title>An in-depth benchmarking of text-to-sql systems</article-title>
          ,
          <source>in: Proceedings of the 2021 International Conference on Management of Data</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>632</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schilder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Smiley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zielund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duprey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          , et al.,
          <article-title>TR Discover: A natural language interface for querying and analyzing interlinked datasets</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2015</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lakhanpal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Discover trending domains using fusion of supervised machine learning with natural language processing</article-title>
          ,
          <source>in: 2015 18th International Conference on Information Fusion (Fusion)</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>893</fpage>
          -
          <lpage>900</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <source>Natural Language Processing with Spark NLP: Learning to Understand Text at Scale</source>
          , O'Reilly Media, Inc.,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <article-title>NaLIR: an interactive natural language interface for querying relational databases</article-title>
          ,
          <source>in: Proceedings of the 2014 ACM SIGMOD international conference on Management of data</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>709</fpage>
          -
          <lpage>712</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Floratou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. F.</given-names>
            <surname>Minhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Özcan</surname>
          </string-name>
          ,
          <article-title>ATHENA: an ontology-driven system for natural language querying over relational data stores</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>9</volume>
          (
          <year>2016</year>
          )
          <fpage>1209</fpage>
          -
          <lpage>1220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baevski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <article-title>wav2vec 2.0: A framework for self-supervised learning of speech representations</article-title>
          ,
          <source>arXiv preprint arXiv:2006.11477</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>SQLNet: Generating structured queries from natural language without reinforcement learning</article-title>
          ,
          <source>arXiv preprint arXiv:1711.04436</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stockinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>de Farias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anisimova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <article-title>Querying knowledge graphs in natural language</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Neural-machine-translation-based commit message generation: how far are we?</article-title>
          ,
          <source>in: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>373</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Oro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruffolo</surname>
          </string-name>
          ,
          <article-title>A natural language interface for querying RDF and graph databases</article-title>
          , Consiglio Nazionale delle Ricerche, Istituto di Calcolo e Reti ad Alte Prestazioni
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>