<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Natural Language Data Interfaces: From Keyword Search to ChatGPT, are we there yet?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Katsogiannis-Meimarakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christos Tsapelas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgia Koutrika</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Athena Research Center</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Enabling users to query data in a relational database using natural language has long been considered the holy grail of the database community. Towards this direction, there has been an increasing research focus on Natural Language Data Interfaces that allow users to pose queries in natural language and translate these queries to the underlying database query language. Several approaches have emerged, especially due to the recent advances in deep neural networks. Despite this bloom, not only are these systems very complicated and difficult to understand, but they have also yet to deliver on their promise of enabling users to access data easily using natural language. Hence, they have failed to see widespread adoption. A question naturally arises: is natural language access to data going to be the elusive holy grail of databases? We hope not. With the aim of fostering research on these open issues, in this position paper we discuss (currently unmet) requirements for effective natural language data exploration and highlight promising research directions. Finally, we describe how to rethink the text-to-SQL problem and how it should be realized as an integral capability of a DBMS, leading to a system that fully supports natural language queries over data.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language interfaces</kwd>
        <kwd>data exploration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p><bold>1. Introduction</bold></p>
      <p>“If we are to satisfy the needs of casual users of databases, we must break the barriers that presently prevent these users from freely employing their native language” (E. F. Codd, 1974) [1]. Enabling users to query data using natural language has long been the “holy grail” of the database community. Research on Natural Language Data Interfaces (NLIDBs) started almost as early as the first DBMSs emerged, in the 70s [2]. Early systems enabled keyword searches: they relied on data indexes to find relations that contained the query keywords, and on the database schema to join them and return the answer to a query (e.g., [3, 4]). Parsing-based approaches parsed the input question to understand its grammatical structure and then mapped it to the structure of the desired SQL query (e.g., [5, 6]). Recently, there has been a growing interest in neural machine translation approaches for learning natural language data interfaces [7, 8, 9] (see [10] for a recent survey). Advances in NLP, such as the introduction of Transformers [11], have given the area a boost, while the latest developments, such as ChatGPT [12], seem to bring human-like conversation closer than ever.</p>
      <p>But are we really close to “talking to our databases” [13]?</p>
      <p>Unfortunately, existing efforts, including the latest deep learning based systems, have not seen widespread adoption and have yet to deliver on their promise of enabling users to use natural language to search data. In this position paper, we delve into the limitations of existing approaches and discuss lessons learnt. We argue that, despite the recent bloom of approaches, these focus on certain aspects of the problem (e.g., improving the translation accuracy over a specific dataset [14]) and largely miss the big picture. We identify important requirements for natural language data interfaces and highlight open challenges and promising research directions.</p>
      <p><bold>2. The Inherent Challenges of NL Data Interfaces</bold></p>
      <p>SQL has a strict syntax, which leads to limited expressivity compared to natural language. Furthermore, while a sentence in natural language may contain some mistakes and still be understood by a human, a SQL query needs to be syntactically and semantically correct in order to be executable over the underlying data. These limitations create enormous challenges for translating NL to SQL queries. SQL’s strict syntax may lead to cumbersome translations: a relatively simple NL query may map to a complex SQL query. For example, “Return the movie with the best rating” maps to a nested SQL query. While the original NL query is simple, building the complex SQL query may be a tough call for the system.</p>
      <p>All these challenges make the text-to-SQL problem hard. Not only is it difficult to understand a NL query, but it is also difficult to build the correct SQL query. Even similar questions may lead to a different outcome over different databases: one may be translated over one database and the other may not, due to issues such as ambiguity, paraphrasing, and different schemas.</p>
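      <p>The “best rating” example above can be made concrete on a toy SQLite instance. The movie(title, rating) table and its contents below are illustrative assumptions, not from any benchmark; the point is only that a one-line NL request already demands a nested query:</p>
      <preformat>
```python
import sqlite3

# Illustrative toy schema and data (assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("Vertigo", 9.1), ("Gigli", 2.5), ("Heat", 8.3)])

# "Return the movie with the best rating" -- the aggregate forces a subquery:
nested_sql = """
    SELECT title
    FROM movie
    WHERE rating = (SELECT MAX(rating) FROM movie)
"""
print(conn.execute(nested_sql).fetchall())  # [('Vertigo',)]
```
      </preformat>
      <p>A text-to-SQL system must infer this nesting from a surface cue like “best”, which is precisely the gap between the simplicity of the question and the complexity of the query.</p>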
      <p><bold>3. Existing Approaches</bold></p>
      <p><bold>The “Database way”.</bold> One category of approaches tackles the text-to-SQL problem as a mapping problem [3, 4]: how to map query elements to database elements (tables, columns, and values), and then find the desired interconnections of these data elements that capture the user intent. In addition, parsing-based approaches parse the input question to understand its grammatical structure, which is then mapped to the desired SQL query [5, 6].</p>
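      <p>As a minimal illustration of the mapping step, the sketch below matches query tokens against schema names by plain case-insensitive string equality. The schema, the tokenizer, and the matching rule are all simplifying assumptions; real systems also match data values via indexes and handle synonyms and multi-word mentions:</p>
      <preformat>
```python
import re

# Assumed toy schema metadata: table name -> column names.
SCHEMA = {"movie": ["title", "rating", "year"],
          "director": ["name", "birth_year"]}

def map_to_schema(nl_query):
    """Return (table, column) pairs mentioned in the NL query.

    A column hit is reported as (table, column); a bare table
    mention is reported as (table, None).
    """
    tokens = set(re.findall(r"\w+", nl_query.lower()))
    hits = []
    for table, columns in SCHEMA.items():
        if table in tokens:
            hits.append((table, None))
        for col in columns:
            if col in tokens:
                hits.append((table, col))
    return hits

print(map_to_schema("What is the rating of each movie?"))
# [('movie', None), ('movie', 'rating')]
```
      </preformat>
      <p>After this matching step, a mapping system joins the matched relations through foreign keys to assemble a query that connects the discovered elements; the sketch stops before that stage.</p>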
      <p><bold>The “Machine Learning way”.</bold> The other approach is to tackle the text-to-SQL problem as a language translation problem and to train a neural network on a large number of {NL query, SQL query} pairs [10]. Originally, these systems ignored the underlying database and did not ensure that the generated SQL is syntactically and semantically correct, i.e., executable over this database. To address this problem, recent approaches employ two additional techniques. First, schema linking aims at discovering possible mentions of database elements in the NL query. The discovered schema links, along with the rest of the inputs, are fed into the neural network that is responsible for the translation. Second, output refinement can be applied on a trained model to avoid producing incorrect SQL queries [15].</p>
      <p><bold>Limitations and Lessons Learnt.</bold> The “Database way” can handle different types of SQL queries and can work on any database. However, existing approaches struggle with more complex and diverse NL queries and cannot easily cope with NL challenges such as synonyms, paraphrasing, and typos [16].</p>
      <p>The “Machine Learning way” promises to be more generalizable, both in terms of the different types of NL queries the methods can understand and the different databases they can work on. There are, however, important limitations. In practice, all methods focus on limited-scope problems, and their accuracy severely degrades with more complex and diverse NL and SQL queries [17]. Furthermore, they depend on training data and cannot cope with unseen databases and queries. For example, Spider [18], a large-scale text-to-SQL dataset that is very popular for training and evaluating text-to-SQL systems, contains queries over 200 relational databases from 138 different domains. However, these are toy databases with simple schemas and small sizes that fail to reflect the characteristics and difficulties of real-world DBs.</p>
      <p>Furthermore, the models used so far are typically quite complex and large, questioning their practical use inside a complex system such as a database engine (the training cost as well as the energy consumption [19] of such big models are important concerns). Moreover, most ML approaches rely on poor and size-limiting input representations that cannot possibly leverage the wealth of database information, comprising hundreds of tables and attributes, data values, and queries.</p>
      <p>These limitations become highly relevant when applying a text-to-SQL system to an actual database used in a business, research, or any other real-world use case [14]. Such databases can pose difficulties not encountered in the datasets used to train and evaluate such systems, for example, a large number of tables and attributes, and table and column names that use domain-specific terminology.</p>
      <p><bold>4. Going Forward: Requirements and Opportunities</bold></p>
      <p>The challenges of the text-to-SQL problem, the aforementioned observations, as well as our own experience with working with and evaluating several text-to-SQL systems [10, 16] point to a set of requirements for a NLIDB.</p>
      <p><bold>R1. Query expressivity:</bold> Using a query language such as SQL, the user knows exactly what queries are possible. In a similar vein, the set of NL and SQL queries that a NLIDB supports should be clearly defined, so that a user is aware of the available query capabilities.</p>
      <p><bold>R2. Data independence:</bold> A NLIDB should support the same query expressivity for different databases. In other words, the same type of NL query should be possible over any database. For example, if the user can ask “what is the average X of Y” in one database, then this type of query should be possible in any other database.</p>
      <p><bold>R3. Performance:</bold> Allowing users to express questions in NL should free them not only from using SQL, but also from worrying about how their question will be executed efficiently. The system should transparently find the most efficient way to answer a NL query, minimizing both the translation overhead and the execution cost of retrieving the results.</p>
      <p><bold>R4. Scalability:</bold> A NLIDB should be feasible and scalable over any database.</p>
      <p>Requirement R1 is important because, up to now, almost none of the known text-to-SQL systems provides a clearly defined query language or a specification of its query capabilities. For the user, it is a trial-and-error process to see what queries can be understood and answered by the system. Is it possible to come up with a query language specification that systems can refer to in order to describe their query expressivity?</p>
      <p>Towards R1, a query categorization in the spirit of [16] may be a good starting point. This could enable the creation of appropriate benchmarks for the comparison of the query capabilities of different systems. Even devising an appropriate query categorization and an appropriate benchmark raises several challenges: what categories to choose, what queries should be in each category, which datasets to use. Furthermore, one should take into account SQL equivalence (different SQL queries that return the same results) and NL ambiguity (a NL query may have more than one correct translation over the data). Unfortunately, existing benchmarks fail to address the query expressivity question. For instance, Spider has four very coarse-grained classes of queries.</p>
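      <p>One pragmatic way a benchmark can cope with SQL equivalence is to compare result sets rather than query strings. The sketch below, over an assumed toy movie table, treats two syntactically different queries as equivalent if they return the same rows on the test database:</p>
      <preformat>
```python
import sqlite3

# Assumed toy schema and data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("Vertigo", 9.1), ("Heat", 8.3), ("Gigli", 2.5)])

# Two different formulations of "the movie with the best rating".
q1 = "SELECT title FROM movie WHERE rating = (SELECT MAX(rating) FROM movie)"
q2 = "SELECT title FROM movie ORDER BY rating DESC LIMIT 1"

def same_results(conn, a, b):
    # Compare as sorted lists: row order is irrelevant for equivalence.
    return sorted(conn.execute(a).fetchall()) == sorted(conn.execute(b).fetchall())

print(same_results(conn, q1, q2))  # True
```
      </preformat>
      <p>Note that result equality on one database instance does not prove true equivalence: if two movies tied for the best rating, the first query would return both while the second would return only one, which is exactly the kind of subtlety a benchmark design must confront.</p>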
      <p>Requirement R2 complements R1 in saying that the same query expressivity should be supported over any database. This comes naturally with query languages such as SQL: for instance, SPJ queries can be supported over any data. For a NLIDB, that does not hold. Going from one database to another, the same type of queries may not be supported. As we have already pointed out, this is a major concern for deep learning systems. A system trained over Spider will not work over a new domain such as astrophysics or cancer research.</p>
      <p>One could build specialized, domain-specific benchmarks for training and evaluating text-to-SQL systems in domains such as scientific databases. Manually crafting such benchmarks is prohibitive, especially in these kinds of domains. Data augmentation, i.e., automatic benchmark generation, is an open research direction [20]. However, this is where the power of benchmarks as a means to demonstrate query expressivity ends. Ensuring data independence is a different beast, and finding better training datasets is not the solution to the problem. Rethinking the system design is needed instead.</p>
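      <p>A template-based generator gives a flavor of such data augmentation. The templates and schema below are illustrative assumptions, not the pipeline of [20]:</p>
      <preformat>
```python
import itertools

# Assumed NL/SQL template pairs and a toy schema (table -> numeric columns).
TEMPLATES = [
    ("what is the average {col} of {table}",
     "SELECT AVG({col}) FROM {table}"),
    ("show all {table} with {col} above {val}",
     "SELECT * FROM {table} WHERE {col} > {val}"),
]
SCHEMA = {"movie": ["rating"], "star": ["mass"]}

def generate_pairs(val=5):
    """Instantiate every template over every (table, column) combination."""
    pairs = []
    for (nl, sql), (table, cols) in itertools.product(TEMPLATES, SCHEMA.items()):
        for col in cols:
            pairs.append((nl.format(table=table, col=col, val=val),
                          sql.format(table=table, col=col, val=val)))
    return pairs

for nl, sql in generate_pairs():
    print(nl, "->", sql)
```
      </preformat>
      <p>Pairing each generated NL question with its SQL counterpart yields training data cheaply, at the cost of limited linguistic diversity, which is why such pipelines typically add paraphrasing steps on top.</p>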
      <p>Towards this direction, approaches that have been proposed by the DB community have been shown to be more effective from the data independence perspective, since they rely on the information that the database provides. This potentially points to the need of rethinking our approach to the text-to-SQL problem. Some parts of the solution may require DB methods to ensure data independence, while other parts may use neural models to generalize system knowledge, for example on the diversity and complexity of NL queries. What would a system that combines such capabilities look like?</p>
      <p>Requirement R3 poses a serious challenge. While the state-of-the-art systems are still dealing with “getting the answer right”, they are mostly overlooking “getting the answer fast”. Improving translation speed by building efficient methods is necessary, but it may not be enough. Text-to-SQL systems originating from the DB community tried to generate SQL queries that are not only correct but also optimal in terms of execution speed; hence, many of them contained logic for generating code that would return the desired results fast. This may be necessary for a NLIDB. The database community could come up with benchmarks that focus on efficiency (not just effectiveness) and allow evaluating systems based on execution time and resource consumption.</p>
      <p>R4 highlights the need for realistic solutions. Deep learning text-to-SQL systems typically rely on very complex models, which have been trained and evaluated on toy databases (contained in existing benchmarks). In several cases, it may not be possible to have the required resources to train such enormous models. Furthermore, since these models require that the database schema is given as input, they do not scale well to very large databases with hundreds of attributes and tables (such as astrophysics and biological data). Instead of focusing on increasing model complexity aiming at translation accuracy, we need to design solutions that also take into account system efficiency, complexity, and scale.</p>
      <p>To further move the needle, we may need to rethink our approach to the problem. The text-to-SQL problem has been seen as a mapping or a language translation problem. This is an oversimplification; in fact, the text-to-SQL problem comprises (at least) three connected problems: a representation problem (what is asked), a planning problem (how to answer it), and an optimization problem (how to execute it efficiently). By decomposing the problem into its sub-problems, we can focus on each one and find the best solution, whether a DB technique, an ML technique, or a combination of both. We can investigate knowledge representation schemes that scale well to very large databases. For planning and optimization, we can focus on system efficiency, complexity, and scale.</p>
      <p>We also believe that natural language query capabilities should be implemented closer to the DBMS. All the data (and information about the data, such as statistics and metadata) as well as data operations are part of the DBMS. Querying data using natural language requires all the knowledge that a DBMS has on the data as well as its processing capabilities (and will considerably enhance all of them in the process).</p>
      <p>As the system processes NL queries, it should learn and improve its query capabilities as well. At the same time, it can leverage this knowledge for learning how to translate SQL queries to NL. To have fully natural language access to a database, we also need to consider the SQL-to-NL problem, i.e., how the system can generate NL descriptions of the SQL queries that it executes. This is useful so that the system can explain the results the user receives, or when the NL query leads to several interpretations.</p>
      <p><bold>5. Conclusions</bold></p>
      <p>In this position paper, we revisit the “holy grail” of databases: natural language interfaces. We evaluate existing works, highlight their limitations, and discuss lessons learnt so far. We identify important requirements for a NLIDB and highlight open challenges and promising research directions. To move the needle, we revisit the text-to-SQL problem and argue that natural language access should be realized closer to the DBMS rather than as an external system that provides a NL interface to data. Our intention with this paper is to stir the waters and give a flavor of an exciting research territory. The data interfaces of the future will be more human-like.</p>
      <p><bold>Acknowledgments</bold></p>
      <p>This work has been partially funded by the European Union’s Horizon 2020 research and innovation program (grant agreement No 863410).</p>
      <p><bold>References</bold></p>
      <p>[1] E. F. Codd, Seven steps to rendezvous with the casual user, in: J. W. Klimbie, K. L. Koffeman (Eds.), Data Base Management, Proceedings of the IFIP Working Conference on Data Base Management, 1974, pp. 179–200.</p>
      <p>[2] I. Androutsopoulos, G. D. Ritchie, P. Thanisch, Natural language interfaces to databases — an introduction, Natural Language Engineering 1 (1995) 29–81. doi:10.1017/S135132490000005X.</p>
      <p>[3] V. Hristidis, L. Gravano, Y. Papakonstantinou, Efficient IR-style keyword search over relational databases, in: VLDB, 2003, pp. 850–861.</p>
      <p>[4] Y. Luo, X. Lin, W. Wang, X. Zhou, Spark: Top-k keyword query in relational databases, in: ACM SIGMOD, 2007, pp. 115–126.</p>
      <p>[5] F. Li, H. V. Jagadish, Constructing an interactive natural language interface for relational databases, PVLDB 8 (2014) 73–84.</p>
      <p>[6] N. Yaghmazadeh, Y. Wang, I. Dillig, T. Dillig, Sqlizer: Query synthesis from natural language, PACMPL (2017) 63:1–63:26.</p>
      <p>[7] V. Zhong, C. Xiong, R. Socher, Seq2SQL: Generating structured queries from natural language using reinforcement learning, 2017. arXiv:1709.00103.</p>
      <p>[8] B. Wang, R. Shin, X. Liu, O. Polozov, M. Richardson, RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers, 2020. arXiv:1911.04942.</p>
      <p>[9] J. Guo, Z. Zhan, Y. Gao, Y. Xiao, J.-G. Lou, T. Liu, D. Zhang, Towards complex text-to-SQL in cross-domain database with intermediate representation, 2019. arXiv:1905.08205.</p>
      <p>[10] G. Katsogiannis-Meimarakis, G. Koutrika, A survey on deep learning approaches for text-to-SQL, The VLDB Journal (2023). doi:10.1007/s00778-022-00776-8.</p>
      <p>[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.</p>
      <p>[12] OpenAI, ChatGPT, 2023. URL: https://openai.com/blog/chatgpt/.</p>
      <p>[13] A. Simitsis, Y. E. Ioannidis, DBMSs should talk back too, in: Fourth Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, CA, USA, January 4–7, 2009, Online Proceedings, www.cidrdb.org, 2009. URL: http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_119.pdf.</p>
      <p>[14] M. Hazoom, V. Malik, B. Bogin, Text-to-SQL in the wild: A naturally-occurring dataset based on Stack Exchange data, in: 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), 2021, pp. 77–87.</p>
      <p>[15] C. Wang, K. Tatwawadi, M. Brockschmidt, P.-S. Huang, Y. Mao, O. Polozov, R. Singh, Robust text-to-SQL generation with execution-guided decoding, 2018. arXiv:1807.03100.</p>
      <p>[16] O. Gkini, T. Belmpas, Y. Ioannidis, G. Koutrika, An in-depth benchmarking of text-to-SQL systems, in: SIGMOD Conference, ACM, 2021.</p>
      <p>[17] H. Kim, B.-H. So, W.-S. Han, H. Lee, Natural language to SQL: Where are we today?, Proc. VLDB Endow. 13 (2020) 1737–1750.</p>
      <p>[18] T. Yu, R. Zhang, K. Yang, M. Yasunaga, D. Wang, Z. Li, J. Ma, I. Li, Q. Yao, S. Roman, Z. Zhang, D. Radev, Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task, 2019. arXiv:1809.08887.</p>
      <p>[19] O. Sharir, B. Peleg, Y. Shoham, The cost of training NLP models: A concise overview, CoRR abs/2004.08900 (2020). arXiv:2004.08900.</p>
      <p>[20] N. Weir, P. Utama, A. Galakatos, A. Crotty, A. Ilkhechi, S. Ramaswamy, R. Bhushan, N. Geisler, B. Hättasch, S. Eger, U. Çetintemel, C. Binnig, DBPal: A fully pluggable NL2SQL training pipeline, in: SIGMOD Conference, ACM, 2020, pp. 2347–2361.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>