<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Natural Language Data Interfaces: From Keyword Search to ChatGPT, are we there yet?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Katsogiannis-Meimarakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christos Tsapelas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgia Koutrika</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Athena Research Center</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Enabling users to query data in a relational database using natural language has long been considered the holy grail of the database community. Towards this direction, there has been an increasing research focus on Natural Language Data Interfaces that allow users to pose queries in natural language and translate these queries to the underlying database query language. Several approaches have emerged, especially due to the recent advances in deep neural networks. Despite this bloom, not only are these systems very complicated and difficult to understand, but they have also yet to deliver on their promise of enabling users to access data easily using natural language. Hence, they have failed to see widespread adoption. A question naturally arises: is natural language access to data going to be the elusive holy grail of databases? We hope not. With the aim of fostering research on these open issues, in this position paper we discuss (currently unmet) requirements for effective natural language data exploration and highlight promising research directions. Finally, we describe how to rethink the text-to-SQL problem and how it should be realized as an integral capability of a DBMS, leading to a system that fully supports natural language queries over data.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language interfaces</kwd>
        <kwd>data exploration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p><bold>1. Introduction</bold></p>
      <p>“If we are to satisfy the needs of casual users of databases, we must break the barriers that presently prevent these users from freely employing their native language” (E. F. Codd, 1974) [1]. Enabling users to query data using natural language has long been the “holy grail” of the database community. Research on Natural Language Data Interfaces (NLIDBs) started almost as early as the first DBMSs emerged, in the 70s [2]. Early systems enabled keyword searches: they relied on data indexes to find relations that contained the query keywords, and on the database schema to join them and return the answer to a query (e.g., [3, 4]). Parsing-based approaches parsed the input question to understand its grammatical structure and then mapped it to the structure of the desired SQL query (e.g., [5, 6]). Recently, there has been a growing interest in neural machine translation approaches for learning natural language data interfaces [7, 8, 9] (see [10] for a recent survey). Advances in NLP, such as the introduction of Transformers [11], have given the area a boost, while the latest developments, such as ChatGPT [12], seem to bring human-like conversation closer than ever.</p>
      <p>But are we really close to “talking to our databases” [13]?</p>
      <p>Unfortunately, existing efforts, including the latest deep learning based systems, have not seen widespread adoption and have yet to deliver on their promise of enabling users to use natural language to search data. In this position paper, we delve into the limitations of existing approaches and discuss lessons learnt. We argue that, despite the recent bloom of approaches, these focus on certain aspects of the problem (e.g., improving the translation accuracy over a specific dataset [14]) and largely miss the big picture. We identify important requirements for natural language data interfaces and highlight open challenges and promising research directions.</p>
      <p><bold>2. The Inherent Challenges of NL Data Interfaces</bold></p>
      <p>SQL has a strict syntax, which leads to limited expressivity compared to natural language. Furthermore, while a sentence in natural language may contain some mistakes and still be understood by a human, a SQL query needs to be syntactically and semantically correct in order to be executable over the underlying data. These limitations create enormous challenges for translating NL to SQL queries. SQL’s strict syntax may lead to cumbersome translations: a relatively simple NL query may map to a complex SQL query. For example, “Return the movie with the best rating” maps to a nested SQL query. While the original NL query is simple, building the complex SQL query may be a tough call for the system.</p>
      <p>All these challenges make the text-to-SQL problem hard. Not only is it difficult to understand a NL query, but it is also difficult to build the correct SQL query. Even similar questions may lead to a different outcome over different databases: one may be translated over one database and the other may not, due to issues such as ambiguity, paraphrasing, and different schemas.</p>
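      <p>The “best rating” example above can be made concrete on a toy SQLite instance. The movie(title, rating) table and its contents below are illustrative assumptions, not from any benchmark; the point is only that a one-line NL request already demands a nested query:</p>
      <preformat>
```python
import sqlite3

# Illustrative toy schema and data (assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("Vertigo", 9.1), ("Gigli", 2.5), ("Heat", 8.3)])

# "Return the movie with the best rating" -- the aggregate forces a subquery:
nested_sql = """
    SELECT title
    FROM movie
    WHERE rating = (SELECT MAX(rating) FROM movie)
"""
print(conn.execute(nested_sql).fetchall())  # [('Vertigo',)]
```
      </preformat>
      <p>A text-to-SQL system must infer this nesting from a surface cue like “best”, which is precisely the gap between the simplicity of the question and the complexity of the query.</p>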
      <p><bold>3. Existing Approaches</bold></p>
      <p><bold>The “Database way”.</bold> One category of approaches tackles the text-to-SQL problem as a mapping problem [3, 4]: how to map query elements to database elements (tables, columns, and values), and then find the desired interconnections of these data elements that capture the user intent. In addition, parsing-based approaches parse the input question to understand its grammatical structure, which is then mapped to the desired SQL query [5, 6].</p>
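      <p>As a minimal illustration of the mapping step, the sketch below matches query tokens against schema names by plain case-insensitive string equality. The schema, the tokenizer, and the matching rule are all simplifying assumptions; real systems also match data values via indexes and handle synonyms and multi-word mentions:</p>
      <preformat>
```python
import re

# Assumed toy schema metadata: table name -> column names.
SCHEMA = {"movie": ["title", "rating", "year"],
          "director": ["name", "birth_year"]}

def map_to_schema(nl_query):
    """Return (table, column) pairs mentioned in the NL query.

    A column hit is reported as (table, column); a bare table
    mention is reported as (table, None).
    """
    tokens = set(re.findall(r"\w+", nl_query.lower()))
    hits = []
    for table, columns in SCHEMA.items():
        if table in tokens:
            hits.append((table, None))
        for col in columns:
            if col in tokens:
                hits.append((table, col))
    return hits

print(map_to_schema("What is the rating of each movie?"))
# [('movie', None), ('movie', 'rating')]
```
      </preformat>
      <p>After this matching step, a mapping system joins the matched relations through foreign keys to assemble a query that connects the discovered elements; the sketch stops before that stage.</p>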
      <p><bold>The “Machine Learning way”.</bold> The other approach is to tackle the text-to-SQL problem as a language translation problem and to train a neural network on a large number of {NL query, SQL query} pairs [10]. Originally, these systems ignored the underlying database and did not ensure that the generated SQL is syntactically and semantically correct, i.e., executable over this database. To address this problem, recent approaches employ two additional techniques. First, schema linking aims at discovering possible mentions of database elements in the NL query. The discovered schema links, along with the rest of the inputs, are fed into the neural network that is responsible for the translation. Second, output refinement can be applied on a trained model to avoid producing incorrect SQL queries [15].</p>
      <p><bold>Limitations and Lessons Learnt.</bold> The “Database way” can handle different types of SQL queries and can work on any database. However, existing approaches struggle with more complex and diverse NL queries and cannot easily cope with NL challenges such as synonyms, paraphrasing, and typos [16].</p>
      <p>The “Machine Learning way” promises to be more generalizable, both in terms of the different types of NL queries the methods can understand and the different databases they can work on. There are, however, important limitations. In practice, all methods focus on limited-scope problems, and their accuracy severely degrades with more complex and diverse NL and SQL queries [17]. Furthermore, they depend on training data and cannot cope with unseen databases and queries. For example, Spider [18], a large-scale text-to-SQL dataset that is very popular for training and evaluating text-to-SQL systems, contains queries over 200 relational databases from 138 different domains. However, these are toy databases with simple schemas and small sizes that fail to reflect the characteristics and difficulties of real-world DBs.</p>
      <p>Furthermore, the models used so far are typically quite complex and large, questioning their practical use inside a complex system such as a database engine (the training cost as well as the energy consumption [19] of such big models are important concerns). Moreover, most ML approaches rely on poor and size-limiting input representations that cannot possibly leverage the wealth of database information, comprising hundreds of tables and attributes, data values, and queries.</p>
      <p>These limitations become highly relevant when applying a text-to-SQL system to an actual database used in a business, research, or any other real-world use case [14]. Such databases can pose difficulties not encountered in the datasets used to train and evaluate such systems, for example, a large number of tables and attributes, and table and column names that use domain-specific terminology.</p>
      <p><bold>4. Going Forward: Requirements and Opportunities</bold></p>
      <p>The challenges of the text-to-SQL problem, the aforementioned observations, as well as our own experience with working with and evaluating several text-to-SQL systems [10, 16] point to a set of requirements for a NLIDB.</p>
      <p><bold>R1. Query expressivity:</bold> Using a query language such as SQL, the user knows exactly what queries are possible. In a similar vein, the set of NL and SQL queries that a NLIDB supports should be clearly defined, so that a user is aware of the available query capabilities.</p>
      <p><bold>R2. Data independence:</bold> A NLIDB should support the same query expressivity for different databases. In other words, the same type of NL query should be possible over any database. For example, if the user can ask “what is the average X of Y” in one database, then this type of query should be possible in any other database.</p>
      <p><bold>R3. Performance:</bold> Allowing users to express questions in NL should free them not only from using SQL, but also from worrying about how their question will be executed efficiently. The system should transparently find the most efficient way to answer a NL query, minimizing both the translation overhead and the execution cost of retrieving the results.</p>
      <p><bold>R4. Scalability:</bold> A NLIDB should be feasible and scalable over any database.</p>
      <p>Requirement R1 is important because, up to now, almost none of the known text-to-SQL systems provides a clearly defined query language or a specification of its query capabilities. For the user, it is a trial-and-error process to see what queries can be understood and answered by the system. Is it possible to come up with a query language specification that systems can refer to in order to describe their query expressivity?</p>
      <p>Towards R1, a query categorization in the spirit of [16] may be a good starting point. This could enable the creation of appropriate benchmarks for the comparison of the query capabilities of different systems. Even devising an appropriate query categorization and an appropriate benchmark raises several challenges: what categories to choose, what queries should be in each category, which datasets to use. Furthermore, one should take into account SQL equivalence (different SQL queries that return the same results) and NL ambiguity (a NL query may have more than one correct translation over the data). Unfortunately, existing benchmarks fail to address the query expressivity question. For instance, Spider has four very coarse-grained classes of queries.</p>
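      <p>One pragmatic way a benchmark can cope with SQL equivalence is to compare result sets rather than query strings. The sketch below, over an assumed toy movie table, treats two syntactically different queries as equivalent if they return the same rows on the test database:</p>
      <preformat>
```python
import sqlite3

# Assumed toy schema and data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("Vertigo", 9.1), ("Heat", 8.3), ("Gigli", 2.5)])

# Two different formulations of "the movie with the best rating".
q1 = "SELECT title FROM movie WHERE rating = (SELECT MAX(rating) FROM movie)"
q2 = "SELECT title FROM movie ORDER BY rating DESC LIMIT 1"

def same_results(conn, a, b):
    # Compare as sorted lists: row order is irrelevant for equivalence.
    return sorted(conn.execute(a).fetchall()) == sorted(conn.execute(b).fetchall())

print(same_results(conn, q1, q2))  # True
```
      </preformat>
      <p>Note that result equality on one database instance does not prove true equivalence: if two movies tied for the best rating, the first query would return both while the second would return only one, which is exactly the kind of subtlety a benchmark design must confront.</p>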
      <p>Requirement R2 complements R1 in saying that the same query expressivity should be supported over any database. This comes naturally with query languages such as SQL: for instance, SPJ queries can be supported over any data. For a NLIDB, that does not hold. Going from one database to another, the same type of queries may not be supported. As we have already pointed out, this is a major concern for deep learning systems. A system trained over Spider will not work over a new domain such as astrophysics or cancer research.</p>
      <p>One could build specialized, domain-specific benchmarks for training and evaluating text-to-SQL systems in domains such as scientific databases. Manually crafting such benchmarks is prohibitive, especially in these kinds of domains. Data augmentation, i.e., automatic benchmark generation, is an open research direction [20]. However, this is where the power of benchmarks as a means to demonstrate query expressivity ends. Ensuring data independence is a different beast, and finding better training datasets is not the solution to the problem. Rethinking the system design is needed instead.</p>
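      <p>A template-based generator gives a flavor of such data augmentation. The templates and schema below are illustrative assumptions, not the pipeline of [20]:</p>
      <preformat>
```python
import itertools

# Assumed NL/SQL template pairs and a toy schema (table -> numeric columns).
TEMPLATES = [
    ("what is the average {col} of {table}",
     "SELECT AVG({col}) FROM {table}"),
    ("show all {table} with {col} above {val}",
     "SELECT * FROM {table} WHERE {col} > {val}"),
]
SCHEMA = {"movie": ["rating"], "star": ["mass"]}

def generate_pairs(val=5):
    """Instantiate every template over every (table, column) combination."""
    pairs = []
    for (nl, sql), (table, cols) in itertools.product(TEMPLATES, SCHEMA.items()):
        for col in cols:
            pairs.append((nl.format(table=table, col=col, val=val),
                          sql.format(table=table, col=col, val=val)))
    return pairs

for nl, sql in generate_pairs():
    print(nl, "->", sql)
```
      </preformat>
      <p>Pairing each generated NL question with its SQL counterpart yields training data cheaply, at the cost of limited linguistic diversity, which is why such pipelines typically add paraphrasing steps on top.</p>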
      <p>Towards this direction, approaches that have been proposed by the DB community have been shown to be more effective from the data independence perspective, since they rely on the information that the database provides. This potentially points to the need of rethinking our approach to the text-to-SQL problem. Some parts of the solution may require DB methods to ensure data independence, while other parts may use neural models to generalize system knowledge, for example on the diversity and complexity of NL queries. What would a system that combines such capabilities look like?</p>
      <p>Requirement R3 poses a serious challenge. While the state-of-the-art systems are still dealing with “getting the answer right”, they are mostly overlooking “getting the answer fast”. Improving translation speed by building efficient methods is necessary, but it may not be enough. Text-to-SQL systems originating from the DB community tried to generate SQL queries that are not only correct but also optimal in terms of execution speed; hence, many of them contained logic for generating code that would return the desired results fast. This may be necessary for a NLIDB. The database community could come up with benchmarks that focus on efficiency (not just effectiveness) and allow evaluating systems based on execution time and resource consumption.</p>
      <p>R4 highlights the need for realistic solutions. Deep learning text-to-SQL systems typically rely on very complex models, which have been trained and evaluated on toy databases (contained in existing benchmarks). In several cases, it may not be possible to have the required resources to train such enormous models. Furthermore, since these models require that the database schema is given as input, they do not scale well to very large databases with hundreds of attributes and tables (such as astrophysics and biological data). Instead of focusing on increasing model complexity aiming at translation accuracy, we need to design solutions that also take into account system efficiency, complexity, and scale.</p>
      <p>To further move the needle, we may need to rethink our approach to the problem. The text-to-SQL problem has been seen as a mapping or a language translation problem. This is an oversimplification; in fact, the text-to-SQL problem comprises (at least) three connected problems: a representation problem (what is asked), a planning problem (how to answer it), and an optimization problem (how to execute it efficiently). By decomposing the problem into its sub-problems, we can focus on each one and find the best solution, whether a DB technique, an ML technique, or a combination of both. We can investigate knowledge representation schemes that scale well to very large databases. For planning and optimization, we can focus on system efficiency, complexity, and scale.</p>
      <p>We also believe that natural language query capabilities should be implemented closer to the DBMS. All the data (and information about the data, such as statistics and metadata) as well as data operations are part of the DBMS. Querying data using natural language requires all the knowledge that a DBMS has on the data as well as its processing capabilities (and will considerably enhance all of them in the process).</p>
      <p>As the system processes NL queries, it should learn and improve its query capabilities as well. At the same time, it can leverage this knowledge for learning how to translate SQL queries to NL. To have fully natural language access to a database, we also need to consider the SQL-to-NL problem, i.e., how the system can generate NL descriptions of the SQL queries that it executes. This is useful so that the system can explain the results the user receives, or when the NL query leads to several interpretations.</p>
      <p><bold>5. Conclusions</bold></p>
      <p>In this position paper, we revisit the “holy grail” of databases: natural language interfaces. We evaluate existing works, highlight their limitations, and discuss lessons learnt so far. We identify important requirements for a NLIDB and highlight open challenges and promising research directions. To move the needle, we revisit the text-to-SQL problem and argue that natural language access should be realized closer to the DBMS rather than as an external system that provides a NL interface to data. Our intention with this paper is to stir the waters and give a flavor of an exciting research territory. The data interfaces of the future will be more human-like.</p>
      <p><bold>Acknowledgments</bold></p>
      <p>This work has been partially funded by the European Union’s Horizon 2020 research and innovation program (grant agreement No 863410).</p>
      <p><bold>References</bold></p>
      <p>[1] E. F. Codd, Seven steps to rendezvous with the casual user, in: J. W. Klimbie, K. L. Koffeman (Eds.), Data Base Management, Proceedings of the IFIP Working Conference on Data Base Management, 1974, pp. 179–200.</p>
      <p>[2] I. Androutsopoulos, G. D. Ritchie, P. Thanisch, Natural language interfaces to databases — an introduction, Natural Language Engineering 1 (1995) 29–81. doi:10.1017/S135132490000005X.</p>
      <p>[3] V. Hristidis, L. Gravano, Y. Papakonstantinou, Efficient IR-style keyword search over relational databases, in: VLDB, 2003, pp. 850–861.</p>
      <p>[4] Y. Luo, X. Lin, W. Wang, X. Zhou, Spark: Top-k keyword query in relational databases, in: ACM SIGMOD, 2007, pp. 115–126.</p>
      <p>[5] F. Li, H. V. Jagadish, Constructing an interactive natural language interface for relational databases, PVLDB 8 (2014) 73–84.</p>
      <p>[6] N. Yaghmazadeh, Y. Wang, I. Dillig, T. Dillig, Sqlizer: Query synthesis from natural language, PACMPL (2017) 63:1–63:26.</p>
      <p>[7] V. Zhong, C. Xiong, R. Socher, Seq2SQL: Generating structured queries from natural language using reinforcement learning, 2017. arXiv:1709.00103.</p>
      <p>[8] B. Wang, R. Shin, X. Liu, O. Polozov, M. Richardson, RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers, 2020. arXiv:1911.04942.</p>
      <p>[9] J. Guo, Z. Zhan, Y. Gao, Y. Xiao, J.-G. Lou, T. Liu, D. Zhang, Towards complex text-to-SQL in cross-domain database with intermediate representation, 2019. arXiv:1905.08205.</p>
      <p>[10] G. Katsogiannis-Meimarakis, G. Koutrika, A survey on deep learning approaches for text-to-SQL, The VLDB Journal (2023). doi:10.1007/s00778-022-00776-8.</p>
      <p>[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.</p>
      <p>[12] OpenAI, ChatGPT, 2023. URL: https://openai.com/blog/chatgpt/.</p>
      <p>[13] A. Simitsis, Y. E. Ioannidis, DBMSs should talk back too, in: Fourth Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, CA, USA, January 4–7, 2009, Online Proceedings, www.cidrdb.org, 2009. URL: http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_119.pdf.</p>
      <p>[14] M. Hazoom, V. Malik, B. Bogin, Text-to-SQL in the wild: A naturally-occurring dataset based on Stack Exchange data, in: 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), 2021, pp. 77–87.</p>
      <p>[15] C. Wang, K. Tatwawadi, M. Brockschmidt, P.-S. Huang, Y. Mao, O. Polozov, R. Singh, Robust text-to-SQL generation with execution-guided decoding, 2018. arXiv:1807.03100.</p>
      <p>[16] O. Gkini, T. Belmpas, Y. Ioannidis, G. Koutrika, An in-depth benchmarking of text-to-SQL systems, in: SIGMOD Conference, ACM, 2021.</p>
      <p>[17] H. Kim, B.-H. So, W.-S. Han, H. Lee, Natural language to SQL: Where are we today?, Proc. VLDB Endow. 13 (2020) 1737–1750.</p>
      <p>[18] T. Yu, R. Zhang, K. Yang, M. Yasunaga, D. Wang, Z. Li, J. Ma, I. Li, Q. Yao, S. Roman, Z. Zhang, D. Radev, Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task, 2019. arXiv:1809.08887.</p>
      <p>[19] O. Sharir, B. Peleg, Y. Shoham, The cost of training NLP models: A concise overview, CoRR abs/2004.08900 (2020). arXiv:2004.08900.</p>
      <p>[20] N. Weir, P. Utama, A. Galakatos, A. Crotty, A. Ilkhechi, S. Ramaswamy, R. Bhushan, N. Geisler, B. Hättasch, S. Eger, U. Çetintemel, C. Binnig, DBPal: A fully pluggable NL2SQL training pipeline, in: SIGMOD Conference, ACM, 2020, pp. 2347–2361.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>