A family of natural language interfaces for databases based on ChatGPT and LangChain Eduardo Nascimento1 , Grettel García1 , Wendy Victorio1 , Melissa Lemos1 , Yenier Izquierdo1 , Robinson Garcia2 , Luiz A.P. Paes Leme3 and Marco A. Casanova1 1 Instituto Tecgraf and Departamento de Informática, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil 2 Petrobras, Rio de Janeiro, 20031-912, RJ, Brazil 3 Instituto de Computação, UFF, Niterói, 24210-310, RJ, Brazil Abstract This poster paper proposes a family of Natural Language (NL) interfaces for databases (NLIDBs) that use ChatGPT and LangChain features to compile NL sentences expressing database questions into SQL queries or to extract keywords from NL sentences, which are passed to a database keyword search tool. The use of ChatGPT reduces dealing with NL questions to a few-shot learning process for the benefit of developers interested in creating NLIDBs. The paper concludes by comparing the NLIDBs in the family. Keywords Natural Language Interfaces for Databases, ChatGPT, LangChain, Database Keyword Search 1. Introduction Natural language interfaces for databases (NLIDBs) have long been investigated with relative success [1]. The recent availability of the so-called Large Language Models [2] offers a possible strategy to simplify the development of NLIDBs worth exploring. A Large Language Model (LLM) is a language model based on a deep neural network architecture with a very large number of parameters, trained on enormous quantities of unlabeled text, using self-supervised or semi- supervised learning. LLMs came to the foreground with the announcement of conversational interfaces, such as ChatGPT [3], that generate high-quality textual answers, among other tasks. The primary motivation for this paper is the construction of LLM-based NLIDBs, which poses new challenges to database designers. This paper contributes to meeting such challenges by proposing strategies to build NLIDBs that pass a suitable database description to the LLM in a context prompt. In detail, the paper first introduces an NLIDB family that uses ChatGPT and LangChain features (see Section 3) to compile NL sentences expressing database questions into SQL queries or to extract keywords from NL sentences, which are passed to a database ER2023: Companion Proceedings of the 42nd International Conference on Conceptual Modeling: ER Forum, 7th SCME, Project Exhibitions, Posters and Demos, and Doctoral Consortium, November 06-09, 2023, Lisbon, Portugal Envelope-Open rogerrsn@tecgraf.puc-rio.br (E. Nascimento); ggarcia@tecgraf.puc-rio.br (G. García); wendyzv@tecgraf.puc-rio.br (W. Victorio); melissa@tecgraf.puc-rio.br (M. Lemos); ytorres@tecgraf.puc-rio.br (Y. Izquierdo); lapaesleme@ic.uff.br (L. A.P. P. Leme); casanova@inf.puc-rio.br (M. A. Casanova) Orcid 0000-0001-9713-300X (G. García); 0009-0003-0545-2612 (W. Victorio); 0000-0003-1723-9897 (M. Lemos); 0000-0002-0528-5151 (R. Garcia); 0000-0001-6014-7256 (L. A.P. P. Leme); 0000-0003-0765-9636 (M. A. Casanova) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings keyword search tool. Some of the NLIDBs in the family explore ChatGPT’s few-shot learning to improve the accuracy of processing NL questions. Then, the paper describes early experiments to compare the NLIDBs in the family. The paper is organized as follows. Section 2 summarizes related work. Section 3 covers the required ChatGPT and LangChain features. Section 4 introduces the NLIDB family. Section 5 describes experiments with the NLIDB family. Finally, Section 6 contains the conclusions. 2. Related work Affolters et al. [1] offers a comparative survey of 24 NLIDBs, classified as keyword-, pattern-, parsing- and grammar-based. As lessons learned, the authors indicate that, for simple questions, keyword-based systems are enough, whereas for complex questions, an NLIDB depends on manually designed rules. Diefenbach et al. [4] assumes that an NL question can be correctly interpreted considering only the semantics of the words. For example, the NL question “Give me actors born in Berlin.” is reformulated as a keyword question “Berlin, actors, born in”, that is, the semantics of the words “Berlin”, “actors”, and “born” suffice to deduce the intention of the user. The authors proceed to map the NL question into one of four templates, which limits the usefulness of the process. As for database keyword search, QUIOW [5] is an automatic, schema-based tool that supports keyword-based query processing for relational and RDF environments. Izquierdo et al. [6] introduced an RDF keyword search tool that does not rely on an RDF schema but synthesizes SPARQL queries by exploring the similarity between instance sets observed in the RDF dataset. Manning [2] classifies NLP approaches in roughly four eras. The last era, from 2013 to the present, extended the use of machine learning, allowing a more generalized language understanding by using self-supervised learning and deep learning-based models to represent words with dense vectors. Within this context, ChatGPT from OpenAI was released on Nov. 30th, 2022, and uses the GPT LLM family. Differently from [1][4], the NLIDB family proposed relies on ChatGPT to process NL questions, does not depend on grammar-based rules, and is not limited to a few database query templates. 3. A summary of the required ChatGPT and LangChain features The OpenAI API provides the Chat Completions API 1 that takes a list of messages as input and returns a model-generated message as output. Each message comprises a role (“system”, “user” or “assistant”) and the content. The system message defines the assistant’s behavior from specific instructions about how it should behave during the conversation. The user messages provide requests or comments for the assistant to answer. Finally, the assistant messages refer to responses from ChatGPT, which is called assistant in this context. ChatGPT (and the OpenAI API) can translate NL questions into SQL queries and extract keywords from a text block2 . To translate NL questions into SQL queries, the user can create a simple input, with practically no context passed, and rely on the knowledge of the model to 1 https://platform.openai.com/docs/guides/gpt/chat-completions-api 2 https://platform.openai.com/examples perform the translation. Alternatively, the user can manually create a system message to pass more context about the data and the translation task, such as metadata describing the database tables. Another approach would be to use frameworks to develop LLM-powered applications, such as LangChain3 . Chains, in LangChain, go beyond a single LLM call and involve chaining together sequences of calls (whether to an LLM or a different utility). The chain named SQLDatabaseChain inspects the schema, tables, and joins in the database, then provides context to an LLM in an automated way; the user can also manually add a prompt and choose which tables to inspect. Using such context, ChatGPT constructs an SQL query for the original NL question, which is then executed and returned by LangChain. Finally, the Chat Completions API allows describing function calls to be used to respond when the information the LLM has is not enough or is incomplete. This feature provides a straightforward way to create an NL front-end to a database keyword search (KwS) tool. 4. The NLIDB family In the process of exploring ChatGPT and LangChain features to create NLIDBs, we identified two alternatives – (1) the SQL alternative and (2) the KwS alternative – which induce the proposed NLIDB family4 . The SQL alternative translates NL questions directly into SQL queries. We explored four options to create an NLIDB in this case: • Option 1.1 – ChatGPT-SQL uses only the knowledge embedded in the LLM, with no context. • Option 1.2 – ChatGPT-SQL-Prompt uses a manually created system message to pass context information about the database and query examples, and runs an application on the database for validation. • Option 1.3 – ChatGPT-SQL-LangChain uses LangChain with ChatGPT to generate and execute an SQL query; to avoid exceeding the token limit, LangChain creates chains to inspect specific tables, connects with the database, and calls the ChatGPT API. • Option 1.4 – ChatGPT-SQL-LangChain-Prompt is the same as Option 1.3, extended with prompts. The KwS alternative uses ChatGPT to extract keywords from an NL question and then uses a KwS tool [5][7] to find the answers. We implemented two options to create an NLIDB: • Option 2.1 – ChatGPT-KwS uses the knowledge of the LLM to extract the keywords directly. • Option 2.2 – ChatGPT-KwS-Prompt fine-tunes the model by passing the main terms of the domain, and examples of NL questions and their translations to keywords via prompt messages. In both cases, ChatGPT extracts keywords from the NL question and sends them to the KwS tool via the function shown in Figure 1. The results are passed back to ChatGPT, which presents them to the user. 3 https://python.langchain.com/docs/modules/chains/popular/sqlite 4 The prototypes and a brief video can be found at https://github.com/dudursn/nl_interface_tools_based_chatgpt_kws { ” f u n c t i o n _ c a l l ” : { ” name ” : ” s e a r c h _ k e y w o r d s ” } , ” functions ” : [{ ” name ” : ” s e a r c h _ k e y w o r d s ” , ” d e s c r i p t i o n ” : ” Send keywords t o keyword s e a r c h t o o l ” , ” parameters ” : { ” type ” : ” o b j e c t ” , ” p r o p e r t i e s ” : { ” keywords ” : { ” t y p e ” : ” s t r i n g ” } , } , ” r e q u i r e d ” : [ ” keywords ” ] } } ] } Figure 1: A function that extracts keywords and sends them to the KwS tool. 5. Experiments The experiments used the Mondial database and 27 NL questions5 , which were submitted to each NLIDB of Section 4. Table 1 shows the results obtained, summarized as: • Option 1.1 – ChatGPT-SQL, which translates NL questions into SQL queries using only knowledge from the LLM, generated only incorrect SQL queries. The LLM was not schema-aware and, in some responses, did not generate an SQL query but responded with its knowledge, and in others, it asked to provide information about the database. • Option 1.2 – ChatGPT-SQL-Prompt, which uses a manually provided prompt with schema information, succeeded on most questions. The errors were due to mismatches between the NL question terms and table or column names. • Option 1.3 – ChatGPT-SQL-LangChain, which uses LangChain to provide schema infor- mation to ChatGPT, incurred more errors than Option 1.2. Some errors resulted from misplaced or non-existent column names. Also, some responses were empty due to incorrect filters. • Option 1.4 – ChatGPT-SQL-LangChain-Prompt, which uses a prompt with information for LangChain, showed some improvement, but repeated some of the errors of Option 1.3. • Option 2.1 – ChatGPT-KwS, which uses knowledge from the LLM, incurred in problems with plural names (“Country” versus “Countries”) and synonyms. For some NL questions, the NLIDB returned an error the first time the question was submitted, but executed correctly if the question was re-submitted. For other NL questions, the KwS tool could not return a response for the keywords generated, and in others, ChatGPT responded using its knowledge (since GPT was trained with geographic data). • Option 2.2 – ChatGPT-KwS-Prompt, which uses prompts with context information and examples, was the best-performing NLIDB. ChatGPT was able to generate most of the keywords that the KwS tool expected. For example, for the NL question “Cities with area more than 1000”, ChatGPT extracted “City Area > 1000”, which the KwS tool could handle. However, some errors occurred because the KwS tool did not respond correctly to the keywords ChatGPT extracted. 5 https://github.com/dudursn/nl_interface_tools_based_chatgpt_kws/blob/main/queries_results.xlsx Table 1 Results for the NLIDB tools. Result ChatGPT-SQL ChatGPT-SQL- ChatGPT-SQL- ChatGPT-SQL- ChatGPT-KwS ChatGPT-KwS- Prompt LangChain LangChain-Prompt Prompt Correct 0 19 10 13 16 22 Failed 27 8 17 14 11 5 6. Conclusions This paper demonstrated how ChatGPT and LangChain can help create Natural Language interfaces for databases (NLIDBs). In the SQL alternative, such tools were used to generate an SQL query from an NL question. In the KwS alternative, they were adopted to extract keywords from an NL question, which were then passed to a KwS tool, thereby extending the usefulness of the KwS tool to accept NL questions. The experiments suggest that ChatGPT-KwS-Prompt, which uses prompts with context information and examples to help extract keywords, is the best-performing NLIDB. Acknowledgments This work was partly funded by FAPERJ under grant E-26/200.834/2021, by CAPES under grants 88881.134081/2016-01 and 88887.694383/2022-00, and by CNPq under grant 305.587/2021-8. References [1] K. Affolter, K. Stockinger, A. Bernstein, A comparative survey of recent natural language interfaces for databases, The VLDB Journal 28 (2019). doi:10.1007/s00778- 019- 00567- 8 . [2] C. D. Manning, Human Language Understanding & Reasoning, Daedalus 151 (2022) 127–138. URL: https://doi.org/10.1162/daed_a_01905, doi: 10.1162/daed_a_01905. [3] OpenAI, Chatgpt: Optimizing language models for dialogue, https://openai.com/blog/chat- gpt/, 2022. [4] D. Diefenbach, A. Both, K. Singh, P. Maret, Towards a question answering system over the semantic web, Semant. Web 11 (2020) 421–439. doi:10.3233/SW- 190343 . [5] Y. T. Izquierdo, G. M. García, E. S. Menendez, M. A. Casanova, F. Dartayre, C. H. Levy, Quiow: a keyword-based query processing tool for rdf datasets and relational databases, in: International Conference on Database and Expert Systems Applications (DEXA), Springer, 2018, pp. 259–269. doi:10.1007/978- 3- 319- 98812- 2_22 . [6] Y. T. Izquierdo, G. M. García, E. Menendez, L. A. P. Leme, A. Neves, M. Lemos, A. C. Finamore, C. Oliveira, M. A. Casanova, Keyword search over schema-less rdf datasets by sparql query compilation, Information Systems 102 (2021) 101814. doi:10.1016/j.is.2021.101814 . [7] Y. T. Izquierdo, G. M. Garcia, M. Lemos, A. Novello, B. Novelli, C. Damasceno, L. A. P. P. Leme, M. A. Casanova, A platform for keyword search and its application for covid- 19 pandemic data, Journal of Information and Data Management 12 (2021). URL: https: //sol.sbc.org.br/journals/index.php/jidm/article/view/1904. doi:10.5753/jidm.2021.1904 .