<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Text-to-SQL with Large Language Models: Exploring the Promise and Pitfalls</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Luca</forename><surname>Sala</surname></persName>
							<email>luca.sala@unimore.it</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">University of Modena and Reggio Emilia</orgName>
								<orgName type="institution" key="instit2">UNIMORE</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Sullutrone</surname></persName>
							<email>giovanni.sullutrone@unimore.it</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">University of Modena and Reggio Emilia</orgName>
								<orgName type="institution" key="instit2">UNIMORE</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sonia</forename><surname>Bergamaschi</surname></persName>
							<email>sonia.bergamaschi@unimore.it</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">University of Modena and Reggio Emilia</orgName>
								<orgName type="institution" key="instit2">UNIMORE</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Text-to-SQL with Large Language Models: Exploring the Promise and Pitfalls</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8EA66541B3E1E50E0958AEF9AC7BA866</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large Language Models</term>
					<term>Text-to-SQL</term>
					<term>Relational Databases</term>
					<term>SQL</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The emergence of Large Language Models (LLMs) represents a fundamental change in the ever-evolving field of natural language processing (NLP). Over the past few years, the enhanced capabilities of these models have led to their widespread use across various fields, in both practical applications and research contexts. In particular, as data science intersects with LLMs, new research opportunities and insights emerge, notably in translating text into Structured Query Language (Text-to-SQL). The application of this technology to this task poses a unique set of opportunities and related issues that have significant implications for information retrieval. This discussion paper delves into these intricacies and limitations, focusing on challenges that jeopardise efficacy and reliability. We investigate scalability, accuracy, and the concerning issue of hallucinated responses, which calls into question the trustworthiness of LLMs. Furthermore, we point out the limits of current test datasets, created for research purposes, in capturing real-world complexities. Finally, we consider the performance of Text-to-SQL with LLMs from different perspectives. Our investigation identifies the key challenges faced by LLMs and proposes viable solutions to facilitate the exploitation of these models to advance data retrieval, bridging the gap between academic research and real-world application scenarios.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, natural language processing (NLP) has been fundamentally changed by the rise of Large Language Models (LLMs). Models like BERT (Bidirectional Encoder Representations from Transformers) <ref type="bibr" target="#b0">[1]</ref> and GPT (Generative Pretrained Transformer) <ref type="bibr" target="#b1">[2]</ref>, trained on massive corpora of written data, have shown impressive capabilities in grasping semantic relationships and solving complex tasks.</p><p>This has made them powerful tools for human-computer interaction, which needs extensive semantic and domain knowledge to meet the requirements of real-world applications. In particular, their ability to interpret natural language requests and translate them into executable SQL statements has the potential to revolutionize database querying, from bridging the gap between complex database systems and end-users to making data-driven insights more accessible to a broader audience. Furthermore, the impressive adaptability and learning capabilities of LLMs promise continuous improvement in query understanding and processing. As these models are exposed to more domain-specific data, their effectiveness in handling queries in various specialized fields, from healthcare to financial services, is expected to improve <ref type="bibr" target="#b2">[3]</ref>. This not only enhances the accuracy of query conversion, but also opens up possibilities for personalized database interactions, where the model adjusts to the user's language and query patterns.</p><p>This paper explores the current limitations of Text-to-SQL systems powered by LLMs. It focuses on potential pitfalls and readily applicable solutions to improve performance for real-world use cases. 
The structure is as follows: Section 2 provides background on Text-to-SQL with LLMs; Section 3 addresses challenges, limitations, and solutions; followed by conclusions and future perspectives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Text-to-SQL</head><p>The inherent complexity of the Text-to-SQL task comes from the fundamental differences between natural language and SQL. Natural language is characterized by ambiguity, flexibility, and implicit context, whereas SQL adheres to a strict, formal syntax and requires explicit representation of relationships within a database schema. Early approaches relied heavily on handcrafted rules and grammars <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>, leading to systems that were difficult to generalize to new domains. With the rise of machine learning, new techniques started to take shape, employing elements like sequence-to-sequence models to learn the mapping between natural language and SQL, showing improved robustness compared to previous approaches <ref type="bibr" target="#b5">[6]</ref>.</p><p>LLMs are neural network-based models pre-trained on massive text corpora, which enables them to capture rich linguistic patterns and world knowledge; their advent has further revolutionized the Text-to-SQL field. Key to their success is the Transformer architecture, which excels at processing sequential data and modeling long-range dependencies <ref type="bibr" target="#b6">[7]</ref>.</p><p>Their pre-training process exposes them to diverse language usage and domain knowledge <ref type="bibr" target="#b7">[8]</ref> that can be readily leveraged to convert natural language into queries. Furthermore, LLMs can effectively model the logical structure of SQL, handling complex elements like nested structures and aggregations <ref type="bibr" target="#b8">[9]</ref>. 
Notably, these models display potential for zero-shot or few-shot learning in Text-to-SQL, suggesting they can generate SQL queries for new database schemas with minimal or even no additional fine-tuning, thus increasing their adaptability <ref type="bibr" target="#b9">[10]</ref>.</p><p>The integration of LLMs with Text-to-SQL is currently a thriving area of research. Benchmarks like WikiSQL <ref type="bibr" target="#b5">[6]</ref>, Spider <ref type="bibr" target="#b10">[11]</ref>, and BIRD <ref type="bibr" target="#b8">[9]</ref> play a crucial role in driving progress and providing standard evaluation metrics. These datasets consist of paired natural language questions and corresponding SQL translations across various domains.</p><p>Diverse strategies have been explored to harness the power of this technology. Among them, <ref type="bibr" target="#b11">[12]</ref> used an incremental pre-training procedure and fine-tuning on task-specific labeled data. Additionally, interest has been placed in In-Context Learning (ICL) <ref type="bibr" target="#b1">[2]</ref>, where LLMs are prompted with natural language instructions, examples, and carefully engineered input sequences to generate the SQL output <ref type="bibr" target="#b12">[13]</ref>. Finally, researchers are exploring hybrid approaches that combine the strengths of LLMs with decoding constraints or intermediate representations to enhance the structure and controllability of the generated SQL queries <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>.</p></div>
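The in-context learning setup described above amounts to assembling a prompt from the schema, a few demonstration pairs, and the target question. A minimal sketch follows; the schema, example pairs, and prompt formatting are illustrative assumptions, not taken from any cited system:

```python
def build_fewshot_prompt(schema: str, examples: list[tuple[str, str]], question: str) -> str:
    """Assemble a few-shot Text-to-SQL prompt: schema, demonstrations, then the target question."""
    parts = [f"-- Database schema:\n{schema}\n"]
    for nl, sql in examples:
        parts.append(f"-- Question: {nl}\nSQL: {sql}\n")
    # The prompt ends with an open "SQL:" cue for the model to complete.
    parts.append(f"-- Question: {question}\nSQL:")
    return "\n".join(parts)

schema = "CREATE TABLE movies(id INT, title TEXT, year INT);"
examples = [("How many movies are there?", "SELECT COUNT(*) FROM movies;")]
prompt = build_fewshot_prompt(schema, examples, "List titles released after 2000.")
```

The resulting string would be sent as-is to the LLM; no weights are updated, which is precisely what distinguishes ICL from fine-tuning.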
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">The need for Text-to-SQL</head><p>Relational databases, characterized by their efficient, structured, and reliable data management capabilities, have been instrumental in supporting transactional data storage and critical business operations for decades. In 2022, the market value of relational databases was an impressive USD 55.9 billion, with forecasts predicting a growth to USD 161.4 billion by 2032, showcasing a compound annual growth rate (CAGR) of 12.50% <ref type="bibr" target="#b14">[15]</ref>. This substantial growth underscores the continuing reliance on relational databases in the digital age and highlights the increasing amount of data being processed and stored.</p><p>However, accessing and analyzing this vast reservoir of data poses a challenge, particularly for non-experts. The traditional method of interacting with databases through structured query languages such as SQL requires a deep understanding of database schemas and precise command syntax. Through NL querying, users can communicate with databases in plain text, bypassing the need to master complex query languages.</p><p>Integrating Text-to-SQL capabilities into data management systems can therefore significantly accelerate the data exploration process, enabling faster decision-making and insight discovery. It allows users to ask iterative questions, refine their queries based on previous results, and explore data relationships and patterns without the bottleneck of formulating precise SQL queries.</p><p>In summary, the need for Text-to-SQL technologies is driven by the growing complexity and volume of data stored in relational databases and the necessity to make this data accessible to a wider audience. As such, investing in and developing these technologies is crucial for organizations aiming to stay competitive in the data-driven landscape of the 21st century.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Addressing Challenges and Limitations</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Response Time and Performance</head><p>In the realm of database interaction, response time, the time elapsed before receiving a query result, plays a vital role in ensuring smooth operation and a seamless user experience. The introduction of LLMs for query generation shifts our perspective on these metrics, placing emphasis on their inference<ref type="foot" target="#foot_0">1</ref> speed as they act as an additional translation layer between the user's requests and the extracted data. Understanding response time from this perspective requires a nuanced look at factors like Time to First Token (TTFT), which indicates the model's initial responsiveness, and Time Per Output Token (TPOT), which determines how efficiently it generates subsequent parts of the query. Together, TTFT and TPOT give us latency, a measure of the total time needed to produce a complete response or, in our case, the converted SQL query. Throughput, on the other hand, quantifies the server's ability to produce output tokens across multiple requests. While these metrics offer valuable insights, it's important to acknowledge that the hardware used to deploy any LLM has the biggest impact on these factors, making them highly dependent on the specific context of application.</p><p>A significant gap in current research is the lack of direct comparisons between the time it takes a language model to create a query versus the time it takes a human to do the same task. This obscurity hinders our understanding of the potential advantages this technology offers. User expectations have been shaped by the immediate feedback search engines provide; it follows naturally that benchmarks should also account for the desire for fast responses.</p><p>Evaluating Text-to-SQL performance extends beyond the raw capabilities of the LLM; the methods used for assessment play a decisive role. 
Benchmarks like Spider <ref type="bibr" target="#b10">[11]</ref> offer insufficient analysis of how models compare to human performance in this task. The BIRD benchmark <ref type="bibr" target="#b8">[9]</ref> partially addresses this shortcoming by incorporating human ratings but omits crucial elements such as the number of attempts and time required for humans to write valid SQL queries. Incorporating these measurements would enable a more in-depth comparison between model and human efficiency.</p><p>As database complexity grows, the interplay between response time and performance becomes even more critical. Maintaining responsiveness without compromising reliability demands advanced techniques. Ironically, methods designed to improve LLM accuracy can sometimes worsen response time. The Chain of Thought (CoT) approach <ref type="bibr" target="#b15">[16]</ref>, for example, helps tackle complex queries by breaking them into sub-problems, while techniques like Least-to-Most <ref type="bibr" target="#b16">[17]</ref> and Self-Consistency <ref type="bibr" target="#b17">[18]</ref> involve repeated questioning to gain clarity and improve precision. Although beneficial for complex queries, this subdivision into steps introduces variability into both the computational resources needed and the overall time taken to generate a response. This presents a challenge in ensuring predictability and efficiency.</p><p>One possible workaround is to use a specialized inference engine like the Language Processing Unit (LPU) introduced by Groq <ref type="bibr" target="#b18">[19]</ref>, which shows 3-18x improvements in output token throughput compared to traditional providers. Furthermore, it guarantees a consistent Time to First Token, drastically reducing response-time variability.</p><p>Balancing the benefits of advanced LLM techniques with the need for predictable and efficient database interactions remains a critical area for ongoing research and development in the field of NLP and database management.</p></div>
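The relationship between TTFT, TPOT, and overall latency described in this section can be made concrete with a small helper; the timestamps and token count below are invented for illustration, not measured values:

```python
def latency_metrics(t_request: float, t_first_token: float, t_last_token: float, n_tokens: int):
    """Derive TTFT, TPOT, and total latency from generation timestamps (seconds)."""
    ttft = t_first_token - t_request                              # Time to First Token
    tpot = (t_last_token - t_first_token) / max(n_tokens - 1, 1)  # Time Per Output Token
    latency = ttft + tpot * (n_tokens - 1)                        # total time for the full response
    return ttft, tpot, latency

# A request answered with 41 tokens: first token after 0.5 s, last after 2.5 s.
ttft, tpot, latency = latency_metrics(0.0, 0.5, 2.5, 41)
```

Throughput, by contrast, would be measured server-side as total output tokens per second across concurrent requests, so it is not derivable from a single request's timestamps alone.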
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Scalability</head><p>The rapid expansion of available data and the increasing complexity of databases present significant challenges for applying LLMs to the task of Text-to-SQL. Current models struggle with large databases and real-world datasets that often contain inconsistencies or 'noisy' values <ref type="bibr" target="#b8">[9]</ref>. Additionally, the inherent complexity of databases, combined with the limited context window, which determines how much information these models can hold in memory, can lead to significant compression of the prompt, hindering their understanding of the underlying data structure.</p><p>Current methodologies, in fact, base the pre-trained model's grounding on two main elements: schema linking and example value sampling.</p><p>Schema linking identifies references to database elements (tables, columns, etc.) within the natural language query to be added to the prompt <ref type="bibr" target="#b19">[20]</ref>. As databases scale, queries may reference a broader range of tables, making schema linking more difficult and forcing a stricter selection, impacting overall performance <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b19">20]</ref>.</p><p>Value sampling aims to provide the LLM with representative examples from the linked tables <ref type="bibr" target="#b9">[10]</ref>. However, with larger tables, these samples may not adequately reflect the full distribution of data, potentially misleading the LLM.</p><p>Fortunately, the ongoing evolution of these models suggests that scalability issues may be addressed intrinsically as they improve.</p><p>Starting from early models like GPT-3.5-turbo, which had a context window of 4096 tokens, and GPT-4 with 8192 tokens, significant progress has been made in GPT-3.5-turbo-16k-0613 and GPT-4-32k-0613, whose limits increased to 16384 and 32768 tokens, respectively. 
Two of today's most advanced models, Claude 3 <ref type="bibr" target="#b20">[21]</ref> and Gemini 1.5 Pro <ref type="bibr" target="#b21">[22]</ref>, offer even more impressive context windows, up to 200,000 tokens for the former and up to 1 million tokens for the latter.</p><p>A potential drawback for long-context models, however, is the performance drop at specific positions of the context, which could result in a loss of task-essential information. It has been observed that performance is often highest when relevant information is located at the beginning or end of the input context, while it degrades significantly otherwise <ref type="bibr" target="#b22">[23]</ref>.</p><p>However, the most recent models claim to have mitigated the problem. Gemini 1.5 Pro achieves near-perfect (&gt;99%) recall up to multiple millions of tokens in all modalities, i.e., text, video, and audio, and even maintains this recall performance when extending to 10M tokens across all three modalities <ref type="bibr" target="#b21">[22]</ref>. Additionally, Claude 3 Opus not only achieved near-perfect recall, surpassing 99% accuracy, but in some cases, it even identified the limitations of the evaluation itself by recognizing that the "needle" sentence used to test the information retrieval capability appeared to be artificially inserted into the original text by a human <ref type="bibr" target="#b20">[21]</ref>.</p></div>
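As a deliberately naive illustration of the schema linking step discussed above, the sketch below keeps only tables and columns whose names occur verbatim in the question. Real linkers use embeddings or learned models; the schema here is an invented example:

```python
def link_schema(question: str, schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only tables/columns whose names appear (case-insensitively) in the question.

    If a table matches but none of its columns do, all its columns are kept,
    since the query may still need them.
    """
    q = question.lower()
    linked = {}
    for table, columns in schema.items():
        hit_cols = [c for c in columns if c.lower() in q]
        if table.lower() in q or hit_cols:
            linked[table] = hit_cols or columns
    return linked

schema = {"movies": ["title", "year"], "actors": ["name", "birth_year"]}
linked = link_schema("Which movies have a title starting with A?", schema)
```

Pruning the prompt this way is exactly what trades context-window budget against the risk of dropping a table the query actually needs, which is why stricter selection on large schemas hurts performance.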
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Hallucinations</head><p>The term "hallucinations", in the context of LLMs, refers to instances where the model generates inaccurate or misleading information. This phenomenon can arise due to various factors, such as the inherent complexities of natural language, biases within the training data, and limitations of the model itself. Hallucinations represent a challenge in the field of Text-to-SQL, where accuracy and precision in relation to the underlying database and its schema are paramount.</p><p>Within these systems, hallucinations manifest when the LLM fabricates incorrect assumptions about the database structure or invents non-existent tables, columns, or data values. These hallucinations pose a serious threat to the model's performance and reliability, as they can lead to SQL queries that are either invalid or generate incorrect results.</p><p>Researchers have observed that hallucinations involving the creation of fictional table data are a particularly prevalent issue in large-scale databases <ref type="bibr" target="#b8">[9]</ref>. Even when schema linking techniques are employed to align the generated query with the structure of the target database, these problems persist.</p><p>Mitigating hallucinations is an active area of research that has seen various interesting proposals. Recent solutions include techniques like response selectors that use beam search <ref type="foot" target="#foot_1">2</ref> to choose executable SQL queries to use as the final answer <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b13">14]</ref>. Another technique is to use an output calibration step that includes, among other steps, a fuzzy search to find the closest matching columns to potentially resolve invalid ones <ref type="bibr" target="#b24">[25]</ref>. 
A new avenue of research, however, is the use of Uncertainty Quantification (UQ) to assess the confidence of an LLM's generated output, as UQ methods can assign confidence scores to different parts of it. <ref type="bibr" target="#b25">[26]</ref> shows empirically that UQ techniques allow relatively inexpensive fact-checking. This could have a twofold application: highlighting possible hallucinated terms in the converted query for the user to correct, or providing additional information for a self-correction procedure of the model itself.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Dataset Representativity</head><p>In the realm of NLP and database query creation, various datasets and benchmarks have been developed in order to fill the gap between human language and structured database queries.</p><p>Among these, there are the ATIS <ref type="bibr" target="#b26">[27]</ref> and GEO <ref type="bibr" target="#b27">[28]</ref> datasets, which contain fewer than 500 unique SQL queries. On the other hand, WikiSQL <ref type="bibr" target="#b5">[6]</ref> includes a larger number of queries and significantly larger tables, but it only covers basic queries. Spider <ref type="bibr" target="#b10">[11]</ref> aims to address the limitations of WikiSQL by incorporating more complex, multi-table queries and a broader diversity of SQL queries, thus improving the ability of models to understand and generate intricate SQL commands from natural language inputs. Following Spider, BIRD <ref type="bibr" target="#b8">[9]</ref> aims to further advance this domain by collecting data from real-world scenarios, while retaining all the complexity and variability of such data in the dataset.</p><p>However, BIRD is not without its limitations. Firstly, it exhibits bias in the generation of NL questions, primarily due to the presumed background knowledge of the user regarding the database's structure and terminology. This assumption can lead to a gap between ad-hoc and real-world query formulations, as a typical user may not recall specific details about the database or might use incorrect terms.</p><p>Including non-experts in the creation of NL questions or limiting their schema knowledge are two potential ways to mitigate these biases. 
These approaches may yield generated questions that better represent those of a larger user base.</p><p>Secondly, in BIRD, tables or fields that are inaccessible due to user privileges, or simply absent, are never queried, raising concerns about its practicality in real-world scenarios. One way to better capture real-world facets is by intentionally including non-implementable queries. This intentional introduction of real-world imperfections would enable more robust testing. To this end, we suggest introducing one promising strategy proposed in <ref type="bibr" target="#b28">[29]</ref>. Applied in the Text-to-SQL field, this would entail fine-tuning a model on a dataset where such queries are intentionally tagged with an "I don't know" response. This approach encourages models to recognize the limits of their ability and avoid the tendency to "hallucinate" solutions that violate database constraints or permissions. The key insight is that a model capable of acknowledging its limitations is likely to be far more valuable in a practical setting than one that produces incorrect or misleading results.</p><p>Furthermore, existing Text-to-SQL datasets and benchmarks often underutilize the vast knowledge and contextual understanding capabilities of LLMs. While they excel at incorporating domain knowledge, datasets currently lack queries designed to test these abilities. Consider that a non-expert user may naturally create questions incorporating cultural references (e.g., "list movies released in the year of the dragon") or requiring the translation of colloquial terms into precise expressions (e.g., "show me sales figures for the summer months"). This gap represents a significant missed opportunity.</p></div>
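The "I don't know" tagging strategy could be sketched as a labeling pass over a dataset. The sketch below uses a deliberately naive SQL parser (every token following FROM or JOIN is treated as a table name) and invented questions and schema; it illustrates the augmentation idea, not the method of the cited work:

```python
def tag_unanswerable(examples, schema_tables):
    """Relabel examples whose gold SQL references tables absent from the accessible schema.

    Naive parser: any token following FROM or JOIN is treated as a table name.
    Queries touching inaccessible tables get an "I don't know" target instead of SQL.
    """
    tagged = []
    for question, sql in examples:
        toks = sql.replace(";", " ").split()
        tables = {toks[i + 1].lower() for i, t in enumerate(toks[:-1])
                  if t.upper() in ("FROM", "JOIN")}
        target = sql if tables <= set(schema_tables) else "I don't know"
        tagged.append((question, target))
    return tagged

schema_tables = {"movies"}
data = [("Count movies", "SELECT COUNT(*) FROM movies;"),
        ("List salaries", "SELECT salary FROM employees;")]
tagged = tag_unanswerable(data, schema_tables)
```

Fine-tuning on such relabeled pairs is what teaches the model to decline rather than fabricate a query against tables it cannot see.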
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Knowledge Acquisition Methods</head><p>For accurate Text-to-SQL conversion in professional settings, models must incorporate field-specific linguistic, domain, and mathematical knowledge <ref type="bibr" target="#b29">[30]</ref>. The first enables the model to deal with terminology that may differ between the question and the underlying schema, the second allows the conversion of domain-specific concepts, and the last provides the implicit mathematical or SQL operations needed to solve complex requests.</p><p>Current solutions either utilize fine-tuning (FT) or In-context Learning (ICL). Fine-tuning is the more traditional approach for adapting LLMs to specific tasks. It involves updating a pre-trained model's weights through gradient descent using a related labeled dataset. ICL, on the other hand, guides model behavior without weight updates by providing input-output pairs within the prompt itself, demonstrating the desired response for the task. Both methodologies have intrinsic cost considerations.</p><p>FT, in spite of the improved data efficiency afforded by pre-trained weights, still needs a non-negligible amount of high-quality labeled data to work correctly, and results in a model specialized for the specific task at hand, hindering its use for multiple concurrent downstream tasks. Furthermore, the high computational expenditure of tuning an LLM cannot be ignored. This methodology, however, provides a clear view of the costs since they are limited to the additional training phase.</p><p>ICL, instead, has the drawback of processing the additional examples at each execution, increasing memory usage and time to first token, and results in a model whose performance lags behind the fine-tuning procedure <ref type="bibr" target="#b1">[2]</ref> and is highly sensitive to wording <ref type="bibr" target="#b30">[31]</ref> and pair ordering <ref type="bibr" target="#b31">[32]</ref>. 
The retrieval of relevant examples from a database must also be accounted for in the overall resource consumption. This combination of elements makes the long-term costs and effectiveness of in-context learning more opaque.</p><p>Recently, a new idea has been proposed as a third option. <ref type="bibr" target="#b24">[25]</ref> introduced the use of Parameter-Efficient Fine-Tuning (PEFT), specifically Low-Rank Adaptation (LoRA) <ref type="bibr" target="#b32">[33]</ref>, to create a model-agnostic framework to efficiently adapt pre-trained models to the task at hand by changing only a small number of parameters. Additionally, to overcome the limits of fine-tuning for multiple domains, a "Plugin hub" has been introduced to enable both hot-swapping specialized weights to tackle different databases and creating new plugins (i.e., weights) by merging field-related ones <ref type="bibr" target="#b24">[25]</ref>.</p><p>Regardless of the chosen methodology, a critical challenge lies in efficiently acquiring and providing the necessary field-specific knowledge to the model.</p><p>In <ref type="bibr" target="#b33">[34,</ref><ref type="bibr" target="#b34">35]</ref>, different examples are annotated and used as a fine-tuning source. This, however, has high generation costs, since expert human annotators are needed to instill a diverse and accurate understanding in the data and, consequently, the model.</p><p>[36] tries to solve this by utilizing publicly available resources to retrieve relevant field information. This "bank" of knowledge is then used to guide the model towards the correct schema linking and conversion. The proposed methodology does mitigate the incurred initial cost, but the bank creation, without constant updates or improvements, can miss useful data or lag behind fast-evolving fields. 
Another issue is that, without careful filtering during the setup of the knowledge archive, the extraction process may introduce noisy or conflicting information, negatively impacting subsequent retrieval operations.</p><p>One possible solution to obtain the best of both worlds would be to use the recent advancements in LLM tool usage to enable the creation of the bank of knowledge at run-time. In particular, we envision a pipeline where the model, given the natural language prompt, is able to actively scour the internet to extract the knowledge needed for a correct conversion. This could be applied both to augment existing datasets and at inference time to help translate the user's intention into a query. This pipeline can also be easily merged with the proposed solution in <ref type="bibr" target="#b24">[25]</ref> to create an ever-evolving "plugin hub" that is able to adapt to new terminologies, concepts, or requirements.</p></div>
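The core mechanism of LoRA, adding a trainable low-rank update BA to a frozen weight matrix W so that only a handful of parameters change, can be illustrated with plain Python. This is a conceptual sketch of the idea, not an implementation of the cited framework, and the matrices are toy values:

```python
def lora_forward(x, W, A, B):
    """Compute x @ (W + B @ A)^T: frozen base weights W plus low-rank update B @ A."""
    d_out, d_in = len(W), len(W[0])
    r = len(A)  # rank of the adaptation, chosen so that r << min(d_in, d_out)
    # Effective weight W' = W + B @ A; only A and B would be trained.
    delta = [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
             for i in range(d_out)]
    Wp = [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]
    return [sum(Wp[i][j] * x[j] for j in range(d_in)) for i in range(d_out)]

# Rank-1 update on a 2x2 identity layer. For a d x d layer, LoRA stores
# only 2*r*d adapter parameters instead of d*d full fine-tuned weights.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x d_in
B = [[0.5], [0.0]]      # d_out x r
y = lora_forward([1.0, 2.0], W, A, B)
```

Because the adapter (A, B) is a small separate object, swapping adapters per database, the "plugin" idea described above, amounts to loading different (A, B) pairs onto the same frozen base model.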
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion and Future Perspective</head><p>In this paper, we have shown that LLMs have the potential to bridge the gap between natural language and SQL queries. However, this promise demands additional research to be truly realised. While this new technology demonstrates an impressive ability to interpret and translate natural language into structured queries, it comes with several significant challenges that must be acknowledged. These include the need to effectively mitigate hallucinations, ensure scalability for complex databases, reduce response times to practical levels, and develop robust methods for integrating domain-specific knowledge. Constructing representative training datasets is also paramount, ensuring the models can adapt to diverse linguistic expressions, handle unanswerable queries, and reflect the nuances of real-world user interactions. By systematically overcoming these hurdles, we can pave the way for truly intuitive and accessible database interaction tools, fostering widespread data democratization and significantly enhancing decision-making processes across various domains.</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Inference refers to the process of getting a response from the trained LLM model for the user's query or prompts.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">A decoding strategy that, instead of selecting only the single most likely word at each step, keeps track of multiple likely sequences</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Acknowledgments</head><p>This work was supported by the PNRR project Italian Strengthening of Esfri RI Resilience (ITSERR) funded by the European Union -NextGenerationEU (CUP:B53C22001770006).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<ptr target="https://arxiv.org/abs/1810.04805" />
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.14165</idno>
		<ptr target="https://arxiv.org/abs/2005.14165" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v21/20-074.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Algorithms for nonnegative matrix factorization with the β-divergence</title>
		<author>
			<persName><forename type="first">C</forename><surname>Févotte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Idier</surname></persName>
		</author>
		<idno type="DOI">10.1162/NECO_a_00168</idno>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A hidden markov model approach to keyword-based search over relational databases</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bergamaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guerra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Velegrakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conceptual Modeling-ER 2011: 30th International Conference, ER 2011</title>
				<meeting><address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011-11-03">October 31 - November 3, 2011</date>
			<biblScope unit="page" from="411" to="420" />
		</imprint>
	</monogr>
	<note>Proceedings 30</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1709.00103</idno>
		<title level="m">Seq2SQL: Generating structured queries from natural language using reinforcement learning</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.03762</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">How much knowledge can you pack into the parameters of a language model?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.437</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.437" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5418" to="5426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Huo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fosler-Lussier</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.11853</idno>
		<title level="m">How to prompt llms for text-to-sql: A study in zero-shot, single-domain, and cross-domain settings</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yasunaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.08887</idno>
		<title level="m">Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.16347</idno>
		<title level="m">CodeS: Towards building open-source language models for text-to-SQL</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Pourreza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rafiei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.11015</idno>
		<title level="m">Din-sql: Decomposed in-context learning of text-to-sql with self-correction</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.05965</idno>
		<title level="m">Resdsql: Decoupling schema linking and skeleton parsing for text-to-sql</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<ptr target="https://www.marketresearchfuture.com/reports/relational-database-market-18851" />
		<title level="m">Relational database market research report: Information by type (in-memory, disk-based, and others), by deployment (cloud-based, and on-premises) by end user (bfsi, it &amp; telecom, retail &amp; e-commerce, manufacturing, healthcare, and others), and by region (north america, europe, asia-pacific, and rest of the world) -market forecast till 2032</title>
				<imprint>
			<date type="published" when="2024-03-02">2024-03-02</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2201.11903</idno>
		<title level="m">Chain-of-thought prompting elicits reasoning in large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Schärli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Scales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Bousquet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.10625</idno>
		<title level="m">Least-to-most prompting enables complex reasoning in large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.11171</idno>
		<title level="m">Self-consistency improves chain of thought reasoning in language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://wow.groq.com/inference-speed-is-the-key-to-unleashing-ai-potential/" />
		<title level="m">Inference speed is the key to unleashing AI&apos;s potential</title>
				<imprint>
			<date type="published" when="2024-03-17">17 Mar 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Din-sql: Decomposed in-context learning of text-to-sql with self-correction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pourreza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rafiei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://www.anthropic.com/news/claude-3-family" />
		<title level="m">Introducing the next generation of claude</title>
				<imprint>
			<date type="published" when="2024-03-13">13 Mar 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Savinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teplyashin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lepikhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lillicrap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Alayrac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lazaridou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schrittwieser</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.05530</idno>
		<title level="m">Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Lost in the middle: How language models use long contexts</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Paranjape</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bevilacqua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="157" to="173" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Exploring unexplored generalization challenges for cross-database semantic parsing</title>
		<author>
			<persName><forename type="first">A</forename><surname>Suhr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8372" to="8388" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno>arXiv-2401</idno>
		<title level="m">Finsql: Model-agnostic llms-based text-to-sql framework for financial analysis</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Fadeeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rubashevskii</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shelmanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Petrakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mubarak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tsymbalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kuzmin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Panov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.04696</idno>
		<title level="m">Fact-checking the output of large language models via token-level uncertainty quantification</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The atis spoken language systems pilot corpus</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Hemphill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Godfrey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">R</forename><surname>Doddington</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Speech and Natural Language: Proceedings of a Workshop Held at Hidden</title>
				<meeting><address><addrLine>Valley, Pennsylvania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1990">June 24-27, 1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Finegan-Dollak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Kummerfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramanathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sadasivam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Radev</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.09029</idno>
		<title level="m">Improving text-to-sql evaluation methodology</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tomlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</author>
		<idno>arXiv-2403</idno>
		<title level="m">Unfamiliar finetuning examples control how language models hallucinate</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Dou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Che</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Lou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.01067</idno>
		<title level="m">Towards knowledge-intensive text-to-sql semantic parsing with formulaic knowledge</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Do prompt-based models really understand the meaning of their prompts?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Webson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pavlick</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2109.01247</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2102.09690</idno>
		<title level="m">Calibrate before use: Improving few-shot performance of language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<title level="m">LoRA: Low-rank adaptation of large language models</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Building a semantic parser overnight</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Berant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/P15-1129</idno>
		<ptr target="https://aclanthology.org/P15-1129" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">C</forename><surname>Zong</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Strube</surname></persName>
		</editor>
		<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1332" to="1342" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Don&apos;t paraphrase, detect! rapid and effective data collection for semantic parsing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Berant</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1394</idno>
		<ptr target="https://aclanthology.org/D19-1394" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
		<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3810" to="3820" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
