<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Talk to your database: An open-source in-context learning approach to interact with relational databases through LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maximilian Plazotta</string-name>
          <email>Maximilian.Plazotta@informatik.uni-regensburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meike Klettke</string-name>
          <email>meike.klettke@informatik.uni-regensburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Text-to-SQL, Large Language Models, Relational Databases</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>(55th Annual Conference of the German Informatics Society)</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Regensburg</institution>
          ,
          <addr-line>Bajuwarenstraße 4, 93053 Regensburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>With the emergence of large language models (LLMs), the long-studied Text-to-SQL problem has been elevated into new spheres. In this paper, we test how our in-context learning approach performs on two relational databases (small vs. big) and compare it to a default prompting setting. The results are convincing: using in-context learning boosts the performance from merely 35% (default) to over 85%. Furthermore, we present a detailed architectural framework for such a system, emphasizing its exclusive reliance on open-source components.</p>
      </abstract>
      <kwd-group>
        <kwd>Text-to-SQL</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Relational Databases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The underlying idea for this paper is the following (real-world) case: Imagine a business analyst
who needs certain information to answer business questions such as "Which customers should we
contact in our next marketing campaign?" or "Which customers qualify for a discount?" based on data.
Furthermore, the business analyst only has very rudimentary knowledge of SQL programming. So, is
there a way to create a system that can help this person gain insights from the data and answer their
business questions? Using a large language model (LLM) and a PostgreSQL database, we create a
system that builds a bridge between the LLM and the database — with the usage of in-context learning.
Moreover, we test the system on different database sizes and compare in-context learning against
default LLM prompting, testing the following hypotheses:
• H1: In-context learning should perform better than the default.
• H2: In-context learning should possess a higher execution time due to having more complex
inputs.</p>
      <p>• H3: The more complex a database is, the lower the accuracy should be.</p>
      <sec id="sec-1-1">
        <title>4, describes the approach of the experiment, the technical set-up of the system, and gives an overview of the results — a discussion of the results is also included. Section 5 tackles the limitations and gives some ideas how to eventually circumvent them. The last Section 6 concludes the paper and highlights future areas of</title>
        <p>CEUR
Workshop
ISSN1613-0073</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Terminology</title>
      <p>Within this section, we provide an overview of the central terminology used in this paper: LLMs,
methods for improving their performance (such as fine-tuning and in-context learning), and the Text-to-SQL problem.</p>
      <sec id="sec-2-1">
        <title>2.1. Large Language Models</title>
        <p>With the introduction of the first commercially usable large language models in 2022, the whole
world of artificial intelligence changed drastically — today the buzzword "AI" (short for artificial
intelligence) is everywhere. With the release of GPT-3.5 by OpenAI in late 2022, the traffic and usage of
their chatbot with the remarkable name ChatGPT exploded overnight. Since then, many new players
have entered the market with their own LLMs, some of which need to be paid for while others are
open-source.</p>
        <p>
          Table 1 provides a broad overview of the current LLM market with respective models and their
capability scoring, called ELO or arena score. This number is derived from a multitude of different tasks an
LLM is evaluated on, e.g., coding, math, creative writing, and reasoning tasks [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The most capable
models according to the ELO score are currently Google’s Gemini model(s) and OpenAI’s GPT model(s).
Other competitor models like xAI’s Grok (backed by Elon Musk), Anthropic’s Claude (backed by Amazon), and
Qwen (Alibaba) also score very highly. As mentioned, for these models one needs to pay for an API
key to get access — for small endeavors the price is manageable, but for enterprise usage the costs
accumulate very quickly as you pay for each prompt. Thankfully, there exist open-source models that
have good performance scores and can be used locally, given that you possess enough RAM. Meta AI’s
Llama models are among the most popular open-source models and come in different parameter sizes: 405B,
70B, or 8B — the "B" stands for billion and specifies the model’s number of parameters. Normally,
the more parameters a model has, the better its performance. However, this comes with a downside: the
more parameters a model possesses, the more computing power a system needs to deliver. So, for an 8B
model, a good graphics processing unit (GPU), such as the Nvidia 30 series or higher, is sufficient, but,
for instance, a 200B+ model needs an Nvidia A100 or 4x 3090 GPUs (or more) to function. Nevertheless,
small models (&lt;14B) score very well compared to their bigger siblings: Llama’s
8B model has an ELO of 1213, compared to the 70B (1315) or the 405B (1333) model. So, why is this
important? To build such a system, one needs to define where to deploy the LLM. Small models can be
run on your local PC; bigger models need a data center or cloud environment, or you simply
access them with an API key from a proprietary vendor like OpenAI, Google, or Anthropic. Most recently,
DeepSeek-R1 disrupted the AI market with incredible performance (ELO: 1413) while being open-source,
but on the other hand being relatively big (671B). More interestingly, the new smaller, open-source
model from Google (Gemma) scores very high (ELO: 1362) despite only having 27 billion parameters.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Improving LLM performance</title>
        <p>
          To improve the accuracy and general performance of LLMs, many methods have been established
in recent years. The most prevalent is retrieval-augmented generation (RAG) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Within this
approach, a vector database is used to store external knowledge from sources such as texts or structured
data from databases, enterprise systems, and many more. This information is then used by the LLM
for augmentation. Another method is LLM fine-tuning [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], where a pre-trained LLM is retrained on
a smaller, domain-specific dataset. Lastly, in-context learning, known as the most straightforward
approach, only uses the input given through the context window (prompt) to generate better outputs —
the method we test in this paper.
        </p>
        <p>These approaches all have one thing in common: enriching LLM systems with external knowledge
sources to give the model more context ("to make it see") — see Figure 1.</p>
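        <p>To make the distinction concrete, the following strings-only Python sketch (our illustration; the schema line and prompt wording are made up, not taken from the experiment) contrasts a default prompt with an in-context learning prompt that carries external knowledge directly inside the context window:</p>
        <preformat>
# Minimal sketch (assumption): the same question, once without external
# knowledge (default) and once with it placed into the prompt (in-context).
question = "Which customers qualify for a discount?"

default_prompt = f"Answer with a SQL query: {question}"

schema_hint = "customers(id, first_name, last_name, status, country)"  # made up
in_context_prompt = (
    f"Schema: {schema_hint}\n"
    f"Answer with a SQL query: {question}"
)

print(default_prompt)
print(in_context_prompt)
        </preformat>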
      </sec>
      <sec id="sec-2-3">
        <title>2.3. The Text-to-SQL problem</title>
        <p>
          First attempts to solve the Text-to-SQL (or NL2SQL) problem were introduced in 2015, and the field has been developed
ever since [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In 2015, the first rule-based, statistical methods arose to translate natural language
into SQL code, but they lacked performance for complex database systems. In 2019, deep learning
approaches emerged [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], where the long short-term memory (LSTM) architecture is the focal point
of the transformation. This approach improved the accuracy substantially. Nevertheless, the real game
changer was just ahead: In 2021, pre-trained language models (PLMs) showed very promising results,
but they lacked individual task-oriented fine-tuning. Consequently, LLMs were derived from these
around 2022 and are able to transform natural language into executable SQL queries, with convincing
results. LLMs are trained on various different datasets such as Wikipedia, Project Gutenberg, or
Reddit, but more importantly also on GitHub and Kaggle, which is where their high capability of solving
coding problems comes from.
        </p>
        <p>In the next section, we dive deeper into important publications that highlight the three terms LLM,
in-context learning, and Text-to-SQL from different angles.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        LLMs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are at the center of most Text-to-SQL tasks. Hence, a lot of improvements and new
implementation methods emerge continuously. On the improvement side, the main players like Google, OpenAI,
or Anthropic frequently release newly trained and improved models. The same goes for open-source
models.
      </p>
      <p>
        As mentioned, improving the accuracy and therefore also the reliability of LLMs is currently one of the
biggest topics in AI systems research. In-context learning was one of the first approaches after
LLMs became publicly available in 2022. Pourreza and Rafiei [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] show an accuracy improvement of roughly 5 percentage points (from 79.9% to 85.3%)
for their DIN-SQL (in-context learning) approach over sophisticated fine-tuned models.
      </p>
      <p>
        Retrieval-augmented generation systems can also improve LLM performance. Vichev and Marchev
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] build their own custom evaluation model RAGSQL and test it against the renowned BIRD benchmark.
It performs very well, with accuracy values above 90%. Another very important notion from the paper
is "We demonstrate that much smaller models with efficient fine-tuning can lead to higher performance
on a task.", which implies that not only big models deliver exceptional performance; smaller
models can play a huge part as well.
      </p>
      <p>
        Currently, there is significant hype and enthusiasm surrounding AI agents, which take a more directly
problem-oriented approach. The Cooperative SQL Generation framework based on Multi-functional Agents
(CSMA) from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and MAC-SQL from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] have to be highlighted in this context.
      </p>
      <p>
        Other notable related work comes from Liu et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] on the optimization of LLM queries
in relational workloads. The authors state correctly that LLM inference is (currently) very expensive
and introduce various techniques to improve the LLM inference process. The already mentioned BIRD
(BIg bench for laRge-scale Database grounded in text-to-SQL tasks) benchmark [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] introduces a 33.4
GB database corpus against which newly created systems can be benchmarked.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>To validate our initial hypotheses from Section 1, we present an experimental set-up that will be
explained in more depth within the following subsections.</p>
      <sec id="sec-4-1">
        <title>4.1. Approach</title>
        <p>As described in Sections 2.3 and 3, Text-to-SQL is a wide field and also the focal point of this
experiment. Figure 2 gives a high-level overview of the general approach (referred to as in-context
learning): We combine the user’s question (input), which is of course in natural language, with the
current database schema, which we retrieve automatically. We then use both inputs (db_metadata and
user_input) to create a prompt for the LLM. Based on this information, the LLM creates a SQL query,
runs it automatically on the database, and the output is generated. To compare the performance of the in-context
learning approach, we hold it against our so-called default LLM prompting technique ("default"). For
this default technique, we only give the LLM the information that it is a customer database with the
respective tables and, of course, the question. For the experiment, we use two sizes of databases, db_small
(Figure 3) and db_big (Figure 4), with dummy data based on a classical, fictional customer database.</p>
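        <p>The following minimal Python sketch illustrates this prompt-construction step. It is our illustrative assumption of one possible implementation, not the exact code used in the experiment; the connection string, schema query, and prompt wording are placeholders:</p>
        <preformat>
# Minimal sketch (assumption): build an in-context learning prompt from the
# live PostgreSQL schema (db_metadata) and the user's question (user_input).
import psycopg2

SCHEMA_QUERY = """
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position;
"""

def fetch_db_metadata(conn) -> str:
    """Serialize the current schema as one 'table.column: type' line each."""
    with conn.cursor() as cur:
        cur.execute(SCHEMA_QUERY)
        return "\n".join(f"{t}.{c}: {d}" for t, c, d in cur.fetchall())

def build_prompt(db_metadata: str, user_input: str) -> str:
    """In-context learning: the schema travels inside the prompt itself."""
    return (
        "You are a PostgreSQL expert. Given the schema below, answer the "
        "question with a single executable SQL query.\n\n"
        f"Schema:\n{db_metadata}\n\n"
        f"Question: {user_input}\nSQL:"
    )

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=db_small user=postgres")  # placeholder DSN
    print(build_prompt(fetch_db_metadata(conn),
                       "Which customers qualify for a discount?"))
        </preformat>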
        <p>The conceptual data models are depicted in the entity relationship diagrams in the appendix.
Furthermore, we created 50 real-world questions (Table 3) for the experiment, to test accuracy and execution
time and to compare results based on database size and the usage of in-context learning vs. default. For the
questions, we were vigilant about including simple as well as more complex questions to test how the
system reacts, ranging from simple SELECT statements, over more difficult joins, to complex nested queries. The
default tests are done with a simple prompt without the database schema input.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Technical Setup</title>
        <p>The technical setup basically consists of two systems interacting with each other through an API: the
relational database management system (RDBMS) and the intelligent AI system.</p>
        <p>• RDBMS
• AI system</p>
        <p>We selected PostgreSQL as our relational database system to run this experiment as it is one of
the most used database systems with a big and active community. Furthermore, it is open source
and adheres to ACID properties.</p>
        <p>The AI system is an LLM from Meta AI called ’meta-llama/Llama-3.1-8B-instruct’. This experiment
is conducted on a local machine with a total of 16GB of RAM (8GB GPU + 8GB DDR5 RAM). To
estimate the size a certain model must not exceed to be deployable on a given machine, the following
formula helps [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ,
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]:</p>
        <p>RAM_required = P ∗ Q / 8 ∗ (1 + B)</p>
        <p>Here, P is the number of model parameters in billions, Q is the quantization of the model in bits
(fp32, fp16, int8, int4), 8 is the number of bits per byte, and B is a buffer capacity added on top (20% is a
common estimate). For the largest model that fits on our machine: 26 ∗ 4 / 8 ∗ (1 + 0.2) = 15.6 GB.</p>
        <p>So, the upper theoretical boundary to run the system locally is a 26B model (assuming 4-bit
quantization). Ceteris paribus, the required RAM on the local machine is 15.6 GB (&lt;16GB). It is important to
mention that models with &gt;8GB of RAM usage will be slower, as the system accesses the DDR5 RAM after
the GPU’s RAM is maxed out — for trial purposes, a 24B model ran extremely slowly on this machine.
However, taking the strong ELO score of 1213 (see Section 2.1) into account, the Llama 3.1 8B model is a
reasonable selection for this experiment, as it only needs 4.8 GB of RAM.</p>
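        <p>The following minimal Python helper (our own illustration, not part of the experiment code) applies this formula to the two configurations discussed here:</p>
        <preformat>
# Minimal sketch (assumption): the RAM estimate RAM = P * Q/8 * (1 + B)
# from [13, 14] as a small helper, with the two worked examples from the text.
def required_ram_gb(params_b: float, quant_bits: int, buffer: float = 0.2) -> float:
    """Estimated RAM in GB for a model with params_b billion parameters,
    quantized to quant_bits bits, plus a relative buffer (default 20%)."""
    return params_b * quant_bits / 8 * (1 + buffer)

print(required_ram_gb(26, 4))  # 15.6 GB -> upper bound for this 16GB machine
print(required_ram_gb(8, 4))   # 4.8 GB  -> the Llama 3.1 8B model used here
        </preformat>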
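        <p>Once the model fits into memory, it can be served locally. The following sketch shows one possible setup using the Hugging Face transformers library; this is our assumption of a plausible configuration, not the exact code of the experiment:</p>
        <preformat>
# Minimal sketch (assumption): generate SQL locally with the selected model;
# the prompt would be constructed as in Section 4.1.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",  # spill to CPU RAM once the 8GB GPU is full
)

prompt = "Schema: customers(id, status)\nQuestion: How many active customers?\nSQL:"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
        </preformat>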
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experiment Results</title>
        <p>Overall, the results from our experiments (see Table 2) support our initial hypotheses:</p>
        <p>The first hypothesis H1, "In-context learning should perform better than the default.", holds for both
db_small and db_big, with accuracy values for the default vs. in-context learning of 33.33% vs. 90.91%
(small) and 36.00% vs. 86.00% (big), respectively. Taking the average execution time into account, the
second hypothesis H2, "In-context learning should possess a higher execution time due to having more
complex inputs.", is also fulfilled: in-context learning possesses an average execution time of 3.1987
seconds (small) and 3.2652 seconds (big) versus 2.9052 seconds (small) and 2.7780 seconds (big). This
comes as no surprise, as the LLM must process the metadata from the database, which logically takes
longer than not applying this step.</p>
        <p>We were also able to observe some differences between database sizes.
Bigger databases mean longer execution times and lower accuracy, which is partially in line with H3,
"The more complex a database is, the lower the accuracy should be." For the default tests this hypothesis
does not hold, as the results are reversed: the accuracy is higher for db_big. This accuracy outlier can
be explained by the randomized nature of the queries, meaning that, e.g., the right column name
(total_amount vs. amount) is returned randomly by the LLM, as it does not know the column names.
For the in-context learning tests the hypothesis holds, with 86.00% (big) being lower than 90.91%
(small).</p>
        <p>Another noticeable finding is also a relatively simple one: spelling. The LLM does not know
from the provided metadata how certain things are spelled. For example, question 27, "Retrieve a list of
customers who have opted out of the newsletter." — at first glance a relatively simple question (query)
— failed only because the condition in the query for status = ’active’ was misspelled as ’Active’. The
same observation also occurred for the definition of the payment_method (credit card vs. Credit Card).
In these two specific cases the ENUM data type (instead of TEXT or VARCHAR) would have solved
the problem, but if the possible values are not limited, the correctness of the results of these types of
queries is random, based on how the LLM decides to spell the word. This simple error represents the
majority of the errors in our experiment — the LLM was not able to handle these types of questions.
Another observation comes from question 25, "How many percent of customers come from Europe?":
This one always returned an error, independent of database size and method, due to the fact that there is
no information in the dataset on which country belongs to Europe. An easy solution would have been to
add a column "continent" to the customers table.</p>
        <p>To summarize the experiment results, it is important
to mention that the in-context learning prompting technique worked surprisingly well, with accuracy
values around 90%, even with some shortcomings when it comes to spelling or metric definitions. The
mentioned solutions to these shortcomings would have increased the accuracy even further.</p>
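        <p>As an illustration of the ENUM fix discussed above, the following sketch (an assumption on our part; the type and column names are illustrative, not taken from the experiment schema) constrains the value set so that a differently spelled literal fails loudly instead of silently matching nothing:</p>
        <preformat>
# Minimal sketch (assumption): replace a free-text status column with an
# ENUM so PostgreSQL rejects misspelled literals such as 'Active' outright,
# and so the allowed values appear in the schema metadata fed to the LLM.
import psycopg2

DDL = """
CREATE TYPE subscription_status AS ENUM ('active', 'inactive');
ALTER TABLE customers
    ALTER COLUMN status TYPE subscription_status
    USING status::subscription_status;
"""

with psycopg2.connect("dbname=db_small user=postgres") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
# After this change, WHERE status = 'Active' raises an invalid-enum-value
# error instead of silently returning zero rows.
        </preformat>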
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Limitations</title>
      <p>The technical implementation of this system comes along with a few limitations that are listed below:</p>
      <sec id="sec-5-1">
        <title>5.1. Computing resources</title>
        <p>As mentioned earlier in Subsection 4.2, we run this experiment on a local machine with a total RAM of
16 GB (8GB GPU + 8GB DDR5), which limits us to the usage of 26B models (assuming 4-bit quantization).
Alternatively, such experiments can also be run in the cloud, where much more performant GPUs or
even GPU clusters are available, but of course at higher cost — currently a memory-optimized virtual
machine with ca. 500 GB of RAM (to run the most sophisticated open-source LLMs) costs roughly $10
per hour at the major cloud providers.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Model selection</title>
        <p>The computing resources directly impact the models that can be selected. As a rule of thumb, the more
parameters a model has, the better it performs (see Table 1). But in our case a bigger model might not have changed
the outcome much, as the errors arose from the structure of the data and not from the generated SQL code.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Sample size questionnaire</title>
        <p>For the experiment, we used 50 questions to test the system. The questions attempt to be as close to
real-world use as possible. One could argue that more questions should have been used, but further questions
would have been derivations of the already existing ones, resulting in similar SQL statements with only
some very small changes.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Outlook</title>
      <p>To come back to our initial, real-world notion about the business analyst from the introduction: "So, is
there a way to create a system that can help this person gain insights from the data and answer their
business questions?" — Short answer: yes, but with some limitations. We showed that in-context learning
in particular performs much better than default prompting. In addition, the more complex a
database is, the lower the accuracy will be. Apart from this, we also compared execution times of the
queries and found that in-context learning possesses higher execution times due to having to process more
complex inputs. Most mistakes made by our system result from the fact that it simply cannot query
what it does not know, e.g., spelling differences in the data values ("Active" vs. "active") or definitions
("Europe"). Future research should focus on closing these gaps. Also, the application of such systems to
bigger and more complex data ecosystems like data warehouses or even data lakes might be interesting
to address in the future. In addition, it might be interesting to take a deeper look into the security
aspects: as this system automatically runs queries on a database, there is room for errors, e.g., unwanted
deletion or unauthorized changes of data.</p>
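      <p>One simple guard against such destructive generated statements, sketched below under our own assumptions (the connection string is a placeholder and this is not part of the evaluated system), is to execute every LLM-generated query in a read-only session:</p>
      <preformat>
# Minimal sketch (assumption): run LLM-generated SQL in a read-only session
# so that PostgreSQL itself rejects DELETE/UPDATE/DDL statements.
import psycopg2

conn = psycopg2.connect("dbname=db_small user=postgres")  # placeholder DSN
conn.set_session(readonly=True)

def run_generated_sql(sql: str):
    """Execute one generated query; any write attempt raises
    psycopg2.errors.ReadOnlySqlTransaction."""
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
      </preformat>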
      <p>Gonzalez, M. Zaharia, Optimizing LLM queries in relational workloads., CoRR abs/2403.05821
(2024).
[12] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Geng, N. Huo, X. Zhou, M. Chenhao,
G. Li, K. Chang, F. Huang, R. Cheng, Y. Li, Can LLM already serve as a database interface? a BIg
bench for large-scale database grounded text-to-sqls, Advances in Neural Information Processing
Systems 36 (2023) 42330–42357.
[13] Q. Anthony, S. Biderman, H. Schoelkopf, Transformer math 101,
https://blog.eleuther.ai/transformer-math/, 2023.
[14] C. Chen, Transformer inference arithmetic, https://kipp.ly/blog/transformer-inference-arithmetic,
2022.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors hereby declare that no GenAI was used to generate the text of this paper, following the guidelines and
policy of the CEUR-WS Policy on AI-Assisting Tools (https://ceur-ws.org/GenAI/Policy.html).</p>
    </sec>
    <sec id="sec-8">
      <title>A. Entity relationship diagrams</title>
    </sec>
    <sec id="sec-9">
      <title>B. Sample questions</title>
      <p>An excerpt of the 50 natural-language test questions (cf. Table 3):</p>
      <p>• Find the average time taken to resolve support tickets.
• Retrieve a breakdown of revenue by country.
• Find the employee who has resolved the most support tickets.
• Find the average delivery time for all shipped orders.
• Find the percentage of support tickets that were resolved within 24 hours.
• How many support tickets are currently open?
• What is the average order amount placed by customers who subscribed to the newsletter?
• What is the average order value for orders placed on weekends vs. weekdays?
• Which countries have the highest proportion of customers subscribing to the newsletter?
• Which top 5 countries have the most customers subscribing to the newsletter?
• How many orders were placed in each week of the year?
• What is the average sales amount for each day of the week?
• How many support tickets are currently ’In Progress’ and were created in calendar week 1?
• How many products are supplied by suppliers with ’gmail.com’ in their contact email?
• How many orders used a payment type that is NOT ’CREDIT’?
• List the first and last names of employees with the word ’Manager’ in their position.
• Which orders from which customers took longer than 7 days to deliver?
• List the first and last names, and e-mail address of customers whose email addresses end with ’.net’.
• List the customers’ first and last name, and order dates for orders with a total amount between $100 and $200.
• Provide a table with the employee’s first and last name, and the customers they are associated with, sorted by employee.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Chatbot arena: An open platform for evaluating LLMs by human preference</article-title>
          ,
          <source>Forty-first International Conference on Machine Learning</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. S. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-M.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient fine-tuning of large-scale pre-trained language models</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>5</volume>
          (
          <year>2023</year>
          )
          <fpage>220</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohammadjafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Maida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gottumukkala</surname>
          </string-name>
          ,
          <article-title>From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems</article-title>
          ,
          <source>CoRR abs/2410.01066</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsogiannis-Meimarakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Koutrika</surname>
          </string-name>
          ,
          <article-title>A survey on deep learning approaches for text-to-SQL</article-title>
          ,
          <source>The VLDB Journal 32.4</source>
          (
          <year>2023</year>
          )
          <fpage>905</fpage>
          -
          <lpage>936</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <source>CoRR abs/2303.18223</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pourreza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <article-title>DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>36339</fpage>
          -
          <lpage>36348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchev</surname>
          </string-name>
          ,
          <article-title>RAGSQL: Context Retrieval Evaluation on Augmenting Text-to-SQL Prompts</article-title>
          ,
          <source>IEEE 12th International Conference on Intelligent Systems (IS)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents</article-title>
          ,
          <source>CoRR abs/2412.05850</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL</article-title>
          ,
          <source>CoRR abs/2312.11242</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Biswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamsetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Schroeder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>Optimizing LLM queries in relational workloads</article-title>
          ,
          <source>CoRR abs/2403.05821</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Geng, N. Huo, X. Zhou, C. Ma, G. Li, K. Chang, F. Huang, R. Cheng, Y. Li,
          <article-title>Can LLM already serve as a database interface? A BIg bench for large-scale database grounded text-to-SQLs</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>42330</fpage>
          -
          <lpage>42357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] Q. Anthony, S. Biderman, H. Schoelkopf,
          <article-title>Transformer Math 101</article-title>
          , https://blog.eleuther.ai/transformer-math/,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] C. Chen,
          <article-title>Transformer inference arithmetic</article-title>
          , https://kipp.ly/blog/transformer-inference-arithmetic,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>