<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Implementation of transformer model for natural language to SQL query translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kateryna Yalova</string-name>
          <email>yalovakateryna@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykhailo Babenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kseniia Yashyna</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Boyarchuk</string-name>
          <email>a.boyarchuk@taltech.ee</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dniprovsky State Technical University</institution>
          ,
          <addr-line>Dniprobudivska str., 2, Kamianske, 51918</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SIRIS Academic</institution>
          ,
          <addr-line>Francesc Cambo Av. 17, Barcelona, 08003</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tallinna Tehnikaülikool</institution>
          ,
          <addr-line>Ehitajate tee 5, Tallinn, 12616</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The application of Natural Language Interfaces to Databases (NLIDB) serves as an important means of reducing the technical and IT competence requirements for database users. The objective of this study is the development and experimental evaluation of an NLIDB system that leverages a transformer-based architecture to generate SQL queries from user inputs formulated in natural language. The paper presents a generalized scheme of user interaction with the database via the NLIDB system and describes the main stages of input query preprocessing, including tokenization, embedding, positional encoding, and the architecture of the transformer-based neural network. The proposed model incorporates multi-head attention, which enables effective modeling of input query contexts, as well as the ADAM optimizer. The Spider corpus was utilized for training and evaluating the model. To assess the performance of the resulting model, in addition to accuracy, the BLEU metric was employed to quantitatively evaluate the degree of correspondence between the generated and expected queries, taking lexical similarity into account. The best experimental accuracy reached 69% using the BERT-base tokenizer and 63% with the basic Keras tokenizer. The highest BLEU score for the model with the BERT-base tokenizer was 46%, and 43.3% for the model with the basic Keras tokenizer. A comparative analysis demonstrated the competitiveness of the proposed model, indicating the effectiveness of the adopted solutions. The model performed best on simple and short queries, while the most challenging cases involved queries with literals that required inter-table relationships and domain-specific knowledge.</p>
      </abstract>
      <kwd-group>
        <kwd>Transformer-based neural network</kwd>
        <kwd>natural language to database interface</kwd>
        <kwd>Spider dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The use of natural language as a means of interaction between humans and information or computer
systems is one of the key research directions in the fields of data processing and artificial intelligence.
One of the current challenges in Natural Language Processing (NLP) is the development of Natural
Language Interfaces to Databases (NLIDB), which enable users to formulate queries in natural
language without the need to know or use the Structured Query Language (SQL). A Natural
Language Interface to Database is a system that provides users with a mechanism for transforming
input text into SQL queries, thereby enabling natural language interaction with relational databases
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The integration of an NLIDB interface as an abstraction layer between users and relational
database systems offers a key advantage, simplified interaction through natural language, and can
be employed to lower the knowledge requirements for both developers and users of relational
databases. The main purpose of embedding the NLIDB module into the architecture of an
information system is to improve access to databases, facilitate interaction with information systems,
and enhance overall user productivity. Practical applications of NLIDB include its use in business
contexts to simplify data access and improve analytics, as well as its integration with other intelligent
technologies for automatic entity linking, sentiment analysis, and machine learning, with the goal of
developing comprehensive solutions for business analytics and business process refactoring. The
growing interest of the business sector in adopting this technology, particularly in combination with
deep learning, underpins the rapid advancement and increasing popularity of NLIDB development
tasks.
      </p>
      <p>
        The automatic generation of SQL queries from natural language text sequences is a complex task
that requires precise analysis, contextual processing, and the transformation of unstructured text
into formalized queries [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The accuracy of such transformations remains insufficient due to the
inherent complexity of natural language, including word polysemy, syntactic variability, and
semantic ambiguity.
      </p>
      <p>
        Significant progress in this field has been achieved by transitioning from rule-based and
template-based approaches to the application of neural networks and deep learning techniques. Currently,
various types of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are
typically employed to solve natural language processing (NLP) tasks, including the Text-to-SQL
problem. However, transformer-based neural networks are gradually replacing them due to their
flexibility and high performance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The objective of the present study is to design, adapt, and experimentally evaluate a
transformer-based architecture for the task of automatic SQL query generation from natural language text
sequences. The research is based on the hypothesis that the transformer neural network architecture,
equipped with a self-attention mechanism, can be effectively applied to the Text-to-SQL problem.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Prior to 2017, most natural language processing models relied on RNNs incorporating
encoder-decoder mechanisms to capture contextual relationships and dependencies between words in a
sentence. However, this approach demonstrated lower effectiveness compared to the statistical
methods dominant at the time, leading to the prevailing belief that neural networks were not
well-suited for machine translation tasks. Moreover, RNN-based models suffered from a limitation in
which the encoder tended to lose information from the beginning of the sequence if the input was
too long [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For instance, the authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] achieved only 36% BLEU score for their proposed
Seq2Seq-based NLIDB model. Another model, NL2pSQL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], implemented using a
sequence-to-sequence architecture, improved the BLEU score from 27% to 31% by incorporating a denoising
autoencoder mechanism. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a model based on Long Short-Term Memory (LSTM) with two
hidden layers and a dual-encoder mechanism, trained and evaluated on the SENLIDB and WikiSQL
datasets, achieved an accuracy of 38.99% on the test set. Various strategies were employed to improve
neural network performance, such as the use of bidirectional recurrent neural networks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
However, even in these cases, information from the middle of long sequences was often lost. Another
significant drawback was the necessity for the encoder to compress the entire input sequence into a
single hidden state vector, which adversely affected translation quality.
      </p>
      <p>
        A breakthrough in addressing the limitations of recurrent neural networks (RNNs) emerged in
2015 with the introduction of the Attention mechanism, which was integrated into existing RNN
architectures. This mechanism enabled models to assign weights to different parts of the input
sequence regardless of their position, thereby allowing for a more flexible and context-aware
processing of input data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The implementation of attention significantly mitigated issues related
to information loss in long sequences and the constraint of encoding the entire sentence into a single
hidden state vector [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        NLIDB models such as SQLova, described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and IRNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which are based on
attention-enhanced bi-directional LSTM architectures, demonstrated considerable performance gains,
achieving 80% accuracy on the WikiSQL dataset [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and 53% accuracy on the Spider test set [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
respectively. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the integration of a self-attention mechanism into an LSTM-based neural
network led to translation accuracies ranging from 32% to 60% for natural language to SQL queries.
      </p>
      <p>
        In 2017, a novel neural network architecture called the Transformer was introduced in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This
model proposed a new paradigm that eliminated the need for recurrent components entirely, relying
solely on attention mechanisms and thereby avoiding sequential computation. This marked a
significant advancement in the fields of natural language processing and deep learning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The
Transformer architecture substantially improved training speed and model efficiency, particularly
when applied to large-scale datasets [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Transformer models opened a new frontier in data
processing, especially of textual data, and redefined the methodology for tackling NLP tasks.
      </p>
      <p>
        The Transformer model has become one of the most prominent architectures for Natural
Language Inference (NLI) tasks [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which encompass a wide range of NLIDB-related challenges.
Studies [8-15] report on the development, evaluation, and performance analysis of various neural
network models, including TaPEx, which achieved 57-89% accuracy on the WikiSQL dataset [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
and SQL-PaLM [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which demonstrated 77% execution accuracy on short queries from the Spider
dataset. The SPSQL model, introduced in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], reported an impressive 95% accuracy; however, this
high recognition rate was attained on a significantly restricted dataset. Specifically, the experiments
were conducted using a single database consisting of 37 tables, and the training and testing datasets
included only 9,792 and 1,088 query pairs, respectively. In [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], the authors proposed the STAR
framework, which leverages transformer-based pre-training and a self-attention mechanism to
encode both text and SQL queries. This approach achieved 46.6% and 28.2% on the Interaction Match
metric for the SParC and CoSQL test datasets, respectively. The RASAT model [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] extends a
pretrained Seq2Seq transformer architecture and demonstrated execution accuracies of 37.4% and
52.6% on the CoSQL and SParC benchmarks, respectively.
      </p>
      <p>
        An important milestone in the development of NLIDB is the family of modifications of the Transformer
model, for example, the Bidirectional Encoder Representations from Transformers (BERT) model. It
is based on the Transformer model, with improvements that allow it to understand context more
accurately through the analysis of sequences in both directions. Transformer and BERT
architectures allow systems to analyze more complex phrases, better understand context, and form
queries that require a deeper understanding of language. The RAT-SQL model [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] integrates BERT and
demonstrates 65.6% accuracy on the Spider benchmark dataset. SDSQL [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] is a transformer-based
model that incorporates BERT as a pre-trained encoder to generate rich contextual embeddings.
      </p>
      <p>Its developers achieved up to 85% Logical Form Accuracy on the Spider test dataset.
The results of recent studies demonstrate that, despite the original purpose of Transformer models
being sequence-to-sequence translation, their application in the task of NLIDB proves to be a
promising direction and remains a relevant scientific and practical challenge.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed methodology</title>
      <p>The implementation of the proposed task was carried out in several stages: data preprocessing,
Transformer model development, evaluation of the obtained results, and optimization of the model
architecture. To construct the model, both BERT tokenizers and the basic Keras tokenizer were
employed to ensure high-quality vector representations of natural language input queries. The
decoder was implemented based on the classical Transformer architecture, enabling efficient
generation of SQL queries through step-by-step construction of output sequence tokens. The Spider
corpus was used for training and evaluation, which made it possible to assess the model's
performance on multi-table schemas with complex joins, nested queries, and aggregations. The
neural network was trained using the Adam optimizer, with categorical cross-entropy selected as
the loss function. A series of experiments was conducted to determine the optimal model
hyperparameters, including the number of Transformer layers, the dimensionality of hidden vectors,
the number of attention heads, batch size, and the number of training epochs.</p>
      <p>To quantitatively assess the proposed model, accuracy and the BLEU score (adapted for the
Text-to-SQL task) were used as evaluation metrics.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The primary type of data storage in modern information systems is the relational database, in which
data are stored in the form of interrelated tables. While the use of SQL remains an effective tool for
interacting with databases, it requires technical expertise and knowledge of the underlying database
structure. The application of a Natural Language Interface to Database (NLIDB) helps reduce the
entry barrier for using information systems and enhances data accessibility and usability, as it
enables users to interact with databases using natural language. Text-to-SQL is one of the approaches
to implementing NLIDB. It involves the automatic transformation of a natural language query into
an executable SQL query over a relational database. A generalized schema of the Text-to-SQL
problem implementation is presented in Figure 1.</p>
      <p>In the generalized case, the architecture of a neural network applied to solve the Text-to-SQL
problem and enable the conversion of an input natural language text sequence into output SQL
queries includes the following mandatory components:
• an encoder, a set of neural modules that takes words from the input sequence in turn and forms one or more hidden states characterizing the input sequence;
• a decoder, a collection of neural modules that uses the hidden states of the encoder to predict the result;
• components for translating natural language sentences into SQL queries.</p>
      <p>The application of the Transformer model involves performing the following data preprocessing
operations: tokenization, embedding, and positional encoding, the result of which is data prepared
for input into the neural network.</p>
      <sec id="sec-4-1">
        <title>4.1. Preprocessing stages</title>
        <p>Tokenization plays an important role in sequence transformation problems. It is used to divide the
input and output sequences into separate elements called tokens. A token can represent a word, word
substring, or even a character, depending on the level of detail required. Tokenization methods vary
depending on the granularity of text splitting and the specific requirements of the task. During the
development of the proposed NN model, both general and individual tokenizers were used to transform
textual sequences into tensors. In the development and testing process the BERT tokenizer
was implemented. The tokenization algorithm developed and integrated into the NN can be
realized in the following basic steps:
1. The initial textual sequence is encoded using a tokenizer, either a single one for both sequences or a separate one for each of them.
2. The tokenizer marks the beginning and the end of each sequence with &lt;sos&gt; and &lt;eos&gt; (start of sentence, end of sentence) markers.
3. The first token sent to the decoder is the sequence start label (&lt;sos&gt;).
4. The decoder creates a prediction based on the encoder output and the previously generated tokens.
5. The predicted token is then fed back to the decoder input. This operation is repeated until
the model outputs the end-of-sequence label (&lt;eos&gt;).</p>
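The decoding loop in steps 1-5 above can be sketched as follows; the vocabulary and the stand-in decoder step are illustrative toys (not the authors' model), used only to show the feed-back mechanism:

```python
# Hypothetical sketch of the <sos>/<eos> decoding loop; the "decoder" here
# simply replays a fixed SQL query instead of running a real transformer.
SOS, EOS = "<sos>", "<eos>"
vocab = {SOS: 0, EOS: 1, "SELECT": 2, "name": 3, "FROM": 4, "users": 5}
inv_vocab = {i: t for t, i in vocab.items()}

def toy_decoder_step(encoded_input, generated):
    """Stand-in for the transformer decoder: returns the next token id."""
    target = [vocab["SELECT"], vocab["name"], vocab["FROM"], vocab["users"]]
    step = len(generated) - 1          # exclude the <sos> marker
    return target[step] if step < len(target) else vocab[EOS]

def greedy_translate(encoded_input, max_len=20):
    generated = [vocab[SOS]]           # step 3: decoding starts from <sos>
    while len(generated) < max_len:
        next_id = toy_decoder_step(encoded_input, generated)  # step 4
        generated.append(next_id)      # step 5: feed the prediction back
        if next_id == vocab[EOS]:      # stop on the end-of-sequence label
            break
    return " ".join(inv_vocab[i] for i in generated[1:-1])

print(greedy_translate(None))  # SELECT name FROM users
```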
        <p>The input sentence, presented in natural language, is transformed into numerical representations
known as embeddings. Embeddings capture the semantic meaning of tokens within the input
sequence. Through embedding, the input tokens are converted into their corresponding vector
representations. The key parameters of this process include the vocabulary size V and the embedding
vector dimension d. The embedding process can be described as follows:
E(x_i) = W_E · o_i, (1)
where x_i is the i-th input token, W_E ∈ R^(d×V) is the embedding matrix, and o_i is a one-hot vector for the token x_i.</p>
        <p>The embedding mechanism is employed only in the lowest block of the encoder. Each encoder
layer receives as input a vector whose size is a global parameter of the network and typically
corresponds to the length of the longest sentence. The main parameters of the embedding are:
• the number of unique words or categories in the given dataset (the vocabulary size);
• the dimensionality of the vector space into which the words are embedded; this value is user-defined and serves as a hyperparameter of the model;
• the length of input sequences, i.e., the number of words in each input example; this parameter fixes the size of input sequences for more consistent processing by the model.</p>
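Formula (1) can be illustrated with a short NumPy sketch (the vocabulary size and embedding dimension are assumed toy values); it also shows why the matrix-times-one-hot product reduces to a simple column lookup:

```python
import numpy as np

# Sketch of E(x_i) = W_E · o_i with assumed toy sizes (not the paper's values)
V, d = 6, 4                      # vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W_E = rng.normal(size=(d, V))    # embedding matrix, shape d x V

def embed(token_id):
    o = np.zeros(V)              # one-hot vector o_i for token x_i
    o[token_id] = 1.0
    return W_E @ o               # equivalent to selecting column token_id

token_id = 3
assert np.allclose(embed(token_id), W_E[:, token_id])  # lookup == matmul
```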
        <p>To handle the problem of different occurrences of the same word, positional encoding is used:
vectors that represent the position of a word within the sequence. Positional encoding is added to the
embedded tokens before their input to the model. It provides the model with information indicating
the position or relative distance between tokens in the sequence. This is crucial because transformers
lack recurrence or convolution mechanisms. For each position p and embedding dimension i:
PE(p, 2i) = sin(p / 10000^(2i/d)),
PE(p, 2i+1) = cos(p / 10000^(2i/d)), (2)
where p is the position of the token in the sequence and d is the embedding dimension.</p>
        <p>The result of positional encoding is a matrix of the same size as the token embeddings, containing
encoded positional information.</p>
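The sinusoidal encoding of formula (2) can be sketched as follows; the sequence length and embedding dimension are assumed illustrative values:

```python
import numpy as np

def positional_encoding(max_len, d):
    """Formula (2): PE(p, 2i) = sin(p / 10000^(2i/d)),
    PE(p, 2i+1) = cos(p / 10000^(2i/d)); d is assumed even."""
    p = np.arange(max_len)[:, None]       # token positions, column vector
    i = np.arange(0, d, 2)[None, :]       # even dimension indices
    angles = p / np.power(10000.0, i / d)
    pe = np.zeros((max_len, d))
    pe[:, 0::2] = np.sin(angles)          # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)          # odd dimensions get cosine
    return pe                             # same shape as the token embeddings

pe = positional_encoding(max_len=50, d=128)
assert pe.shape == (50, 128)
assert pe[0, 0] == 0.0 and pe[0, 1] == 1.0   # sin(0) = 0, cos(0) = 1
```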
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Transformer-based architecture</title>
        <p>Transformers consist of multiple layers that progressively refine data representations, enabling the
capture of hierarchical and abstract features. When conceptualized as a black box, the internal
structure of the model comprises two main components: the encoder and the decoder, within which
transformer blocks are implemented, with the attention layer serving as the core mechanism. Figure
2 illustrates the architecture of the transformer-based neural network.</p>
        <p>The encoder is the part of the neural network responsible for processing the input message, while
the decoder generates the output sequence. In the Text-to-SQL task, the encoder is
used to transform the input text into representations that preserve the semantics of the database
query. The encoder receives vector representations of the input tokens and processes them through
multiple layers. In turn, the decoder takes the processed input sequence from the encoder and
translates it into SQL queries. The use of the encoder and decoder, along with data normalization
through the application of a softmax layer, enables the transformation of natural language input text
into syntactically correct SQL queries.</p>
        <p>A key architectural difference in the decoder is the presence of an additional attention block,
which helps the decoder attend to relevant parts of the input sentence. To construct a Transformer
model, it is necessary to combine the encoder and decoder components and add, after the decoder
output, a linear layer whose output size equals the target language vocabulary, followed by a softmax
layer. Each transformer block consists of two main parts: self-attention and a feed-forward network.
The input data first passes through the self-attention layer, which enables the model to consider
other words in the sequence while encoding the current word. Subsequently, the output from this
layer is fed into a fully connected feed-forward neural network, which remains the same across all
layers.</p>
        <p>The generalized algorithm for applying the transformer in the Text-to-SQL task consists of the
following steps:
1. The input text, presented in natural language, is converted into a tokenized sequence.
2. The transformer processes this sequence by utilizing the following layers:
a. Multi-Head Attention: determines relationships between tokens.
b. Add layer: incorporates a residual connection where the output of the attention mechanism
is added to the original tensor.
c. Layer Normalization: stabilizes the data after the attention layer.
d. Feed-Forward Network: performs a nonlinear transformation.
e. Add layer: incorporates a residual connection where the output of the Feed-Forward
Network layer is added to its input via an Add operation to preserve essential information
from the previous layer.</p>
        <p>f. Layer Normalization: again stabilizes the data.</p>
        <p>3. The resulting vector representation is decoded into an SQL query.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.2.1. Self-attention mechanism implementation</title>
        <p>
The self-attention mechanism allows the model to consider relevant parts of the input
text during the generation of the output translation [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. During training on a large dataset, the
model internalizes this understanding. The self-attention mechanism allows each word to
simultaneously attend to other words in the sequence, considering their importance relative to the
current token. Thus, it can be argued that machine learning models can learn grammatical rules
based on the statistical probabilities of word usage in language. To implement the self-attention
mechanism, three vectors corresponding to each encoder vector must be created: Q={q1,...,qn} is a
set of queries, K={k1,...,km} is a set of keys, V={v1,...,vm} is a set of values. These vectors are formed
by multiplying the embedding by three weight matrices WQ, WK, WV corresponding to queries, keys,
and values, respectively:
        </p>
        <p>Q = X · W_Q, K = X · W_K, V = X · W_V, (4)
where X is the matrix of embedding vectors of the input tokens.</p>
        <p>Self-attention is calculated as:
Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V, (5)
where d_k is the dimension of the key vectors.</p>
        <p>The generalized algorithm for applying the self-attention mechanism involves the following
steps:
1. Calculation of the attention score for each current token, which is the result of the dot
product between the query vector Q and the keys vector K of the current word.
2. Scaling of scores: to avoid excessively large values, the scores are divided by the square root
of the key vector dimension d_k.
3. The softmax function is applied to normalize the scores into the range between 0 and 1.
4. The resulting attention weights are applied to the value vectors V for each token.</p>
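The four steps above, together with the Q, K, V projections, can be sketched in NumPy; all sizes and weight matrices are illustrative assumptions:

```python
import numpy as np

# Sketch of scaled dot-product self-attention with assumed toy dimensions
rng = np.random.default_rng(2)
n, d_model, d_k = 4, 8, 8                 # sequence length and vector sizes
X = rng.normal(size=(n, d_model))         # embedding matrix of input tokens
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_k)) for _ in range(3))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V       # project embeddings to Q, K, V

scores = Q @ K.T                          # step 1: dot-product attention scores
scores /= np.sqrt(d_k)                    # step 2: scale by sqrt(d_k)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True) # step 3: softmax into [0, 1]
output = weights @ V                      # step 4: weight the value vectors

assert np.allclose(weights.sum(axis=-1), 1.0)  # each row is a distribution
assert output.shape == (n, d_k)
```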
        <p>This result represents a weighted representation of the input sequence, where the importance of
each token is considered in the context of others. In this work, the basic attention mechanism was
implemented, and all other attention mechanisms in the model inherited its implementation.
Moreover, since it is necessary to determine not only the weight of each word in a sentence but also
its relationship with other words, for each word it is necessary to compute multiple self-attention
vectors and then calculate their arithmetic mean. This process is called Multi-Head Attention. The
main parameters of Multi-Head Attention are:
• the number of heads in the attention mechanism; each attention head h_i has its own query, key, and value matrices: ∀h_i ∃ W_Q^i, W_K^i, W_V^i;
• the size of the key set for each head, usually obtained by dividing the total layer size by the number of self-attention heads;
• the dimension of the value set for each head, likewise calculated by dividing the total layer size by the number of self-attention heads;
• the dropout probability for block outputs before combining them into the final output.
After passing through n multi-head attention blocks, n matrices were obtained, one from each block.
The computed results for each head hi were concatenated and passed through an additional linear
transformation:</p>
        <p>MultiHead = Concat(h_1, h_2, ..., h_n) · W_O, (6)
where W_O is the weight matrix for combining the heads.</p>
        <p>The attention block that connects the encoder and decoder parts is the most significant use of
attention in the model and performs the task of identifying dependencies between the input and
output sequences. In the developed model, this layer was termed Cross-Attention and implemented
by passing the target sequence X as the query vector Q, and the context from the encoder as the key
K and value V vectors.</p>
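Head concatenation per formula (6) can be sketched as follows; the head count and layer sizes are assumed values, and cross-attention would differ only in taking K and V from the encoder output rather than from X:

```python
import numpy as np

# Sketch of Multi-Head Attention: per-head outputs concatenated and mixed
# by W_O; each head works in d_model // n_heads dimensions (an assumption
# matching the "divide the layer size by the number of heads" rule above).
rng = np.random.default_rng(3)
n, d_model, n_heads = 4, 16, 4
d_head = d_model // n_heads               # per-head key/value size

def one_head(X):
    W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    s = Q @ K.T / np.sqrt(d_head)         # scaled dot-product scores
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)         # softmax
    return w @ V

X = rng.normal(size=(n, d_model))
heads = [one_head(X) for _ in range(n_heads)]
W_O = rng.normal(size=(d_model, d_model))  # combining weight matrix
multi_head = np.concatenate(heads, axis=-1) @ W_O   # formula (6)
assert multi_head.shape == (n, d_model)
```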
      </sec>
      <sec id="sec-4-4">
        <title>4.2.2. Adaptive moment estimate algorithm</title>
        <p>Transformer models are trained using supervised learning by minimizing a loss function. During
training, a vocabulary is first created, which consists of a vector of words as well as a vector indexing each
word in the vocabulary. For model training, the decision was made to apply the Adaptive Moment
Estimation (ADAM) optimization algorithm, which combines the concepts of two other methods,
Momentum and RMSProp, to compute individual learning rates for each parameter. To stabilize the
training process and reduce oscillations, exponentially weighted moving averages of the gradients and their squares are used:
m_t = β1 · m_{t−1} + (1 − β1) · g_t,
v_t = β2 · v_{t−1} + (1 − β2) · g_t², (7)
where m_t is the first moment, v_t is the second moment, g_t is the gradient of the current
minibatch, and β1, β2 are the coefficients for the first and second moments, respectively.</p>
        <p>The learning rate of the NN is adjusted using the formula:</p>
        <p>lrate = d_model^(−0.5) · min(step^(−0.5), step · warmup_steps^(−1.5)). (8)
ADAM has adaptive gradient descent properties and adapts the learning rate for each parameter
using an exponentially weighted average of squared gradients, which allows the amplitude of weight
updates to be controlled effectively. The learning rate diagram of the developed model is shown in Fig. 3.</p>
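Formulas (7) and (8) can be sketched directly; the values of β1, β2, d_model, and warmup_steps below are common defaults assumed for illustration, not hyperparameters reported by the paper:

```python
# Sketch of the ADAM moment updates (7) and the warmup learning-rate
# schedule (8); constants are assumed common defaults.
beta1, beta2 = 0.9, 0.999

def adam_moments(m_prev, v_prev, g):
    """Formula (7): exponentially weighted averages of gradients and squares."""
    m = beta1 * m_prev + (1 - beta1) * g
    v = beta2 * v_prev + (1 - beta2) * g ** 2
    return m, v

def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Formula (8): linear warmup, then inverse-square-root decay."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

m, v = adam_moments(0.0, 0.0, g=0.5)
assert abs(m - 0.05) < 1e-12 and abs(v - 0.00025) < 1e-12
# the schedule peaks at step == warmup_steps, then decays
assert transformer_lr(4000) > transformer_lr(100)
assert transformer_lr(4000) > transformer_lr(40000)
```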
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>For the development and testing of neural models, the Kaggle environment was chosen. This
development platform provides cloud-based development and application deployment, offering fast
graphical accelerators that significantly speed up the training of various models. Another advantage
of this service is the ability to save input data within the project. Such data may include training
datasets as well as saved and trained models, which allow using the model without retraining it from
scratch.</p>
      <p>For the experiments, it was decided to use 8 self-attention blocks, varying the number of layers
(from 4 to 6), embedding vector sizes (from 128 to 512), and the dimensionality of the feed-forward
network layer (from 512 to 2048). Compared to WikiSQL, the SQL syntax in the Spider dataset is
more complex and diverse. Spider is a large-scale, cross-domain semantic parsing and
text-to-SQL dataset. The aim of using Spider is to develop natural language interfaces for
cross-domain databases. The Spider training set contains 10,181 questions and 5,693 unique complex
SQL queries for 200 databases with multiple tables, covering 138 different data domains. The Spider
dataset includes the use of JOIN operations, aggregate functions, nested subqueries, string and date
operations, as well as operators such as LIKE, IN, and BETWEEN, which are especially useful for
implementing real-world NLIDB systems. Spider contains SQL queries of varying complexity, as well as
database sets for NN training and testing. The Spider dataset consists of the following fields:
• db_id is a database identifier;
• question is a query in natural language;
• question_toks is the natural language query divided into tokens;
• query is an SQL query;
• query_toks is the SQL query divided into tokens;
• query_toks_no_value is the SQL query divided into tokens, where the parameters of conditional constructions are replaced with a mask;
• sql holds additional parameters related to the query (query type, presence of conditions or aggregation, parameters of conditional operators).</p>
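A single Spider record has roughly the following shape; the concrete values below are illustrative, and the `sql` entry is abbreviated rather than shown in full:

```python
# Illustrative example of one Spider training record; field names follow the
# dataset description above, values are a simplified example.
record = {
    "db_id": "concert_singer",
    "question": "How many singers do we have?",
    "question_toks": ["How", "many", "singers", "do", "we", "have", "?"],
    "query": "SELECT count(*) FROM singer",
    "query_toks": ["SELECT", "count", "(", "*", ")", "FROM", "singer"],
    "query_toks_no_value": ["select", "count", "(", "*", ")", "from", "singer"],
    "sql": {"select": "...", "where": [], "groupBy": []},  # abbreviated metadata
}

# a training pair maps the natural language question to the SQL query
src, tgt = record["question"], record["query"]
assert tgt.startswith("SELECT")
```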
      <p>In a series of experiments using the Spider dataset, the field question was translated into the field
query. Additionally, the query_toks_no_value field was used to transform natural language into an
SQL query, where conditional parameters were replaced with the mask "value." Subsequently, these
masks were substituted with specific values using auxiliary neural networks or syntactic analysis of
the input sequence. For the insertion of conditional operator values, semantic analyzers and regular
expressions were employed to identify numerical values or substrings within the input sequence.
Table 1 presents the identified optimal hyperparameters of the model.</p>
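<p>The regular-expression step for inserting conditional operator values can be sketched as follows; this is a simplified illustration that handles only numeric literals taken in order of appearance, with the function name being the author's own:</p>

```python
import re

def fill_values(nl_question: str, masked_sql: str) -> str:
    """Replace each "value" mask with a literal recovered from the question.

    Simplified sketch: numbers are substituted in order of appearance;
    string substrings would need a semantic analyzer in the general case.
    """
    literals = re.findall(r"\d+(?:\.\d+)?", nl_question)
    out = masked_sql
    for lit in literals:
        out = out.replace("value", lit, 1)  # fill one mask per literal
    return out

filled = fill_values(
    "How many concerts are there in year 2014 or 2015?",
    "select count ( * ) from concert where year = value or year = value",
)
```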
      <p>To analyze and compare the accuracy of each model, translation was performed on the first 100
pairs from the test set of the given dataset. This sample size was chosen due to the relatively long
generation time of the neural model, while enabling an object
performance. The loss function and execution accuracy of the proposed model using the BERT-base
tokenizer are presented in Figure 4. These graphs serve as important tools for monitoring and
diagnosing the training process of the model and assist in making informed decisions regarding
hyperparameter optimization and architectural improvements. During training, the neural model
trained with Spider dataset demonstrated a gradual decrease in loss function and increased accuracy
on the training sample.</p>
      <p>Analyzing the loss function graph, it can be concluded that the proposed model trains well on the
training data, with its error decreasing accordingly. However, the validation loss initially decreases
but begins to increase after the 6th epoch, indicating model overfitting and a loss of generalization
capability on new, unseen data. Since the minimum validation loss is reached approximately at the
6th epoch, this may indicate an optimal point to stop training in order to prevent further overfitting.</p>
      <p>Training accuracy steadily increases, consistent with the decrease in training loss, confirming
that the model becomes progressively more accurate. Validation accuracy also rises during the initial
epochs but then plateaus, further confirming the presence of overfitting. Although the model
continues to improve performance on the training data, its generalization ability on new data either
stagnates or increases very slowly. The highest accuracy achieved by the model using the BERT-base
tokenizer was 69%, compared to 63% using the basic tokenizer.</p>
      <p>In addition to accuracy values, criteria that assess the naturalness of translation are used to
evaluate the quality of neural translation. To assess the quality of results obtained using a
transformer-based neural network, this work employed the Bilingual Evaluation Understudy (BLEU)
metric in the context of the Text-to-SQL task. This metric measures the similarity between the
translation and the original text based on a statistical analysis of word overlap. To apply the BLEU
metric for evaluating the quality of NLIDB, it is necessary to implement the following steps:
1. The predicted and reference texts are tokenized into separate words or phrases. SQL queries
are tokenized by keywords, operators, and values (e.g., SELECT, FROM, WHERE, =, numbers,
and identifiers).
2. Each token is given a weight depending on its length.</p>
      <p>3. BLEU counts the n-grams (sequences of n tokens) in the generated translation that also
occur in the reference SQL query.
4. BLEU calculates the accuracy of the generated translation by comparing the number of
n-gram matches between the predicted SQL query and the reference one.</p>
      <p>Accuracy is calculated for several n-gram levels (usually from 1 to 4), and the results are
combined into an arithmetic mean to obtain the final BLEU score. In this work, bigrams were used
for the BLEU calculation. To prevent overestimation due to repetitions in the generated query, the
frequency of each n-gram in the hypothesis is limited by its frequency in the reference. The BLEU
metric value is defined as:

BLEU = BP ∙ exp(∑ wn ∙ log pn),   (9)

where wn are the weights for each n-gram level, pn is the clipped precision for n-grams, and BP
stands for Brevity Penalty, which is used as a length penalty. The purpose of applying BP is to
reduce the score of short queries that may have high precision. BP is defined by the following
condition:

BP = 1, if c &gt; r;  BP = e^(1 − r/c), if c ≤ r,   (10)

where c is the length of the predicted SQL query and r is the length of the reference SQL query.</p>
      <table-wrap id="table2">
        <label>Table 2</label>
        <caption>
          <p>Some examples of the used queries with the Spider dataset</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th></th>
              <th>Initial textual sequence</th>
              <th>Reference SQL-query</th>
              <th>Predicted SQL-query</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Training</td>
              <td>Count the number of farms</td>
              <td>select count ( * ) from farm</td>
              <td>select count ( * ) from farm</td>
            </tr>
            <tr>
              <td>Training</td>
              <td>How many departments are led by heads who are not mentioned?</td>
              <td>select count ( * ) from department where department_id not in ( select department_id from management )</td>
              <td>select count ( * ) from department where department _ id not in ( select department _ id from department )</td>
            </tr>
            <tr>
              <td>Testing</td>
              <td>How many concerts are there in year 2014 or 2015?</td>
              <td>select count ( * ) from concert where YEAR = 2014 OR YEAR = 2015</td>
              <td>select count ( * ) from campuses where year = 2014 or year = 2015</td>
            </tr>
            <tr>
              <td>Testing</td>
              <td>How many singers are from each country?</td>
              <td>select country , count ( * ) from singer group by country</td>
              <td>select country , count ( * ) from artist group by country</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-4">
        <p>Analyzing the chosen combinations of NN parameters using the BLEU metric, the most accurate
was the medium-sized NN model with the BERT tokenizer trained on the Spider dataset. Its BLEU score
is 46% for the first 100 test-sample conversions. A similar NN model with the basic tokenizer reached
a BLEU score of 43.3%. A comparative analysis of the obtained results against neural network models
tested on different datasets is presented in Table 3.</p>
        <p>
          The presented model underperforms compared to SQL-PaLM [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and SDSQL [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], which achieve
accuracies of 77% and 85%, respectively; however, these models are significantly larger, employ
large-scale fine-tuning, or incorporate built-in context. In other cases, the proposed model demonstrates
competitive accuracy, which may indicate the effectiveness of the proposed design solutions. The
accuracy of SQL query generation strongly depends on the neural network architecture, its
hyperparameters, and the training algorithm. The use of neural network models enhances the power
of NLIDB systems, making them more adaptable to various types of databases and improving their
ability to interpret unstructured information. The diversity of proposed architectures and
implementation approaches underscores the relevance of finding effective solutions for translating
natural language into SQL queries.</p>
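        <p>The bigram BLEU computation with brevity penalty described by Eqs. (9) and (10) can be sketched as follows; this is a minimal illustrative implementation with uniform weights wn = 1/2, not the exact evaluation script used in the study:</p>

```python
import math
from collections import Counter

def bleu_bigram(reference: str, hypothesis: str) -> float:
    """Bigram BLEU with brevity penalty, following Eqs. (9)-(10).

    Clipped n-gram precision p_n with uniform weights w_n = 1/2 over n = 1..2.
    """
    ref, hyp = reference.split(), hypothesis.split()
    log_sum = 0.0
    for n in (1, 2):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        # Clip hypothesis counts by reference counts to avoid rewarding repetition.
        matched = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if matched == 0:
            return 0.0
        log_sum += 0.5 * math.log(matched / total)   # w_n = 1/2
    c, r = len(hyp), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)       # Eq. (10)
    return bp * math.exp(log_sum)                    # Eq. (9)

score = bleu_bigram("select count ( * ) from farm",
                    "select count ( * ) from farm")
# identical queries score 1.0; any token mismatch lowers the score
```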
        <p>The described model performed well in translating simple queries to tables used during training.
The most problematic queries were:</p>
        <p>• Complex queries requiring joins between tables. When the training and test sets contained
queries related to different entities, the model sometimes failed to understand which entities
to use, despite generating a structurally correct query.</p>
        <p>• Queries containing constant values and keywords. The model did not distinguish unmasked
values for conditional operators within the natural language query, resulting in incorrect
outputs.</p>
        <p>Possible ways to improve the accuracy of the developed model include:
1. Extracting specific words and entities from natural language queries in full. This approach
extracts from the natural language sequence the words that characterize important parts
of the query, such as the query type, target table, condition types, grouping, and table joins.
After extracting these components, the query can be constructed according to a predefined
template.
2. Training the model on database schemas, so that, given the existing entities, the model can
recognize which words correspond to field and table names, identify their associated
databases, understand relationships between tables, and recognize dynamic values used, for
example, in conditional operators.</p>
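        <p>The template-based construction in approach 1 can be sketched as follows; the component names (type, table, where, group_by) are the author's illustrative choices, not a scheme defined in this work:</p>

```python
def build_query(parts: dict) -> str:
    """Assemble an SQL query from extracted components via a fixed template.

    `parts` holds the components named in approach 1 above (query type,
    target columns, target table, conditions, grouping); keys are illustrative.
    """
    sql = f"{parts['type']} {parts.get('columns', '*')} from {parts['table']}"
    if parts.get("where"):
        sql += " where " + " and ".join(parts["where"])
    if parts.get("group_by"):
        sql += " group by " + parts["group_by"]
    return sql

q = build_query({
    "type": "select",
    "columns": "count ( * )",
    "table": "concert",
    "where": ["year = 2014 or year = 2015"],
})
```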
        <p>Regarding the strategy for improving model training, the authors consider it appropriate to
implement Early Stopping to prevent overfitting and Data Augmentation by transforming existing
data, which will be addressed in future research.</p>
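        <p>The Early Stopping rule proposed above can be sketched in pure Python; this is a minimal illustration of the stopping criterion (stop once validation loss has not improved for a fixed number of epochs), with the function name and patience value chosen for the example:</p>

```python
def early_stopping(val_losses, patience: int = 2) -> int:
    """Return the epoch index at which training should stop.

    Minimal sketch of the Early Stopping rule: stop once validation loss
    has not improved for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0   # new best: reset the patience counter
        else:
            waited += 1
            if waited >= patience:
                return epoch         # stop here; restore the best weights
    return len(val_losses) - 1

# Validation loss falls, then rises after the minimum (as observed for the model):
stop = early_stopping([1.2, 0.9, 0.7, 0.6, 0.55, 0.52, 0.50, 0.53, 0.58],
                      patience=2)
```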
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
<p>The main task of the NLIDB system developer is to implement functionality that establishes
human-system interaction so that user queries are recognized, converted into SQL commands, and
produce the correct and expected results. Overall, the development of NLIDB systems can be justified
by the need to improve data accessibility, facilitate interaction with information systems, and
enhance user productivity. The task of creating an NLIDB system can be approached algorithmically
in a manner analogous to automatic natural language translation, with the key distinction that the
target language is not a natural language but the structured query language SQL. This implies that
machine translation technologies can be adapted to transform textual queries into SQL.</p>
      <p>In the present work, the application of a transformer-based neural network architecture for the
task of automatic SQL query generation from natural language text sequences (Text-to-SQL), which
constitutes a core component of NLIDB systems, was proposed. The objective of the study was to
develop and adapt a transformer-based architecture for the efficient conversion of natural language
queries into formalized SQL statements.</p>
      <p>Although transformers are most commonly utilized for translation between natural languages,
experimental results on the Spider dataset demonstrated the potential of the proposed architecture
in solving the Text-to-SQL task. The highest achieved model accuracy using the BERT-base tokenizer
was 69%, while using the basic Keras tokenizer it reached 63%. To evaluate the quality of the
translation from natural language queries to SQL, the BLEU metric adapted for the Text-to-SQL task
was additionally employed, enabling the assessment of similarity between generated SQL queries
and reference queries. The use of BLEU allowed a quantitative estimation of the correspondence of
generated queries to the expected results, taking lexical similarity into account. The highest BLEU
score for the model with the BERT-base tokenizer was 46%, and 43.3% for the model with the basic
Keras tokenizer.</p>
      <p>Analysis of the obtained results revealed certain limitations of the developed model, particularly
in handling complex SQL queries involving multiple table joins and non-standard conditional
operator values. The model exhibited better performance on simpler queries to tables represented in
the training dataset. To further improve accuracy and generalization capability, it is advisable to
apply methods for extracting key entities from natural language queries, to train the model on
database schemas for better understanding of context and relationships between tables, as well as to
implement Early Stopping and Data Augmentation strategies to prevent overfitting.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <p>The authors have not employed any Generative AI tools.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bashir</surname>
          </string-name>
          ,
          <article-title>A Review of NLIDB with deep learning: findings, challenges and open issues</article-title>
          .
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>14927</fpage>
          14945. doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2022</year>
          .
          <volume>3147586</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Xu,</surname>
          </string-name>
          <article-title>NLI4DB: A systematic review of natural language interfaces for databases</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/html/2503.02435v1#
          <fpage>bib</fpage>
          .bib53
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Majhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mustapha</surname>
          </string-name>
          ,
          <article-title>The history and recent advances of Natural Language Interfaces for databases querying</article-title>
          .
          <source>E3S Web of Conferences</source>
          <volume>229</volume>
          (
          <year>2021</year>
          )
          <article-title>01039</article-title>
          . doi:
          <volume>10</volume>
          .1051/e3sconf/202122901039.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nagarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nalhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vijayakumar</surname>
          </string-name>
          ,
          <article-title>Deep learning driven natural languages text to SQL query conversion: a survey</article-title>
          .
          <source>Journal of latex class files 14</source>
          (
          <year>2022</year>
          ) 1
          <fpage>18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Iacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-S.</given-names>
            <surname>Apostol</surname>
          </string-name>
          , C.
          <article-title>-</article-title>
          <string-name>
            <surname>O. Truica</surname>
          </string-name>
          , I. Hosu, T. Rebedea,
          <article-title>Neural approaches for natural language interfaces to databases: a survey</article-title>
          ,
          <source>Proceedings of the 28th International Conference on Barcelona, Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>381</fpage>
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Generating semantically valid adversarial questions for TableQA AAAI</article-title>
          . URL: https://arxiv.org/abs/
          <year>2005</year>
          .12696. doi:
          <volume>10</volume>
          .48550/arXiv.
          <year>2005</year>
          .
          <volume>12696</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. Kim.</surname>
          </string-name>
          <article-title>NL2pSQL: generating pseudo-SQL queries from under-specified natural language questions</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing EMNLP- 2613.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>I. Hosu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruseti</surname>
          </string-name>
          , T. Rebedea,
          <article-title>Natural Language Interface for Databases using a dual-encoder model</article-title>
          ,
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , Santa Fe, New Mexico, USA,
          <year>2018</year>
          , pp.
          <fpage>514</fpage>
          <lpage>524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghaeini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Datla</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Qadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prakash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. Z.</given-names>
            <surname>Fern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Farri</surname>
          </string-name>
          , DR-BiLSTM:
          <article-title>Dependent Reading Bidirectional LSTM for Natural Language Inference</article-title>
          ,
          <source>Proceedings of NAACL-HLT 18</source>
          ,
          <string-name>
            <surname>New</surname>
            <given-names>Orleans</given-names>
          </string-name>
          , Louisiana,
          <year>2018</year>
          , pp.
          <fpage>1460</fpage>
          <lpage>1469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lippi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torroni</surname>
          </string-name>
          .
          <source>Attention in Natural Language Processing, IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>32</volume>
          (
          <year>2021</year>
          )
          <fpage>4291</fpage>
          4308. doi:
          <volume>10</volume>
          .1109/TNNLS.
          <year>2020</year>
          .
          <volume>3019893</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Soydaner</surname>
          </string-name>
          , Attention Mechanism in Neural Networks:
          <article-title>Where it Comes and Where it Goes</article-title>
          . URL: https://arxiv.org/pdf/2204.13154.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>A comprehensive exploration on WikiSQL with tableaware word contextualization</article-title>
          ,
          <source>Proceedings of the 33rd Conference on Neural Information</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , J.-G. Lou, T. Liu,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation</article-title>
          ,
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          <year>2019</year>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>P19</fpage>
          -1444.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          , D. Song,
          <article-title>SQLNet: Generating structured queries from natural language without reinforcement learning</article-title>
          ,
          <source>Proceedings of the Sixth International Conference on Learning Representations</source>
          <volume>18</volume>
          ,
          <string-name>
            <surname>Vancouver</surname>
          </string-name>
          , Canada,
          <year>2019</year>
          . doi:
          <volume>10</volume>
          .48550/arXiv.1711.04436.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          . URL: https://doi.org/10.48550/arXiv.1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yemets</surname>
          </string-name>
          ,
          <article-title>Time series forecasting model for solving cold start problem via temporal fusion transformer</article-title>
          ,
          <source>Computer systems and information technologies 1</source>
          (
          <year>2024</year>
          )
          <fpage>57</fpage>
          64. doi: https://doi.org/10.31891/csit-2024
          <source>-1-7</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shahin</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <article-title>Ismail, natural language processing and sign language translation systems: survey, taxonomy and performance evaluation</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>57</volume>
          2024. doi: https://doi.org/10.1007/s10462-024-10895-z.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Thoyyibah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Haryono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.U.</given-names>
            <surname>Zailani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.M.</given-names>
            <surname>Djaksana</surname>
          </string-name>
          ,
          <article-title>Transformers in machine learning: literature review</article-title>
          ,
          <source>Jurnal Penelitian Pendidikan</source>
          <volume>9</volume>
          (
          <year>2023</year>
          )
          <fpage>604</fpage>
          -
          <lpage>610</lpage>
          . doi:
          <volume>10</volume>
          .29303/jppipa.v9i9.
          <fpage>5040</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ziyadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , J.-G. Lou, TAPEX:
          <article-title>Table Pre-training via Learning a Neural SQL Executor</article-title>
          ,
          <source>Proceedings of the Tenth International Conference on Learning Representations (Virtual) ICLR 2022 Vancouver, Canada</source>
          ,
          <year>2022</year>
          . doi: https://arxiv.org/pdf/2107.07653.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ö. Arik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miculicich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gundabathula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakhost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , T. Pfister, SQL-PaLM:
          <article-title>Improved Large Language Model Adaptation for Text-to-SQL</article-title>
          . URL: https://arxiv.org/abs/2306.00739.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation</article-title>
          . URL: https://arxiv.org/abs/2305.11061, doi: https://doi.org/10.48550/arXiv.2305.11061.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bin.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing EMNLP</source>
          <year>2022</year>
          , Abu Dhabi, United Arab Emirates
          , pp.
          <fpage>1235</fpage>
          <lpage>1247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>3215</fpage>
          <lpage>3229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers</article-title>
          ,
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics ACL</source>
          <year>2020</year>
          , Online
          , pp.
          <fpage>7567</fpage>
          <lpage>7578</lpage>
          . doi: https://doi.org/10.18653/v1/2020.acl-main.677.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Improving Text-to-SQL with Schema Dependency Learning</article-title>
          , URL: https://arxiv.org/abs/2103.04399, doi: https://doi.org/10.48550/arXiv.2103.04399.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>