<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Joint Workshops at the 49th International Conference on Very Large Data Bases (VLDBW’23) — Workshop on LLMs and Databases (LLMDB’23)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3514221.3517865</article-id>
      <title-group>
        <article-title>Diversification of Top-k LLM Results using Database Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thinh On</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhodeep Ghosh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mengnan Du</string-name>
          <email>mengnan.du@njit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senjuti Basu Roy</string-name>
          <email>senjutib@njit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>New Jersey Institute of Technology</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>16</volume>
      <issue>2022</issue>
      <fpage>2005</fpage>
      <lpage>2013</lpage>
      <abstract>
        <p>Result diversification aims to return relevant results that cover a variety of perspectives. Attribute-based diversification groups results by shared attributes (e.g., genre for movies) and selects a proportional number of items from each group based on their distribution in the underlying data. However, large language models (LLMs) are not designed to produce proportionally diverse results. In this work, we propose leveraging external data sources to determine the distribution of groups related to a query and prompt LLMs to produce proportionally diverse results. This can improve result diversity by representing groups in proportion to their prevalence. Specifically, we first argue the benefits of making top-k results from LLMs proportionally diverse. We then show how to use external benchmark databases to enable proportional diversity. Finally, we outline a framework that prompts LLMs with proportionality information from external data and discuss challenges in automating this process. Our approach provides a path to overcoming LLMs' limitations in producing proportionally diverse responses.</p>
      </abstract>
      <kwd-group>
        <kwd>top-k Diversification</kwd>
        <kwd>large language models (LLMs)</kwd>
        <kwd>prompting LLMs</kwd>
        <kwd>querying database</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The goal of result diversification is to return results that are relevant as well as cover different intents. In the data management community, returning top-k diverse results of a query has been extensively studied, and there exist many seminal works that propose objective functions and efficient algorithms to return results that are diverse and representative [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. In this work, we study proportional diversity in top-k results, which amounts to grouping items in the result set based on the query condition that share common attribute values (e.g., different genres for movies) and selecting only a limited number of items from each group that represent their proportional distribution. Large language models (LLMs), such as ChatGPT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Claude [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Alpaca [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Vicuna [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], etc., on the other hand, do not return results that are proportionally diverse. In this work, we first argue the benefits of making top-k results from LLMs proportionally diverse. We then present how to leverage external benchmark databases to enable proportional diversity. Finally, we outline a computational framework that prompts LLMs with proportionality information from external data sources and discuss research challenges to automate such an outcome.
      </p>
      <sec id="sec-1-1">
        <title>A concrete example</title>
        <p>Joint Workshops at the 49th International Conference on Very Large Data Bases (VLDBW’23) — Workshop on LLMs and Databases (LLMDB’23).</p>
        <p>As a concrete example, a leading open source LLM (ChatGPT) responds to a movie-recommendation request as follows. ChatGPT Response: ”Certainly! Here’s a list of top 10 ...”</p>
        <p>• Fantasy: ”The Lord of the Rings: The Fellowship of the Ring” (2001) - The first installment of the epic fantasy trilogy that follows a group of heroes on a quest to save Middle-earth from darkness.</p>
        <p>• Romance: ”Eternal Sunshine of the Spotless Mind”</p>
        <p>[Figure residue omitted: an illustration of querying an external movie database via SQL, with tables Actor (actor_id, name, gender, birth country), Production Company (company_id, name, owner, headquarters), Award (award_id, award name, category_id), and Award Details (category_id, category name: original screenplay, best supporting actor, best motion picture).]</p>
        <p>
          • Thriller: ”Psycho” (1960) - A classic suspenseful film by Alfred Hitchcock that explores the chilling story of a motel owner with a dark secret.
          • Animated: ”Toy Story” (1995) - The groundbreaking animated film that introduced audiences to a world where toys come to life, combining humor, heart, and adventure.
          • Mystery: ”Gone Girl” (2014) - A gripping thriller that keeps audiences on the edge of their seats as a man becomes the prime suspect in his wife’s disappearance.
          • Historical: ”Schindler’s List” (1993) - A poignant and harrowing portrayal of the Holocaust, based on the true story of Oskar Schindler, a German businessman who saved over a thousand Jewish lives.
        </p>
        <p>These represent a diverse range of genres and have achieved critical acclaim for their storytelling, performances, and impact on popular culture. It is easy to infer that the LLM as-is returns one movie per genre (i.e., the proportional distribution of movies per genre is uniform).</p>
        <p>However, if an external data source is consulted (such as the IMDB database), it can be seen that the top-rated movies (average rating above 6.5, for example) exhibit different proportional distributions per genre. As a toy example, as shown in Figure 1, 10,000 out of 20,000 Drama movies have average ratings higher than 6.5, whereas 5,000 out of 15,000 Action and 5,000 out of 25,000 Comedy movies have similarly high ratings, respectively. In proportion, therefore, it is desirable to see 5 Drama, 3 Action, and 2 Comedy movies in the top-10 results, which represents the preferences of the IMDB users more appropriately in the returned LLM answers. Such an idea of proportionate representation is explored in multiple recent works [
        <xref ref-type="bibr" rid="ref8">8, 9</xref>
        ] and bears a close connection to making the results diverse and fair.</p>
        <p>Problem Definition 1 (Proportionally diversified top-k results). Given a user query q, an integer k, and a user-specified attribute A of domain size ℓ (on which the results are to be diversified), a proportionality constraint defined over the single attribute A partitions the results into ℓ different groups G1, G2, ..., Gℓ, each with a required representation in the top-k results, such that the required representations sum to k. Generalizing this, if proportionality is defined over a set S of different attributes, with a required representation for each group of each attribute, a proportional top-k result must simultaneously satisfy proportionate representation for all attributes in S.</p>
        <p>2. Proposed Framework</p>
        <p>The proposed framework is presented in Figure 1 and is motivated by ChatDB [10]. The user writes a query to the LLM; the LLM connects with external databases to retrieve count information; the produced query results are converted into natural language texts to prompt the LLM; and the LLM produces the final results and summarizes them. The development of the framework thus requires solving the following four fundamental tasks.</p>
        <p>A. Convert the user query to a series of SQL queries. Given the user query (e.g., find top-10 movies based on genre), an external data source (e.g., IMDB) is consulted and a series of SQL queries is submitted. Using the running example, the first query (step 1) retrieves the different movie genres and their respective counts present in the database. The second, third, and fourth queries (corresponding to steps 2, 3, and 4, respectively) query each of the retrieved genres (in this example, drama, action, and comedy) and find the count of each kind with average rating higher than some threshold (in this case, higher than 6.5). In general, the goal is to make a sequence of SQL queries that produce the count information of the groups of interest based on the query. Existing Text-to-SQL solutions [11, 12, 13] could be leveraged for that, even though in this preliminary work we generate those queries manually.</p>
        <p>B. Compute proportions based on arbitrary query conditions. The next task is to produce the proportion of each group in the top-k based on the count information retrieved from the query engine. For that, we leverage our recent research results [
        <xref ref-type="bibr" rid="ref8">8, 14</xref>
        ] that study the computational challenges of computing proportional representation, for instance when the query is defined on a single attribute with ℓ different domain values (using the running example, ℓ = 3). As shown in Figure 1, 50% (10,000 out of 20,000) of Drama movies have average ratings higher than 6.5, whereas 33% (5,000 out of 15,000) of Action and 20% (5,000 out of 25,000) of Comedy movies have similarly high ratings, respectively. Therefore, k_Drama = (0.5 / (0.5 + 0.33 + 0.2)) × 10 ≈ 5, whereas k_Action ≈ 3 and k_Comedy ≈ 2. Indeed, the external data source has a high proportion of highly rated drama movies compared to comedy movies, even though there are more comedy movies than drama movies in the database. These numbers indicate that in the returned results drama should have higher representation than comedy. However, computing proportions for arbitrarily complex conditions is non-trivial: in [14], we prove that it is NP-hard just to decide whether there exists a feasible solution that satisfies a proportionality requirement defined over 3 or more attributes.</p>
        <p>The final task (D, discussed below) is to finetune LLMs so that they can summarize back to the user the reasoning behind the returned results. In the context of the running example, the summary may say ”5 out of 10 movies are drama, because 50% of drama movies are very highly rated, ...”</p>
        <p>3. Preliminary Results</p>
        <p>In this section, we evaluate the effectiveness of the proposed framework as a proof of concept, demonstrating its ability to prompt LLMs to generate answers with proportional diversity by linking to an external database. We first translate user intents into SQL statements to query information and calculate proportional statistics from the database. We then prompt the models with the proportional information to produce diversified answers. For instance, a request for ”10 diverse movies by genre” requires counting the movies of each genre in the database, computing the proportion for each genre, and normalizing the proportions to the number of movies requested (10 in this example). This process allows for proportional representation of genres in the generated movie list.</p>
        <p>3.1. IMDb Dataset</p>
        <p>We use the IMDb dataset from Kaggle 1 for a case study of our framework. This dataset is a subset of a larger IMDb dataset which spans a comprehensive collection of movies over several decades. The dataset is formatted as a relational database with 3 tables, namely movieList, ratings, and regions, which encompass movies in 25 different genres across 70 languages. We first convert the source data files to comma-separated value (CSV) format, which is readable by SQL. Next, the CSV files are imported into MySQL by removing delimiters between values/records (e.g., comma, \n) while maintaining the original table schema (see Table 1 for the database schema). The structure of the dataset allows users to query and filter the data based on specific research questions, which fits the goals of our case study.</p>
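<p>Tasks A and B can be sketched together in a few lines of Python. The sketch below is illustrative rather than the paper's actual implementation: the table and column names (movie, genre, avg_rating) are hypothetical placeholders, and largest-remainder rounding is one reasonable way to make the rounded allocation sum to k.</p>

```python
# Sketch of tasks A and B: issue count queries against an external
# database and turn the returned counts into a proportional top-k
# allocation. Table/column names (movie, genre, avg_rating) are
# hypothetical placeholders, not the actual IMDb schema.

def count_queries(genres, threshold=6.5):
    """Step 1 retrieves the genres; steps 2..4 count highly rated
    movies per genre. Returns the SQL strings we would submit."""
    sql = ["SELECT genre, COUNT(*) FROM movie GROUP BY genre;"]
    for g in genres:
        sql.append(
            f"SELECT COUNT(*) FROM movie "
            f"WHERE genre = '{g}' AND avg_rating > {threshold};"
        )
    return sql

def proportional_allocation(counts, totals, k):
    """Task B for a single attribute: each group's share is the
    fraction of its items passing the rating filter, normalized
    across groups and scaled to k (largest-remainder rounding)."""
    shares = {g: counts[g] / totals[g] for g in counts}
    norm = sum(shares.values())
    raw = {g: shares[g] / norm * k for g in shares}
    alloc = {g: int(raw[g]) for g in raw}
    # distribute the leftover slots by largest fractional remainder
    leftover = k - sum(alloc.values())
    for g in sorted(raw, key=lambda g: raw[g] - int(raw[g]), reverse=True)[:leftover]:
        alloc[g] += 1
    return alloc

# Running example: 50% of Drama, 33% of Action, and 20% of Comedy
# movies are highly rated.
counts = {"Drama": 10_000, "Action": 5_000, "Comedy": 5_000}
totals = {"Drama": 20_000, "Action": 15_000, "Comedy": 25_000}
print(proportional_allocation(counts, totals, 10))
# → {'Drama': 5, 'Action': 3, 'Comedy': 2}
```

<p>On the running example this yields 5 Drama, 3 Action, and 2 Comedy slots, matching the proportions discussed above.</p>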
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Implementation Details</title>
        <p>To simplify the task, which requires various tools, we unify MySQL and the GPT-3.5-Turbo Model API developed by OpenAI 2 into a single Python interface. First, we link the IMDb database to MySQL and establish a connection between MySQL and Python using the pymysql library. As a result, SQL statements can be written directly in the Python interface and the proportions can be computed from the query outputs. Next, we summarize the proportion information into a prompt and feed it into the GPT Model API.</p>
        <p>C. SQL to Text Transform. The next step of the process is to consume the proportions generated by the SQL results and convert them into natural-language-like texts that LLMs understand. Using the running example, this is equivalent to prompting the LLM to return ”5 drama movies, 3 action movies, and 2 comedy movies”. As before, there exists the challenge of automatically translating computed proportions into natural language texts, for which existing SQL-to-Text solutions could be used [15, 16].</p>
        <p>D. Finetune LLMs to Summarize Results.</p>
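<p>The proportion-to-prompt step (task C) can be sketched as a small pure function. The prompt wording below is our own illustration, not the exact prompt used in the experiments; in the pipeline the resulting string would then be sent to the GPT-3.5-Turbo API.</p>

```python
# Illustrative sketch of task C: turn a computed allocation into a
# natural language prompt for the LLM. The wording is an assumption;
# in our setup the string would be sent to the GPT-3.5-Turbo API.

def allocation_to_prompt(query, alloc):
    """Summarize a {group: slot-count} allocation into prompt text."""
    k = sum(alloc.values())
    parts = ", ".join(f"{n} {g.lower()} movies" for g, n in alloc.items())
    return (
        f"{query} Return exactly {k} movies: {parts}. "
        "For each movie, give the title, year, and a one-sentence reason."
    )

prompt = allocation_to_prompt(
    "Suggest a diverse list of top-rated movies.",
    {"Drama": 5, "Action": 3, "Comedy": 2},
)
print(prompt)
```
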
      </sec>
      <sec id="sec-1-3">
        <title>Table 1: Schema of the IMDb database</title>
        <p>1: https://www.kaggle.com/datasets/ashirwadsangwan/imdb-dataset; 2: https://platform.openai.com/docs/models/gpt-3-5</p>
        <sec id="sec-1-3-1">
          <title>Major attributes and descriptions</title>
          <p>movieList: tconst (alphanumeric unique identifier of the title); titleType (type/format of the title, e.g., movie, tvseries, video, etc.); primaryTitle (the more popular title / the title used by the filmmakers on promotional materials at the point of release); originalTitle (the original title, in the original language); genres (includes up to three genres associated with the title).</p>
          <p>ratings: tconst (alphanumeric unique identifier of the title); averageRating (weighted average of all the individual user ratings); numVotes (number of votes the title has received).</p>
          <p>regions: titleId (a tconst, an alphanumeric unique identifier of the title); title (the localized title); region (the region for this version of the title); language (the language of the title).</p>
          <p>3.3. Result Analysis</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Open Problems</title>
      <sec id="sec-2-1">
        <p>In this section, we discuss research challenges regarding achieving proportionally diverse top-k results from LLMs, including automatically transforming user queries into SQL queries, automatically transforming SQL results into LLM prompts, and finetuning LLMs.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Transforming SQL Results into LLM Prompts</title>
        <p>There are several key challenges in automatically transforming SQL queries and results into natural language prompts for LLMs. First, it is challenging to map numerical values, proportions, and counts retrieved from a database into appropriate quantifiers and aggregates in natural language prompts.</p>
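<p>One possible, purely illustrative mapping from a numerical proportion to an English quantifier is sketched below; the cut-off values and phrases are our own assumptions, not values prescribed by the framework.</p>

```python
# Illustrative mapping from a proportion in [0, 1] to an English
# quantifier. The thresholds and phrases are assumptions chosen for
# demonstration, not part of the proposed framework.

def proportion_to_quantifier(p):
    if p >= 0.9:
        return "almost all"
    if p >= 0.6:
        return "most"
    if p >= 0.45:
        return "about half of"
    if p >= 0.2:
        return "some"
    if p > 0.0:
        return "a few"
    return "none of the"

print(proportion_to_quantifier(0.5))   # → about half of
print(proportion_to_quantifier(0.33))  # → some
```
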
        <sec id="sec-2-2-1">
          <title>Answers to summarized prompts</title>
          <p>The Dark Knight (2008, English)
Pulp Fiction (1994, English)
Amelie (2001, French)
La Haine (1995, French)
Blue is the Warmest Color (2013, French)
Life is Beautiful (1997, Italian)
Cinema Paradiso (1988, Italian)
La Dolce Vita (1960, Italian)
The Great Beauty (2013, Italian)
The Conformist (1970, Italian)</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2 English : 3 French : 5 Italian</title>
          <p>to be highly ranked, while omitting unnecessary details. Third, we must preserve context between the original user query, the intermediate SQL queries and results, and the final LLM prompts and summaries. The end-to-end system needs to retain the attributes, conditions, and constraints specified in the initial user query so that the final LLM prompts and instructions are grounded and relevant. Addressing these challenges will be the focus of our future research.</p>
          <p>4.3. Finetuning LLMs</p>
          <p>The last research challenge lies in finetuning LLMs to generate informative summaries. The LLMs should be finetuned to produce summaries that not only present the final results but also explain the reasoning behind those results. To accomplish this, the LLMs need to be trained on a large dataset of query-result pairs, where the results are accompanied by human-generated summaries or explanations. These summaries should capture the key insights and patterns in the results, highlighting the factors that influenced the distribution of groups. However, creating high-quality summaries requires extensive human expertise and effort. A potential approach is to use a combination of human-generated data and data automatically generated by other LLMs to balance quality and efficiency. During finetuning, the LLMs learn to generate summaries that are concise, informative, and relevant to the user’s query. The model should understand the statistical information from the SQL queries and use it to construct coherent explanations. For example, in the running example, the LLM might incorporate information about the high ratings of drama movies and the comparative proportions of different genres to generate an insightful summary. Lastly, to improve the quality of the generated summaries, various techniques can be employed, such as reinforcement learning. The model can be rewarded based on the informativeness and coherence of its generated summaries. By optimizing these rewards, the LLM can learn to produce personalized, high-quality summaries that explain the results to users.</p>
          <p>5. Related Work</p>
          <p>The development of LLMs has been driven by observations that scaling Pre-trained Language Models (PLMs), either in terms of model or data size, often leads to improved performance on downstream tasks [19] and enhances the model’s ability to solve various complex tasks. With their ever-increasing sizes, popular language models such as PaLM [20], LLaMA [21], Galactica [22], GPT-3 [23], and GPT-4 [24] have achieved state-of-the-art performance on many tasks. Consequently, they have motivated a profound shift in Natural Language Processing (NLP) research towards LLMs. For example, OpenAI released ChatGPT [25], which leverages the GPT-3.5 architecture and is capable of understanding language and engaging in meaningful conversations across various topics. ChatGPT represents the impact of LLMs throughout the community and revolutionizes our understanding of NLP [26, 18, 27]. In addition, these LLMs have made significant progress in natural language processing and have enabled various applications, such as coding assistants, search engines, and dialogue systems.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
      <title>Augmenting LLMs</title>
        <p>Despite the rapid progress of LLMs, they also suffer from some limitations, including generating implausible predictions (hallucinations), requiring massive scale and data to achieve good performance, and struggling with continual learning [28, 29, 30]. To address these issues, there is a growing research trend of ”augmenting” LLMs by providing them with additional context beyond just their parameters and input tokens. Two representative approaches are: 1) increasing context relevance by retrieving external information or employing reasoning, which gives useful context with fewer parameters; and 2) allowing LLMs to use external tools and knowledge to augment context, which adds missing information [28]. Recent research has proposed using databases for LLMs, which has led to enhanced performance in multi-hop reasoning. These works have demonstrated the possibility of improving the reasoning capabilities of LLMs by using external memory modules. For example, ChatDB [10] is a recent work that enables the use of real-time databases to enhance the multi-step reasoning capabilities of LLMs. The ChatDB framework uses LLMs to transform user inputs into a chain-of-memory (multi-step SQL instructions) that manipulates an external database. The intermediate results are summarized as a prompt which is fed to LLMs to achieve the final results.</p>
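<p>The chain-of-memory loop described above can be sketched as follows; this is a minimal sketch under our own simplifying assumptions, where llm and db are stand-ins for a real model API and database connection, and the model signals completion by returning None instead of another SQL instruction.</p>

```python
# Minimal sketch of the chain-of-memory pattern described for ChatDB.
# `llm` and `db` are hypothetical stand-in callables, not the actual
# ChatDB interfaces.

def chain_of_memory(user_input, llm, db, max_steps=4):
    """Ask the LLM for SQL steps one at a time, execute each against
    the database, and keep (sql, result) pairs as external memory."""
    memory = []
    for _ in range(max_steps):
        step_sql = llm(user_input, memory)  # propose the next SQL step
        if step_sql is None:                # no further steps needed
            break
        memory.append((step_sql, db(step_sql)))
    # summarize the intermediate results into a final prompt
    summary_prompt = f"Question: {user_input}\nIntermediate results: {memory}"
    return llm(summary_prompt, memory)
```
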
      </sec>
      <sec id="sec-2-4">
      <title>Text-to-SQL</title>
        <p>In this section, we review several lines of research that are most closely relevant to our work.</p>
        <p>Large Language Models (LLMs). LLMs have become a prominent area of research that has garnered significant attention in recent years. LLMs typically refer to Transformer-based models with multi-head attention [17] embedded in deep neural networks and trained on large-scale corpora [18].</p>
        <p>LLMs have also become a reliable source of generated code, from common programming languages such as Python/Java to SQL statements for querying databases [24]. However, parsing natural language into SQL statements faces semantic and syntactic challenges: LLMs as parsers must capture the semantics of the correct tables/columns from the database and generate syntactically valid SQL queries [31]. These requirements pose significant challenges in designing text-to-SQL models that generalize across databases and user intents. To overcome these challenges, researchers have proposed text-to-SQL frameworks using encoder-decoder neural architectures, categorized into two parsing approaches: single-turn and multi-turn [32]. In single-turn parsing, the encoder generates embeddings capturing natural language input and table schema semantics; the decoder then generates SQL statements from the encodings [33, 34, 35, 36, 37, 38, 39]. In multi-turn parsing, the encoder uses different encoding schemes to generate contextual and schema structure embeddings; the decoder, an LSTM model with attention mechanisms [40, 41], generates SQL queries using current and previous hidden states. This enables capturing long-term input dependencies and generating context-aware SQL queries [42, 43, 44, 45, 46].</p>
        <p>Finetuning LLMs. Finetuning involves modifying the parameters of a pre-trained model, an LLM in our context, using a smaller, task-specific dataset. The goal of finetuning LLMs is to enhance pre-trained LLMs using domain adaptation or human feedback, making LLMs more relevant for specific tasks. There are two main streams of finetuning methods for LLMs: instruction tuning and reinforcement learning. Instruction tuning involves supervised learning using instruction-formatted instances, where each instance includes a task description, an input-output pair, and optional demonstrations of the task [47, 48, 22, 49]. For example, if the task is text-to-SQL, the task description could be ”translate to SQL statements” and an input-output pair includes a natural language sentence as input and the equivalent SQL statements as output. The formatted instances can be constructed by formatting existing datasets [50, 51], by formatting human needs from real user queries [52], or by semi-automated augmentation approaches which feed existing instances into LLMs to generate new task descriptions and instances [53, 54, 55]. Reinforcement learning methods, on the other hand, propose using human feedback to make the outputs of LLMs align with human expectations [52, 55, 56]. Although alignment considers human preferences to mitigate unexpected behaviors of LLMs (e.g., hallucinations, misleading or biased answers), low-quality feedback data may pose negative effects on the general abilities of LLMs [57].</p>
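<p>As a concrete illustration, an instruction-formatted instance for the text-to-SQL task could look as follows; the field names and the toy schema are our own, not taken from any specific instruction-tuning dataset.</p>

```python
# A hypothetical instruction-formatted instance for the text-to-SQL
# task: a task description, an input-output pair, and an optional
# demonstration. Field names and the toy schema are illustrative.

instance = {
    "task_description": "Translate to SQL statements.",
    "input": "Count the drama movies rated above 6.5.",
    "output": (
        "SELECT COUNT(*) FROM movie "
        "WHERE genre = 'Drama' AND avg_rating > 6.5;"
    ),
    "demonstrations": [
        {
            "input": "List all genres.",
            "output": "SELECT DISTINCT genre FROM movie;",
        }
    ],
}
print(instance["output"])
```
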
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>In this work, we present our initial directions on how to make query results coming from LLMs more representative of what external gold-standard data sources may provide. To that end, we present a framework that queries external benchmark databases to determine the proportional distribution of relevant attributes based on the given query. This proportion information is then used to prompt the large language model to return results that match that distribution and cover a variety of relevant perspectives. As a proof of concept, we implement this approach for improving the diversity of movie query results according to genres. The preliminary results show the potential of this framework to mitigate large language models’ tendency to return inadequately diverse responses. Moving forward, we plan to investigate practical solutions to automate the proposed framework and evaluate their effectiveness on diverse query types and domains.</p>
      <sec id="sec-3-1">
        <title>Acknowledgments</title>
        <p>The work of Senjuti Basu Roy and Thinh On is supported by the National Science Foundation (CAREER Award #1942913, IIS #2007935, IIS #1814595) and the Office of Naval Research (Grants Nos. N000141812838, N000142112966).</p>
        <p>[26] R. Bommasani, ..., Y. Wu, S. M. Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng, K. Zhou, P. Liang, On the opportunities and risks of foundation models, 2022. arXiv:2108.07258.</p>
        <p>[27] C. Zhou, Q. Li, C. Li, J. Yu, Y. Liu, G. Wang, K. Zhang, C. Ji, Q. Yan, L. He, H. Peng, J. Li, J. Wu, Z. Liu, P. Xie, C. Xiong, J. Pei, P. S. Yu, L. Sun, A comprehensive survey on pretrained foundation models: A history from bert to chatgpt, 2023. arXiv:2302.09419.</p>
        <p>[28] G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi-Yu, A. Celikyilmaz, et al., Augmented language models: a survey, arXiv preprint arXiv:2302.07842 (2023).</p>
        <p>[29] B. Peng, M. Galley, P. He, H. Cheng, Y. Xie, Y. Hu, Q. Huang, L. Liden, Z. Yu, W. Chen, et al., Check your facts and try again: Improving large language models with external knowledge and automated feedback, arXiv preprint arXiv:2302.12813 (2023).</p>
        <p>[30] B. Xu, Z. Peng, B. Lei, S. Mukherjee, Y. Liu, D. Xu, Rewoo: Decoupling reasoning from observations for efficient augmented language models, arXiv preprint arXiv:2305.18323 (2023).</p>
        <p>[31] P. Glenn, P. P. Dakle, P. Raghavan, Correcting semantic parses with natural language through dynamic schema encoding, 2023. arXiv:2305.19974.</p>
        <p>[32] B. Qin, B. Hui, L. Wang, M. Yang, J. Li, B. Li, R. Geng, R. Cao, J. Sun, L. Si, F. Huang, Y. Li, A survey on text-to-sql parsing: Concepts, methods, and future directions, 2022. arXiv:2208.13629.</p>
        <p>[33] V. Zhong, C. Xiong, R. Socher, Seq2SQL: Generating structured queries from natural language using reinforcement learning, 2018. URL: https://openreview.net/forum?id=Syx6bz-Ab.</p>
        <p>[34] T. Yu, Z. Li, Z. Zhang, R. Zhang, D. Radev, TypeSQL: Knowledge-based type-aware neural text-to-SQL generation, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 588–594. URL: https://aclanthology.org/N18-2093. doi:10.18653/v1/N18-2093.</p>
        <p>[35] T. Yu, M. Yasunaga, K. Yang, R. Zhang, D. Wang, Z. Li, D. Radev, SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 1653–1663. URL: https://aclanthology.org/D18-1193. doi:10.18653/v1/D18-1193.</p>
        <p>[36] J. Guo, Z. Zhan, Y. Gao, Y. Xiao, J.-G. Lou, T. Liu, D. Zhang, Towards complex text-to-SQL in cross-domain database with intermediate representation, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4524–4535. URL: https://aclanthology.org/P19-1444. doi:10.18653/v1/P19-1444.</p>
        <p>[37] W. Hwang, J. Yim, S. Park, M. Seo, A comprehensive exploration on wikisql with table-aware word contextualization, 2019. arXiv:1902.01069.</p>
        <p>[38] W. Lei, W. Wang, Z. Ma, T. Gan, W. Lu, M.-Y. Kan, T.-S. Chua, Re-examining the role of schema linking in text-to-SQL, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 6943–6954. URL: https://aclanthology.org/2020.emnlp-main.564. doi:10.18653/v1/2020.emnlp-main.564.</p>
        <p>[39] D. Choi, M. C. Shin, E. Kim, D. R. Shin, RYANSQL: Recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases, Computational Linguistics 47 (2021) 309–332. URL: https://aclanthology.org/2021.cl-2.12. doi:10.1162/coli_a_00403.</p>
        <p>[40] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.</p>
        <p>[41] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, 2019. arXiv:1907.11692.</p>
        <p>[42] Q. Liu, B. Chen, J. Guo, J.-G. Lou, B. Zhou, D. Zhang, How far are we from effective context modeling? an exploratory study on semantic parsing in context, 2020. arXiv:2002.00652.</p>
        <p>[43] R. Zhang, T. Yu, H. Y. Er, S. Shim, E. Xue, X. V. Lin, T. Shi, C. Xiong, R. Socher, D. Radev, Editing-based sql query generation for cross-domain context-dependent questions, 2019. arXiv:1909.00786.</p>
        <p>[44] P. Jain, M. Lapata, Memory-based semantic parsing, Transactions of the Association for Computational Linguistics 9 (2021) 1197–1212. URL: https://aclanthology.org/2021.tacl-1.71. doi:10.1162/tacl_a_00422.</p>
        <p>[45] B. Wang, R. Shin, X. Liu, O. Polozov, M. Richardson, Rat-sql: Relation-aware schema encoding and linking for text-to-sql parsers, 2021. arXiv:1911.04942.</p>
        <p>[46] Y. Zheng, H. Wang, B. Dong, X. Wang, C. Li, HIE-SQL: History information enhanced network for context-dependent text-to-SQL semantic parsing, in: Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 2997–3007. URL: https://aclanthology.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gollapudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halverson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieong</surname>
          </string-name>
          ,
          <article-title>Diversifying search results</article-title>
          ,
          <source>in: Proceedings of the second ACM international conference on web search and data mining</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Linear submodular bandits and their application to diversified retrieval</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>24</volume>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gollapudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>An axiomatic approach for result diversification</article-title>
          ,
          <source>in: Proceedings of the 18th international conference on World wide web</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>381</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          , et al.,
          <article-title>Sparks of artificial general intelligence: Early experiments with gpt-4</article-title>
          , arXiv preprint arXiv:2303.12712 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] AnthropicAI, Introducing claude,
          <year>2023</year>
          . URL: https://www.anthropic.com/index/introducing-claude.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <article-title>Alpaca: A strong, replicable instruction-following model</article-title>
          , Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <article-title>Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality</article-title>
          ,
          <year>2023</year>
          . URL: https://lmsys.org/blog/2023-03-30-vicuna/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu Roy</surname>
          </string-name>
          ,
          <article-title>Rank aggregation with proportionate fairness</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Management of Data, SIGMOD '22</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>