<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prompting LLMs in Italian language for Text-to-SQL translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Ranaldi</string-name>
          <email>federico.ranaldi@alumni.uniroma2.eu</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Sofia Ruzzetti</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Ranaldi</string-name>
          <email>leonardo.ranaldi@idiap.ch</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Venditti</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Giannone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Favalli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raniero Romagnoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Massimo Zanzotto</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almawave S.p.A.</institution>
          ,
          <addr-line>Via di Casal Boccone 188-190, 00137 Rome, IT</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Idiap Research Institute</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Rome Tor Vergata</institution>
          ,
          <addr-line>Rome, IT</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Fine-tuning Large Language Models (LLMs) on tasks with instructions has demonstrated potential in boosting zero-shot generalization to unseen tasks. Inspired by studies on the reasoning skills of Instruction-tuned LLMs (It-LLMs), we investigate reading-comprehension, reasoning, and production over symbolic tasks. In particular, we propose an iterative reading-comprehension and reasoning approach to solve question-answering tasks based on structured data, i.e., the Text-to-SQL task. In our approach, we define a specialized procedure to provide the relevant evidence from structured data and natural language queries in order to stimulate the It-LLMs to focus on the production task and on reasoning. Hence, we propose a prompt-generation procedure that allows It-LLMs to reason about the structural information and natural language queries and to produce symbolic output, i.e., the SQL queries. Extensive experiments in zero-shot scenarios, with different types of structured data, demonstrate the remarkable abilities of It-LLMs in comprehension and in producing astonishing answers. However, hallucinations and misleading answers are also produced; this still shows the shortcomings of the instructed LLMs and, thus, their partial unreliability.</p>
      </abstract>
      <kwd-group>
        <kwd>Text-to-SQL</kwd>
        <kwd>It-LLMs</kwd>
        <kwd>prompt</kwd>
        <kwd>zero-shot</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Natural Language Query</kwd>
        <kwd>Natural Language Understanding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR</p>
      <p>ceur-ws.org
1. Introduction
shown to overcome the limits of zero-shot performance,
CEUR
htp:/ceur-ws.org
ISN1613-073
© 2023 Copyright for this paper by its authors. Use permitted under Creative</p>
      <p>CEUR Workshop Proceedings C(EUR-WS.org)
version. Then, we define a specialized procedure to
provide the relevant evidence from structured data and query
the It-LLMs in natural language. In this way, we direct
the models to focus on understanding the prompt,
reasoning based on the information provided, and producing
the output, the SQL code that solves Text-to-SQL task.
ent types of structured data demonstrate the remarkable
abilities of It-LLMs in understanding and producing
astonishing responses in the presence of various levels of
information. However, we have observed errors as the2.3. Text-to-SQL task
information given to It-LLMs decreases. The results of</p>
      <p>The ability to translate natural language queries into SQL
the zero-shot scenarios still show shortcomings of the</p>
      <p>or other ontological formal languag27e,s2[8] is a
valuIt-LLMs and, thus, their partial unreliability when the</p>
      <p>able tool because it allows one to interact with databases
harder queries and less informative databases are
consid</p>
      <p>using a natural language without having to learn SQL.
ered. There are several approaches to the problem of
translation from natural language to SQL. The earliest methods
2. Background &amp; Related Works were totally rule-base2d9[, 30]; later, with the arrival of
statistical learners, a common approach became learning
2.1. Large Language Models the mapping between SQL queries and command1s5][.</p>
      <p>Database schema and queries, rich in terms of
relationBrown et al.,2[] with GPT3 were the forerunners of theships, are often encoded in graphs – and processed by
many Large Language Models (LLMs). Among the welgl-raph neural networks31[] or self-attention mechanisms
famous LLMs are OPT [16], FLAN [17], and LLaMA [18]. [32] – or translated into intermediate representations
Compared to the smaller language models, LLMs ha[v3e3]. Recently, the Text-to-SQL task has been interpreted
several emergent abilitie1s9][, including zero-shot multi-as a sequence-to-sequence, and transformer-based
modtask solving6[] and few-shot in-context learning witehls are applied3[4, 35]. However, a critical aspect is the
chain-of-thought reasonin2g0][. amount of input information, i.e., database schemas and
relationships encoding. In this paper, we move forward
2.2. Instruction-tuned LLMs and propose a new Text-to-SQL approach by exploiting
the potential of It-LLMs models. In particular, after an
LLMs generate texts following certain formats andeixnt-ensive prompt-tuning phase, we analyze two It-LLMs
structions from examples in their prompts. Ouyang aml.o,dels’ reasoning and generalization abilities in
solv[5] trained GPT3 with instruction-response corpora tinog the Text-to-SQL task with less informative database
make LLMs more scalable and improve zero-shot perforr-epresentations and harder queries. Our contribution is
mance. As a result, InstructGPT, ChatGPT, and GPTu4nafected by LLMs’ prior knowledge after pre-training
perform well on a wide range of tasks without seeinags we test a collection of definitely unseen databases.
any examples. Recent research has also found that
GPTgenerated instructions and outputs to follow instructions
[21] can improve LLMs’ ability to follow instructio3ns.. Methods
Wang et al.2,[2] proposed a semi-supervised method to
generate diferent instructions from an NLP task-baseIdn order to test the reading-comprehension abilities
seed instruction7[]. However, these models are not fullyof Instruction-tuned Large Language Models (It-LLMs)
open-source, and it is often possible to use them for freein the Text-to-SQL translation task, we organized the
as black-boxes2[3]. Recent open-sourcing eforts include prompting phase into two parts. In the first phase, we
several competitive model2s4[,25] but cannot match thedefined diferent prompts for studying how the presence
performance of closed-source models26[]. of Structural Information and data afects the behavior of
models (Section3.1). In the second phase, we defined
pos</p>
    </sec>
    <sec id="sec-2">
      <title>3.1. Prompting Structural Information</title>
    </sec>
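      <p>For illustration only, the following minimal sketch builds an invented single-table schema (not one of the databases used in our experiments) and derives its UGLY∼SCHEMA and UGLY &amp; INSERT variants; the vowel-removal rule is our reading of the description above, and all names and values are hypothetical.</p>
      <preformat>
import re

# Hypothetical example: a single-table schema. The databases used in the
# paper are intentionally unpublished, so every name and value is invented.
SOLO_SCHEMA = """
CREATE TABLE utenti (
    id INTEGER PRIMARY KEY,
    nome TEXT,
    cognome TEXT,
    eta INTEGER
);
"""

def devowel(identifier: str) -> str:
    """Degrade a table/attribute name by removing vowels (UGLY~SCHEMA)."""
    return re.sub(r"[aeiouAEIOU]", "", identifier)

assert devowel("utenti") == "tnt" and devowel("cognome") == "cgnm"

# UGLY~SCHEMA: identical structure, devoweled table and attribute names.
UGLY_SCHEMA = """
CREATE TABLE tnt (
    d INTEGER PRIMARY KEY,
    nm TEXT,
    cgnm TEXT,
    t INTEGER
);
"""

# UGLY &amp; INSERT: the degraded schema plus a small amount of real data.
UGLY_AND_INSERT = UGLY_SCHEMA + """
INSERT INTO tnt VALUES (1, 'Mario', 'Rossi', 34);
INSERT INTO tnt VALUES (2, 'Anna', 'Bianchi', 27);
"""
      </preformat>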
    <sec id="sec-3">
      <title>4.1. Datasets</title>
    </sec>
    <sec id="sec-4">
      <title>3.2. Prompting Natural Language Query</title>
      <sec id="sec-4-1">
        <title>In order to analyze the generalization abilities, we have</title>
        <p>Regarding the Natural Language Query (NLQ), i.e., thfeed dumps of three SQL databases that are definitely
unqueries we wish to translate SQL, inspired by the work osefen, thus not found on the Web, and never seen in the
[36], we considered three hardness-levels: easy, mediump,re-training corpora of Large Language Models.
Moreand hard. A given NLQ is assigned to a certain levoevler, databases difer in topic, topology, and size as shown
if the best corresponding SQL translation has speciificn Table1.
hardness characteristics. The hardness-levels are defined
as follows: 4.2. Experimental Settings
1. EASY: values are selected only from one tablBeehind describing the data (Secti4o.1n) and prompting
(there is no join). methodologies (Sectio3n), we tested our proposals on
2. MEDIUM: values are selected by joining two taG-PT-3.5 and Claude Instant. Hence, we provided
Strucbles. tural Information, defined in Sectio3n.1, in three
difer3. HARD: values are selected by joining more thaennt ways, in each of which we requested the translation
two tables. of four Natural Language Queries (NLQ) for each
hardFurthermore, in all levels, an arbitrary number of conndei-ss level. We conducted experiments on three diferent
tions is allowed, and aggregation functions are includdeadt.abases to study phenomena in diferent scenarios. The
NLQs were in Italian and, as described in Sect3i.o2nwere
of the type:”Traduci in sql la seguente query
’nomi,cog3.3. Prompting Phase nomi,età degli utenti...ordinati per età’”.</p>
      </sec>
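      <p>As a purely illustrative example (over an invented schema of users, orders, and products tables, not taken from our databases), one plausible query per hardness-level:</p>
      <preformat>
# Illustrative only: an invented schema with users(id, name, age),
# orders(id, user_id, total) and products(id, order_id, price).

# EASY: values selected from one table, no join (conditions are allowed).
easy = "SELECT name, age FROM users WHERE age > 30 ORDER BY age;"

# MEDIUM: values selected by joining two tables.
medium = """SELECT users.name, SUM(orders.total)
FROM users JOIN orders ON orders.user_id = users.id
GROUP BY users.name;"""

# HARD: values selected by joining more than two tables
# (here with an aggregation function, which is allowed at every level).
hard = """SELECT users.name, AVG(products.price)
FROM users
JOIN orders ON orders.user_id = users.id
JOIN products ON products.order_id = orders.id
GROUP BY users.name;"""
      </preformat>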
      <sec id="sec-4-2">
        <title>We conducted the Text-to-SQL task using two It-LLMs:</title>
        <p>GPT-3.5 [37] and Claude-instant38[]. In a zero-shot 5. Results &amp; Discussion
scenario, we considered the three diferent approaches
(as described in Section3.1), behind which we asked the 5.1. The reading-comprehension
models to translate a small number of NLQ per
hardnesslevel on three diferent databases. In particular, except for Challenge
feeding the SQL dump of the database as input, requesItts-LLMs are amazing understanders; in fact, in presence
such as”Traduci la seguente query NL in SQL” were made of structured information, they perform very well in
overwithout any further prompt engineering steps. coming complex challenges and generating good
translations from Text-to-SQL. In Tabl2ewe can observe that
4. Experiments both GPT-3.5 and Claude Instant perform very well in
theSOLO∼SCHEMA approach. In particular, both
GPTIn order to observe the real abilities of Intructio3.n5-and Claude Instant produce an accurate translation
tuned Large Language Models (It-LLMs) in readingfo-r all the EASY queries. Moreover, Claude Instant
procomprehension on heterogeneous inputs and the redau-ces very good results on average also on the MEDIUM
queries. Hence, the It-LLMs showed good abilities in
soning abilities behind output generation, we selected a
comprehending natural language and the structural iInn- fact, we can observe that as we degrade the
strucformation of databases in SQL language. tural information of the database by removing vocals
from the table and attribute namUeGsL(Y∼SCHEMA),
5.2. The reasoning-generation Challenge the models tend to make errors with a high frequency.</p>
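      <p>A minimal sketch of this zero-shot setup is given below; the call_model helper and the file name are hypothetical placeholders for an It-LLM chat endpoint, while the request wording is the one quoted above.</p>
      <preformat>
# Minimal sketch of the zero-shot prompting phase; call_model is a
# hypothetical stand-in for an It-LLM chat endpoint (GPT-3.5 or Claude
# Instant in the paper) and db1_dump.sql is a placeholder file name.

def build_prompt(db_dump: str, nlq: str) -> str:
    """Concatenate the Structural Information (the SQL dump) with the plain
    translation request quoted above: no further prompt engineering."""
    return f"{db_dump}\n\nTraduci la seguente query NL in SQL: '{nlq}'"

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to an It-LLM chat API")

if __name__ == "__main__":
    dump = open("db1_dump.sql", encoding="utf-8").read()
    nlq = "nomi, cognomi, età degli utenti ordinati per età"
    print(call_model(build_prompt(dump, nlq)))
      </preformat>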
    </sec>
    <sec id="sec-6">
      <title>4. Experiments</title>
      <p>In order to observe the real abilities of Instruction-tuned Large Language Models (It-LLMs) in reading-comprehension on heterogeneous inputs, and the reasoning abilities behind output generation, we selected a collection of definitely unseen databases.</p>
      <sec id="sec-6-1">
        <title>4.1. Datasets</title>
        <p>In order to analyze the generalization abilities, we fed the models dumps of three SQL databases that are definitely unseen, thus not found on the Web and never seen in the pre-training corpora of Large Language Models. Moreover, the databases differ in topic, topology, and size, as shown in Table 1.</p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Experimental Settings</title>
        <p>After describing the data (Section 4.1) and the prompting methodologies (Section 3), we tested our proposals on GPT-3.5 and Claude Instant. Hence, we provided the Structural Information, defined in Section 3.1, in three different ways, and in each of them we requested the translation of four Natural Language Queries (NLQ) for each hardness-level; the sketch below makes the resulting grid of requests explicit. We conducted experiments on three different databases to study the phenomena in different scenarios. The NLQs were in Italian and, as described in Section 3.2, were of the type: "Traduci in sql la seguente query 'nomi, cognomi, età degli utenti ... ordinati per età'" ("Translate into SQL the following query 'names, surnames, age of the users ... ordered by age'").</p>
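        <p>The experimental grid, as we read it from the numbers above (the database labels DB1-DB3 are an assumption; only DB3 is named explicitly in Section 5.2):</p>
        <preformat>
from itertools import product

# Our reading of the setup: 3 approaches x 3 databases x 3 hardness-levels
# x 4 NLQs per level; the database labels DB1-DB3 are assumed.
APPROACHES = ["SOLO~SCHEMA", "UGLY~SCHEMA", "UGLY &amp; INSERT"]
DATABASES = ["DB1", "DB2", "DB3"]
LEVELS = ["EASY", "MEDIUM", "HARD"]
NLQS_PER_LEVEL = 4

requests = list(product(APPROACHES, DATABASES, LEVELS, range(NLQS_PER_LEVEL)))
assert len(requests) == 108  # 36 translation requests per approach, per model
        </preformat>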
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Results &amp; Discussion</title>
      <sec id="sec-7-1">
        <title>5.1. The reading-comprehension Challenge</title>
        <p>It-LLMs are amazing understanders; in fact, in the presence of structured information, they perform very well in overcoming complex challenges and generating good translations from text to SQL. In Table 2 we can observe that both GPT-3.5 and Claude Instant perform very well in the SOLO∼SCHEMA approach. In particular, both GPT-3.5 and Claude Instant produce an accurate translation for all the EASY queries. Moreover, Claude Instant produces very good results on average also on the MEDIUM queries. Hence, the It-LLMs showed good abilities in comprehending natural language and the structural information of databases expressed in the SQL language.</p>
      </sec>
      <sec id="sec-7-2">
        <title>5.2. The reasoning-generation Challenge</title>
        <p>The It-LLMs' reasoning and SQL query generation skills are strongly related to the hardness of the required queries. Indeed, the It-LLMs could generate intriguing output even in zero-shot and low-resource scenarios (with limited structural information). However, they could not generate exhaustive translations when the types of SQL queries required were hard. In fact, in Table 2 it is possible to observe a marked decrease in the SOLO∼SCHEMA rows of the HARD columns compared to the EASY and MEDIUM columns. In particular, for DB3 queries, performances fall by half, or worse, going from the EASY level to the HARD level.</p>
      </sec>
      <sec id="sec-7-3">
        <title>5.3. Effects of degradation of structural information</title>
        <p>Both the reading-comprehension and the reasoning-generation abilities of It-LLMs are negatively affected by degrading the database information. In fact, we can observe that as we degrade the structural information of the database by removing the vowels from the table and attribute names (UGLY∼SCHEMA), the models tend to make errors with a high frequency. Looking at Table 2, GPT-3.5 and Claude Instant performances deteriorate at all hardness-levels. Moreover, GPT-3.5 always fails to translate HARD queries. This means that both models find it more challenging to understand what is asked in the NL query and to reason over the database structure with deteriorated names.</p>
        <p>However, some points can be recovered by providing the database with a small amount of real data (UGLY &amp; INSERT). This phenomenon can be observed by noting that the TOT obtained with the UGLY &amp; INSERT approach never worsens compared to the UGLY∼SCHEMA approach, regardless of the hardness-level of the queries. Hence, we can conclude that degrading information quality has negative effects on both models, affecting the reliability of their reasoning skills.</p>
        <p>Finally, we want to quantify how model performance is affected by the amount of information available on a database compared to the amount of information needed to effectively resolve queries. We hence define this quantity of information as the Information Level i, as follows: i = a / h, where a is the Approach Score and h is the Hardness Score. The Approach Score assigns a score to each approach, ranging from 1 to 2: the highest value, 2, is assigned to the SOLO∼SCHEMA approach and the lowest, 1, to UGLY∼SCHEMA; the UGLY &amp; INSERT approach is assigned an intermediate score of 1.5. To calculate the Information Level, we smooth this information with the actual hardness of the query, given by the Hardness Score h: it ranges from 1 (for the EASY level) to 3 (for the HARD level).</p>
        <p>As shown in Figure 2, GPT-3.5 and Claude Instant performances correlate with the Information Level. For GPT-3.5 (Figure 2a), a large Pearson correlation coefficient (0.88) is observed, which is statistically significant with a p-value of 0.001. Claude Instant performance (Figure 2b) is still positively correlated with the Information Level, although the Pearson correlation coefficient is lower (0.5) and has a higher p-value (0.1).</p>
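        <p>For concreteness, the following sketch tabulates the Information Level for every approach/hardness pair under the definition above (our own tabulation; assigning the MEDIUM level the intermediate Hardness Score of 2 is our assumption):</p>
        <preformat>
# Information Level i = a / h, tabulated for every approach/hardness pair.
APPROACH_SCORE = {"SOLO~SCHEMA": 2.0, "UGLY &amp; INSERT": 1.5, "UGLY~SCHEMA": 1.0}
HARDNESS_SCORE = {"EASY": 1, "MEDIUM": 2, "HARD": 3}  # MEDIUM = 2 assumed

for approach, a in APPROACH_SCORE.items():
    for level, h in HARDNESS_SCORE.items():
        print(f"{approach:13s} {level:6s} i = {a / h:.2f}")

# The extremes: SOLO~SCHEMA on an EASY query gives i = 2.00 (most information
# relative to need); UGLY~SCHEMA on a HARD query gives i = 0.33 (least).
        </preformat>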
      </sec>
      <sec id="sec-7-4">
        <title>5.4. Errors Analysis</title>
        <p>In this section, we focus on the characterization of the errors that are made by the analyzed models. We investigate two types of errors: semantic errors and syntactic errors. Semantic errors are queries mistranslated by the system that, if executed, result in the selection of information other than what was initially requested in natural language. On the other hand, syntactic errors are errors that make the query not executable by an engine: these queries are characterized by incorrect use of SQL syntax (e.g., they contain a field in the HAVING statement that is not present in the SELECT) or contain references to tables and fields that do not exist in the database in question. In Figure 3, we can observe the effect of the different approaches on the number of errors in the two cases.</p>
        <p>As expected, as the information available to a system decreases, the number of semantic errors tends to increase. We can observe that both GPT-3.5 (Figure 3a) and Claude Instant (Figure 3b) tend to make a limited number of semantic errors in the SOLO∼SCHEMA approach, while the UGLY∼SCHEMA approach leads to the largest number of errors. We can also observe that the UGLY &amp; INSERT approach, with a limited set of realistic data, seems to reduce the number of semantic errors.</p>
        <p>On the other hand, the trend in the number of syntactic errors differs between the two models. In GPT-3.5, the decrease in the informativeness of the dumps leads to more errors. Manual inspection found that only one error was due to incorrect use of SQL syntax: in most cases, GPT-3.5 has difficulty identifying the tables and columns to be used in the given database and therefore proposes SQL queries that make use of arbitrary tables. In this case, the syntactic errors are definitely examples of hallucinations and need to be further explored. Claude Instant, instead, tends to retain more information about the dump, and its number of syntactic errors is more constant across the different approaches.</p>
        <p>Figure 3: Numbers of semantic and syntactic errors for GPT-3.5 and Claude Instant across approaches, ordered from most informative to least informative. (a) GPT-3.5; (b) Claude Instant.</p>
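        <p>One way to operationalize the semantic/syntactic distinction defined above (a sketch of ours, not the evaluation code used in this work) is to execute the predicted query: a query rejected by the engine counts as a syntactic error, while a query that runs but returns a result set different from the gold query's counts as a semantic error.</p>
        <preformat>
import sqlite3

def classify(db_path: str, predicted_sql: str, gold_sql: str) -> str:
    """Classify a predicted translation as 'correct', a 'semantic error'
    (executes, but selects information other than what was requested),
    or a 'syntactic error' (not executable: bad SQL syntax or references
    to non-existent tables/fields). Assumes gold_sql itself is valid."""
    conn = sqlite3.connect(db_path)
    try:
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            return "syntactic error"
        gold_rows = conn.execute(gold_sql).fetchall()
        return "correct" if predicted_rows == gold_rows else "semantic error"
    finally:
        conn.close()
        </preformat>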
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusion</title>
      <p>In this paper, we propose an iterative reading-comprehension and reasoning approach to solve question-answering challenges of the Text-to-SQL task. The results obtained from the experiments conducted in this work witness the potential of Instruction-tuned Large Language Models (It-LLMs). However, despite their promising performance, certain limitations have emerged. We discovered that, even with minimal information about the database, It-LLMs can generate natural language query translations that yield correct and executable SQL queries by just prompting them. Nevertheless, it became evident that reducing the amount of information provided could lead to the generation of incorrect queries. Expanding the scope of our investigation, we believe it would be worthwhile to conduct similar experiments with other It-LLMs. Such comparisons could help determine whether the common phenomena observed in both tested models result from a coincidence or represent aspects to further investigate in studying these new technologies.</p>
      <p>In conclusion, this research underscores the substantial advancements offered by It-LLMs in the realm of Text-to-SQL translation, while also highlighting the implications of choosing whether to provide more or less information during the prompting process.</p>
    </sec>
  </body>
  <back>
    <ack>
      <p>This work was conducted within the DATALAKE Giustizia project; we acknowledge the partners and the scientific committee for their support.</p>
    </ack>
    <ref-list>
      <ref id="ref1"><mixed-citation>[1] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, 2020. arXiv:1910.10683.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, et al., Language models are few-shot learners, 2020. arXiv:2005.14165.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, S. Presser, C. Leahy, The Pile: An 800GB dataset of diverse text for language modeling, 2020. arXiv:2101.00027.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] S. Mishra, D. Khashabi, C. Baral, H. Hajishirzi, Cross-task generalization via natural language crowdsourcing instructions, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3470-3487. URL: https://aclanthology.org/2022.acl-long.244. doi:10.18653/v1/2022.acl-long.244.</mixed-citation></ref>
      <ref id="ref5"><mixed-citation>[5] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, R. Lowe, Training language models to follow instructions with human feedback, 2022. arXiv:2203.02155.</mixed-citation></ref>
      <ref id="ref6"><mixed-citation>[6] V. Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. L. Scao, A. Raja, M. Dey, M. S. Bari, C. Xu, U. Thakker, S. S. Sharma, E. Szczechla, T. Kim, G. Chhablani, N. Nayak, D. Datta, J. Chang, M. T.-J. Jiang, H. Wang, M. Manica, S. Shen, Z. X. Yong, H. Pandey, R. Bawden, T. Wang, T. Neeraj, J. Rozen, A. Sharma, A. Santilli, T. Fevry, J. A. Fries, R. Teehan, T. Bers, S. Biderman, L. Gao, T. Wolf, A. M. Rush, Multitask prompted training enables zero-shot task generalization, 2022. arXiv:2110.08207.</mixed-citation></ref>
      <ref id="ref7"><mixed-citation>[7] Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, H. Hajishirzi, Self-Instruct: Aligning language models with self-generated instructions, 2023. arXiv:2212.10560.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] Y. K. Dwivedi, N. Kshetri, L. Hughes, E. L. Slade, A. Jeyaraj, A. K. Kar, A. M. Baabdullah, et al., Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy, International Journal of Information Management 71 (2023) 102642. URL: https://www.sciencedirect.com/science/article/pii/S0268401223000233. doi:10.1016/j.ijinfomgt.2023.102642.</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] J. Jiang, K. Zhou, J.-R. Wen, X. Zhao, Great truths are always simple: A rather simple knowledge encoder for enhancing the commonsense reasoning capacity of pre-trained models, in: Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics, Seattle, United States, 2022, pp. 1730-1741. URL: https://aclanthology.org/2022.findings-naacl.131. doi:10.18653/v1/2022.findings-naacl.131.</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach themselves to use tools, 2023. arXiv:2302.04761.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] Y. Bang, S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia, Z. Ji, T. Yu, W. Chung, Q. V. Do, Y. Xu, P. Fung, A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity, 2023. arXiv:2302.04023.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] Y. Zhou, A. I. Muresanu, Z. Han, K. Paster, S. Pitis, H. Chan, J. Ba, Large language models are human-level prompt engineers, 2022. arXiv:2211.01910.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] J. Jang, S. Ye, M. Seo, Can large language models truly understand prompts? A case study with negated prompts, 2022. arXiv:2209.12711.</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] S. Arora, A. Narayan, M. F. Chen, L. Orr, N. Guha, K. Bhatia, I. Chami, F. Sala, C. Ré, Ask me anything: A simple strategy for prompting language models, 2022. arXiv:2210.02441.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] T. Wolfson, D. Deutch, J. Berant, Weakly supervised text-to-SQL parsing through question decomposition, in: Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics, Seattle, United States, 2022, pp. 2528-2542. URL: https://aclanthology.org/2022.findings-naacl.193. doi:10.18653/v1/2022.findings-naacl.193.</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin, T. Mihaylov, M. Ott, S. Shleifer, K. Shuster, D. Simig, P. S. Koura, A. Sridhar, T. Wang, L. Zettlemoyer, OPT: Open pre-trained transformer language models, 2022. arXiv:2205.01068.</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, Q. V. Le, Finetuned language models are zero-shot learners, 2022. arXiv:2109.01652.</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, LLaMA: Open and efficient foundation language models, 2023. arXiv:2302.13971.</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, W. Fedus, Emergent abilities of large language models, 2022. arXiv:2206.07682.</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, 2023. arXiv:2201.11903.</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] B. Peng, C. Li, P. He, M. Galley, J. Gao, Instruction tuning with GPT-4, 2023. arXiv:2304.03277.</mixed-citation></ref>
      <ref id="ref22"><mixed-citation>[22] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, D. Zhou, Self-consistency improves chain of thought reasoning in language models, 2023. arXiv:2203.11171.</mixed-citation></ref>
      <ref id="ref23"><mixed-citation>[23] Z. Lin, S. Trivedi, J. Sun, Generating with confidence: Uncertainty quantification for black-box large language models, 2023. arXiv:2305.19187.</mixed-citation></ref>
      <ref id="ref24"><mixed-citation>[24] R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, T. B. Hashimoto, Stanford Alpaca: An instruction-following LLaMA model, https://github.com/tatsu-lab/stanford_alpaca, 2023.</mixed-citation></ref>
      <ref id="ref25"><mixed-citation>[25] W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng, S. Zhuang, Y. Zhuang, J. E. Gonzalez, I. Stoica, E. P. Xing, Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, 2023. URL: https://vicuna.lmsys.org.</mixed-citation></ref>
      <ref id="ref26"><mixed-citation>[26] A. Gudibande, E. Wallace, C. Snell, X. Geng, H. Liu, P. Abbeel, S. Levine, D. Song, The false promise of imitating proprietary LLMs, 2023. arXiv:2305.15717.</mixed-citation></ref>
      <ref id="ref27"><mixed-citation>[27] P. Atzeni, R. Basili, D. Hansen, P. Missier, P. Paggio, M. Pazienza, F. Zanzotto, Ontology-based question answering in a Federation of University Sites: The MOSES case study, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004). URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-35048854325&amp;doi=10.1007%2f978-3-540-27779-8_40&amp;partnerID=40&amp;md5=7545b9abe40e6ac9d64b47d45e71b78c. doi:10.1007/978-3-540-27779-8_40.</mixed-citation></ref>
      <ref id="ref28"><mixed-citation>[28] R. Basili, D. H. Hansen, P. Paggio, M. T. Pazienza, F. M. Zanzotto, Ontological resources and question answering, in: Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004, Association for Computational Linguistics, Boston, Massachusetts, USA, 2004, pp. 78-84. URL: https://aclanthology.org/W04-2510.</mixed-citation></ref>
      <ref id="ref29"><mixed-citation>[29] F. Li, H. V. Jagadish, Constructing an interactive natural language interface for relational databases, Proceedings of the VLDB Endowment 8 (2014) 73-84.</mixed-citation></ref>
      <ref id="ref30"><mixed-citation>[30] T. Mahmud, K. M. Hasan, M. Ahmed, T. Chak, A rule based approach for NLP based query processing, 2015, pp. 78-82. doi:10.1109/EICT.2015.7391926.</mixed-citation></ref>
      <ref id="ref31"><mixed-citation>[31] B. Bogin, J. Berant, M. Gardner, Representing schema structure with graph neural networks for text-to-SQL parsing, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4560-4565. URL: https://aclanthology.org/P19-1448. doi:10.18653/v1/P19-1448.</mixed-citation></ref>
      <ref id="ref32"><mixed-citation>[32] B. Wang, R. Shin, X. Liu, O. Polozov, M. Richardson, RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 7567-7578. URL: https://aclanthology.org/2020.acl-main.677. doi:10.18653/v1/2020.acl-main.677.</mixed-citation></ref>
      <ref id="ref33"><mixed-citation>[33] I. Sucameli, A. Bondielli, L. Passaro, E. Annunziata, G. Lucherini, A. Romei, A. Lenci, Mate, a meta layer between natural language and database, in: Proceedings of the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI 2022) co-located with the 21st International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022), 2022.</mixed-citation></ref>
      <ref id="ref34"><mixed-citation>[34] T. Scholak, N. Schucher, D. Bahdanau, PICARD: Parsing incrementally for constrained auto-regressive decoding from language models, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 9895-9901. URL: https://aclanthology.org/2021.emnlp-main.779. doi:10.18653/v1/2021.emnlp-main.779.</mixed-citation></ref>
      <ref id="ref35"><mixed-citation>[35] T. Xie, C. H. Wu, P. Shi, R. Zhong, T. Scholak, et al., UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 2022, pp. 602-631. URL: https://aclanthology.org/2022.emnlp-main.39.</mixed-citation></ref>
      <ref id="ref36"><mixed-citation>[36] T. Yu, R. Zhang, K. Yang, M. Yasunaga, D. Wang, Z. Li, J. Ma, I. Li, Q. Yao, S. Roman, Z. Zhang, D. Radev, Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 3911-3921. URL: https://aclanthology.org/D18-1425. doi:10.18653/v1/D18-1425.</mixed-citation></ref>
      <ref id="ref37"><mixed-citation>[37] OpenAI, ChatGPT, 2022. URL: https://chat.openai.com/.</mixed-citation></ref>
      <ref id="ref38"><mixed-citation>[38] Anthropic, Claude-instant, 2022. URL: https://poe.com/.</mixed-citation></ref>
    </ref-list>
  </back>
</article>