<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>In-Context LLM Query Execution on Textual Documents</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dario Satriani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enzo Veltri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donatello Santoro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Papotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>Biot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi della Basilicata (UNIBAS)</institution>
          ,
          <addr-line>Potenza</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>In the last few years, Large Language Models (LLMs) have been widely adopted for data science tasks such as data extraction. In this scenario, textual documents are fed in the context of the LLM - as in a RAG setting - and tabular data is extracted via declarative queries. However, LLMs often struggle with complex queries, resulting in low precision and recall. This challenge highlights the limitations of existing data processing assumptions when applied to LLM-based query execution. Traditional optimization techniques are exclusively cost-based and rely on catalog metadata, which is unavailable in this context, while LLMs also present challenges over the output quality. Our results show that traditional query optimization principles fail to generate the best plans in terms of result quality. To address this issue, we present a novel approach for optimizing SQL query results by introducing techniques tailored for LLMs. We present Galois, an intermediary between SQL queries and LLMs, treating the latter as a storage layer. We extend traditional optimization strategies to incorporate alternative physical operators designed for LLM-based query execution. Given the absence of a conventional data catalog, we introduce confidence-based metadata collection techniques for query optimization. Our results demonstrate the effectiveness of using such metadata in logical and physical optimization, ultimately showing that Galois successfully balances the trade-off between result quality and execution cost in querying textual documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>SQL</kwd>
        <kwd>Query Optimization</kwd>
        <kwd>Data Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large Language Models (LLMs) have proved to be a valid solution in direct question answering
and data processing tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Solely through the usage of Natural Language (NL) or
SQL-like queries, users can extract meaningful and structured data by exploiting both the internal
representations within these models, in fact querying the parametric knowledge in the LLM, and
external sources, via an in-context learning setting, e.g., RAG. Consequently, these models can
be seamlessly integrated into mainstream data retrieval and elaboration pipelines, eliminating
the need to resort to traditional user-centric extraction methods, which require either data
annotation efforts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or manually crafted extraction pipelines [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>[Figure 1: Answering the same request in three ways: the natural language question ("What are name, size and population of European cities with more than 1M people and more than fifteen private hospitals?"), the corresponding SQL query over EUCities, and the Galois-generated prompts. The NL question yields the worst results with the most errors, the SQL query more results with some errors, and Galois the most results with the least errors.]</p>
      <p>
        The Problem with Data Outputs. Despite being able to process simple NL queries, more
complex instances that require structured results as output often prove to be a challenge for
those models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. An example is shown in Figure 1. Given as in-context input a textual document
about private hospitals and the NL question: "What are names, size and population of European
cities with more than 1M people and more than fifteen private hospitals?", the correctness and
completeness of the output result are unsatisfactory (we assume here that the entire document fits the LLM input context). Recent advancements allow LLMs to
process SQL queries directly from prompts, also by removing natural language ambiguities [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ],
obtaining more precise answers compared to the corresponding questions in NL. However,
feeding the corresponding SQL query:
Q1: SELECT name, size, population FROM EU_Cities WHERE population &gt; 1M AND num_private_hospitals &gt; 15
improves results w.r.t. natural language, but still lacks correctness and completeness. These
limitations underscore the inherent design constraints of LLMs. If prompted with queries that
require complex reasoning, such as with multiple conditions or aggregates, the models are not
able to produce a satisfactory result. Therefore, to get the best possible results, we argue that a
database management system should be used to handle the query execution while using the
LLM exclusively as a storage layer, as shown in the bottom part of Figure 1.
      </p>
      <p>Challenges. This approach comes with a series of challenges. First, when querying LLMs
with SQL, a trade-off between the accuracy of the results and the execution costs must be
considered. While direct execution of SQL queries comprises the most cost-effective approach,
decomposing these queries into a sequence of small operator executions, akin to the plans
in a DBMS, yields superior result quality. As a result, the traditional optimization principles,
primarily focused on cost, are only partially applicable. Additionally, LLMs do not provide
access to crucial metadata such as schema details, column statistics or histograms, significantly
complicating query optimization efforts.</p>
      <p>
        The Framework. Galois [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] is a query optimization framework designed to address the
challenges of optimizing SQL queries for LLMs. It defines a novel physical "Scan" operator
crafted for interfacing with LLMs, which balances the efficiency and accuracy of data retrieval.
Furthermore, it introduces a cost/quality model and accompanying optimization techniques
that balance execution cost with query accuracy, thus enhancing the overall retrieval quality.
Finally, it exposes dynamic methods for acquiring essential metadata during query execution,
focusing on estimating the confidence of LLM output to bridge the gap typically filled by
catalog information. Experimental evaluations show the effectiveness of the proposed approach,
reporting improvements in result quality while maintaining a competitive edge in terms of
resource efficiency. While Galois can also handle querying the LLMs over their parametric
knowledge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in this work we focus on the in-context learning scenario, where the textual
documents are fed as input to the LLM.
      </p>
      <p>[Figure 2: Alternative logical plans for query Q1 over the LLM (n1/n2: scan followed by an external filter; p1: full pushdown of both conditions; s1: pushdown of the most selective condition; c1: pushdown of the condition with the highest confidence) and the tuples each plan returns.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Formulation and Challenges</title>
      <p>
        The core challenge is executing SQL queries that cannot be answered by existing databases
but can be resolved using textual documents. The goal is to feed the queries directly to the
LLM that acts as an extractor, returning structured data from unstructured text. This setting
combines traditional DBMS challenges (efficiency) with LLM-specific ones (output quality and
token cost). Traditional optimizations focused on performance fail to ensure high-quality results
with LLMs. A major limitation is the lack of a catalog: neither traditional metadata (e.g., column statistics)
nor LLM-specific metadata is readily available, yet both are essential for optimization. Our
solution enables dynamic data extraction from documents provided at runtime, leaving the
broader case of accessing LLM pretraining knowledge to the full paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Logical Level. In traditional DBMSs, logical optimization focuses on minimizing resource
usage, such as CPU and memory. With LLMs, however, output quality becomes a key metric.</p>
      <p>Consider query Q1 in the top part of Figure 2. The simplest logical plan uses a single prompt
to retrieve all tuples (operator n1), followed by a filtering step outside the LLM (operator n2).
</p>
      <p>This yields high precision but low recall as only one tuple is returned.</p>
      <p>A key optimization is condition pushdown. While it reduces execution cost by limiting LLM
interactions, it can create complex prompts that degrade quality. For instance, pushing both
population and hospital conditions (operator p1) saves tokens but overwhelms the LLM’s ability
to handle multi-faceted queries. Even pushing only the most selective condition (operator s1)
might lead to errors preventing the identification of relevant data and ultimately producing
an empty result. Achieving a balance between cost and quality requires analyzing each query
component’s effect on the LLM. In the last example in Figure 2 (c1), pushing the condition for
which the model is most confident leads to the best results.</p>
      <p>Physical Level. The next challenge is adapting query execution to treat LLMs as the data
storage layer. In the physical plan, this requires new operators, as data is accessed via natural
language prompts. Prompt generation becomes critical, balancing variability in data quality
and cost. Unlike traditional systems where a single command retrieves data, querying LLMs
demands more flexible strategies.</p>
      <p>For instance, a straightforward scanning technique retrieves the entire tuple set directly with
a prompt, as in the Table-Scan operator c1’ in Figure 3 that retrieves all tuples in one prompt,
minimizing interactions but often producing low-quality results. An alternative physical scan
operator first identifies values for key attributes and then iteratively requests additional data, as
with Key-Scan for operator c1” in Figure 3. This improves quality with more focused prompts
but increases the number of LLM calls. These trade-offs highlight the complexity of designing
prompt-based operators that balance execution cost with result quality.</p>
      <p>Moreover, LLMs face two key limitations in knowledge extraction. First, they generate the
most likely next word based on prior text, often favoring frequent values from their training
data—making rare information harder to retrieve without multiple interactions. Second, their
output length is constrained, so a single response often fails to capture all relevant data.</p>
      <p>These challenges highlight the need for new operators and optimization strategies tailored
to LLMs. While traditional methods provide a foundation, they must be extended to support
LLM-based data processing. Crucially, dynamic metadata generation is essential, as it directly
impacts both logical and physical query planning.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In traditional DBMSs, query execution involves generating a logical plan to define the steps
needed to produce the results, followed by a physical plan to determine how the query will be
executed. However, querying LLMs requires different strategies to effectively manage logical
and physical optimization.</p>
      <p>[Table: The operators used in Galois plans. LLMScan (LLM): fetch data from the LLM; Filter-LLMScan: fetch data from the LLM w.r.t. a condition cond; Selection: select tuples w.r.t. cond; Projection: extract attributes from tuples; Join (⋈): join two tables given cond; Distinct: remove duplicate tuples; Grouping: group tuples on common values and compute an aggregate over the groups.]</p>
      <p>Logical Level. Galois rewrites a query into logical plans built from these operators, considering three pushdown strategies:
1) no pushdown, where the LLMScan retrieves all tuples and every condition is evaluated outside the LLM;
2) full pushdown, where the Filter-LLMScan retrieves only the tuples that match all conditions;
3) single pushdown, where the Filter-LLMScan retrieves the tuples that match a single condition.</p>
        <p>Despite the inherent limitations in the considered pushdown strategies, the number of possible
logical plans for a given query can still be substantial. A natural strategy for reducing token
consumption is to push down all filter conditions into the LLMScan operator. While it minimizes
the required tokens, it does not always produce the most accurate results. In certain cases,
pushing down only a subset of conditions improves data quality because simpler prompts reduce
complex reasoning and the risk of hallucination.</p>
        <p>To guide logical optimization, Galois employs the LLM itself as a source of information for
estimating the most efficient logical plan. Using a classification prompt (that uses the table
schema and the query), it estimates a “high” or “low” confidence score for each atom in the
WHERE clause. All the atoms with “high” confidence are pushed down in the Filter-LLMScan.</p>
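        <p>As an illustration of this step, the following minimal Python sketch labels each WHERE atom and splits the pushdown accordingly. The prompt wording, the classify_atoms and split_pushdown helpers, and the stub model are assumptions made for illustration; they are not the exact prompts or code of Galois.</p>
        <preformat>
# Hypothetical sketch of confidence-based condition pushdown (not the Galois source code).
from typing import Callable, Dict, List, Tuple

def classify_atoms(llm: Callable[[str], str], schema: str, atoms: List[str]) -> Dict[str, str]:
    """Ask the model for a 'high' or 'low' confidence label for each WHERE atom."""
    labels = {}
    for atom in atoms:
        prompt = (
            f"Table schema: {schema}\n"
            f"Condition: {atom}\n"
            "Can this condition be evaluated reliably while extracting tuples from the document? "
            "Answer with exactly one word: high or low."
        )
        answer = llm(prompt).strip().lower()
        labels[atom] = "high" if answer.startswith("high") else "low"
    return labels

def split_pushdown(labels: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Atoms labeled 'high' go into the Filter-LLMScan; the rest are filtered outside the LLM."""
    pushed = [a for a, lab in labels.items() if lab == "high"]
    residual = [a for a, lab in labels.items() if lab == "low"]
    return pushed, residual

# Stub model that only trusts the population condition.
stub = lambda p: "high" if "Condition: population" in p else "low"
labels = classify_atoms(stub, "EU_Cities(name, size, population, num_private_hospitals)",
                        ["population > 1000000", "num_private_hospitals > 15"])
print(split_pushdown(labels))  # (['population > 1000000'], ['num_private_hospitals > 15'])
        </preformat>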
        <p>Physical Level. The goal of LLMScan and Filter-LLMScan is to extract structured data from the
LLM, which is then processed by the logical plan. Galois achieves this by generating natural
language prompts tailored to the query.</p>
        <p>
          The natural strategy, called Table-Scan, prompts the LLM to extract all relevant tuples for
a given query. Given a logical plan and its schema, Galois uses an iterative prompting
approach for each LLMScan operator in the plan. The initial prompt requests all table attributes and
instructs the LLM to return results in JSON format, aligned with the schema derived from the query.
Due to space limits, a template for this prompt is shown in the full paper [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
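        <p>The iterative prompting behind Table-Scan can be pictured with the following Python sketch; the prompt text, the table_scan function, and the stopping rule are illustrative assumptions, not the actual template of the full paper.</p>
        <preformat>
# Hypothetical sketch of a Table-Scan style prompt loop (illustrative only).
import json
from typing import Callable, Dict, List

def table_scan(llm: Callable[[str], str], table: str, attributes: List[str],
               max_rounds: int = 3) -> List[Dict[str, str]]:
    """Iteratively ask for the rows of `table` as JSON, stopping when no new rows appear."""
    rows: List[Dict[str, str]] = []
    for _ in range(max_rounds):
        prompt = (
            f"From the provided document, list the rows of the table {table} "
            f"with attributes {', '.join(attributes)}. "
            "Return a JSON array of objects, one object per row. "
            f"Do not repeat these rows: {json.dumps(rows)}"
        )
        try:
            batch = json.loads(llm(prompt))
        except json.JSONDecodeError:
            break                       # malformed output: stop iterating
        new_rows = [r for r in batch if r not in rows]
        if not new_rows:
            break                       # nothing new: the scan is complete
        rows.extend(new_rows)
    return rows

# Stub model that always returns the same two rows.
stub = lambda p: '[{"name": "Madrid", "population": "3.2M"}, {"name": "Rome", "population": "2.8M"}]'
print(table_scan(stub, "EU_Cities", ["name", "population"]))
        </preformat>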
        <p>
          LLMs are known to benefit from techniques like Chain-of-Thought (CoT) prompting [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which
improve reasoning and output quality. Based on this, we introduce Key-Scan, an alternative
scan operator designed to improve data extraction accuracy. Inspired by CoT, Key-Scan splits
data collection into two steps: first, Galois retrieves all key values for the target table; then, for
each retrieved key value, it fetches the corresponding attribute values. For some queries, this
two-step approach improves result quality by asking simpler and more specific prompts to the
LLM. Due to space limits, the example prompt is shown in the full paper [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
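        <p>A hedged sketch of the two-step Key-Scan follows; the key_scan function, its prompts, and the stub model are assumptions for illustration and do not reproduce the prompts of the full paper.</p>
        <preformat>
# Hypothetical sketch of a Key-Scan (two-step, CoT-inspired) extraction.
import json
from typing import Callable, Dict, List

def key_scan(llm: Callable[[str], str], table: str, key: str,
             attributes: List[str]) -> List[Dict[str, str]]:
    """Step 1: collect all key values. Step 2: fetch the remaining attributes for each key."""
    keys_prompt = (f"From the provided document, list all values of {key} for the table {table}. "
                   "Return a JSON array of strings.")
    keys = json.loads(llm(keys_prompt))
    rows = []
    for k in keys:
        attrs_prompt = (f"For the {table} row with {key} = {k}, report "
                        f"{', '.join(attributes)} as a JSON object.")
        row = json.loads(llm(attrs_prompt))
        row[key] = k
        rows.append(row)
    return rows

# Stub model: returns the keys for the first prompt and a fixed attribute object afterwards.
def stub(prompt: str) -> str:
    if "list all values" in prompt:
        return '["Madrid", "Rome"]'
    return '{"population": "3.2M", "num_private_hospitals": "20"}'

print(key_scan(stub, "EU_Cities", "name", ["population", "num_private_hospitals"]))
        </preformat>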
        <p>For each SQL query, Galois must choose between two scan strategies: Table-Scan and
Key-Scan. While Key-Scan, based on CoT prompting, generally offers higher accuracy, Table-Scan
can outperform it in cases where additional context (i.e., the full table structure) helps the LLM
generate more reliable data.</p>
        <p>By choosing the right physical scan operator, it is therefore possible to improve the quality of the
result data. For this goal, we rely again on metadata generated by the LLM itself.</p>
        <p>Given a query q, we estimate the confidence of the model in returning factual data in the Scan
operation. To do so, we prompt the LLM to gather a confidence value between 0 and 1 (conf(q)).
To leverage the strengths of both Key-Scan and Table-Scan, we introduce a confidence threshold
τ. If conf(q) &gt; τ, we use Key-Scan; otherwise, Table-Scan is selected. The rationale is that when
the model’s confidence is low, the accuracy of key retrieval in Key-Scan may be compromised.
In such scenarios, Table-Scan provides a more reliable alternative by incorporating additional
context from other attributes, potentially improving the quality of data extraction. The drawback
of this approach is that it requires an extra interaction with the LLM, increasing the total costs
measured in the number of tokens.</p>
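        <p>The scan selection described above can be summarized in a few lines; the prompt wording and the pick_scan helper are illustrative assumptions, with the threshold passed in as a parameter.</p>
        <preformat>
# Hypothetical sketch of confidence-guided scan selection.
from typing import Callable

def pick_scan(llm: Callable[[str], str], query: str, tau: float = 0.5) -> str:
    """Ask the model for a confidence in [0, 1]; use Key-Scan only above the threshold tau."""
    prompt = (f"For the query: {query}\n"
              "How confident are you that you can list the key values correctly? "
              "Answer with a single number between 0 and 1.")
    try:
        conf = float(llm(prompt).strip())
    except ValueError:
        conf = 0.0                      # unreadable answer: fall back to the safer Table-Scan
    return "Key-Scan" if conf > tau else "Table-Scan"

print(pick_scan(lambda p: "0.8", "SELECT name, population FROM EU_Cities"))  # Key-Scan
        </preformat>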
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results and Conclusions</title>
      <p>Galois can be seamlessly integrated with frameworks like Retrieval Augmented Generation
(RAG), which combine traditional information retrieval with the generative power of LLMs.
This experiment demonstrates how Galois’s optimizations enhance both the accuracy and
efficiency of LLM-based data retrieval in in-context learning settings.</p>
      <p>
        To evaluate the ability to handle novel information, we use the Premier and Fortune
datasets, containing 60 and 500 documents respectively. These corpora are designed to include
information not present in the LLM’s training data. Premier includes reports from the first six
match-days of the 2024–2025 Premier League season (scraped from BBC News), while Fortune
comprises data on the 2024 Fortune 500 companies, collected from Kaggle. Both datasets are
processed via a RAG engine built using LangChain4j. Each document is divided into text
segments with 128 tokens for Premier and 400 tokens for Fortune. Segments are encoded
using the “WhereIsAI/UAE-Large-V1” model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and stored in a vector database. At runtime,
the prompt is augmented with the query schema and the 50 most relevant segments, retrieved
via embedding similarity. All models use the same retrieved chunks for consistency. As the LLM,
we use Llama 3.1 70B.
      </p>
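      <p>A library-agnostic sketch of the retrieval step is shown below; the actual experiments use LangChain4j in Java, so the embed() callable, the cosine ranking, and the prompt assembly here are simplifying assumptions that only mirror the segment sizes and top-50 retrieval described above.</p>
      <preformat>
# Hypothetical sketch of the retrieval-augmented prompt construction (not the LangChain4j pipeline).
from typing import Callable, List, Sequence

def top_segments(embed: Callable[[str], Sequence[float]], segments: List[str],
                 query: str, top_k: int = 50) -> List[str]:
    """Rank pre-chunked text segments by cosine similarity to the query embedding."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0
    q = embed(query)
    return sorted(segments, key=lambda s: cosine(embed(s), q), reverse=True)[:top_k]

def build_prompt(schema: str, query: str, segments: List[str]) -> str:
    """Augment the prompt with the query schema and the retrieved segments."""
    return f"Schema: {schema}\nQuery: {query}\nContext:\n" + "\n".join(segments)

# Toy usage with a dummy two-dimensional embedding.
toy_embed = lambda text: [float(len(text)), float(text.count(" "))]
docs = ["Madrid has 3.2M people.", "Rome has 2.8M people."]
print(build_prompt("EU_Cities(name, population)", "SELECT name FROM EU_Cities",
                   top_segments(toy_embed, docs, "EU cities", top_k=1)))
      </preformat>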
      <p>
        We evaluate variants of Galois against three baselines: NL, SQL, and Palimpzest [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (PZ). In
NL, the LLM is directly prompted with a natural language question; in SQL, the input is an SQL
query; both NL and SQL expect structured data as output. Palimpzest [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] shares our data
extraction use case but differs significantly in design and execution. While our system uses a
declarative SQL-based interface to define operations, Palimpzest adopts an ETL-like, procedural
approach, requiring users to explicitly define data transformations and model invocations as a
sequence of function calls in Python. This distinction has practical implications. Palimpzest
provides granular control over individual processing steps, which can be advantageous in
specialized use cases but comes at the cost of increased user effort and complexity. It also
offers advanced features such as support for extracting and processing diverse documents.
For fair comparison, all queries in our experiments have been rewritten using Palimpzest’s API,
using functions such as filter, convert, and execute, to implement the SQL execution; we use its
optimization policy MaxQualityAtFixedCost, as suggested by the authors. One Galois variant simulates
a database optimization that pushes down all attributes and employs a Table-Scan strategy.
The other Galois variant represents the full system with all the optimizations based on the LLM confidence.
      </p>
      <p>
        Each experiment comprises a query Q and a database D. We compute the expected tuple set R
by executing Q over D (R = Q(D)). We aim to compare R with the tuple set R' produced by
executing Q on Galois. As quality metrics to compare those two sets of tuples, we adopt
metrics used to benchmark SQL queries on LLMs [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]:
• F1-Cell: we compute the F1 score among the set of cells in R' w.r.t. those in R. The
rationale of this metric is to evaluate the results considering only the cell values.
• Cardinality: measures the ratio between the size of R' and the size of R. The rationale
of this metric is to evaluate the capability of Galois in returning the right cardinality of the
results. In particular, to report a value between 0 and 1, the cardinality quality measure is
computed as min(card(R), card(R')) / max(card(R), card(R')).
• Tuple Constraint: measures the fraction of tuples in R present in R', comparing tuples
as a whole. Tuple Constraint is 1.0 if R and R' have the same schema, cardinality, and
cell values, making it stricter than F1-Cell, as it requires not only that the same values appear
but also within the same corresponding tuples.
• AVG-Score combines F1-Cell and Cardinality (that are soft metrics that do not consider tuple
schema at all) with Tuple Constraint (that is a hard metric) into a single metric, averaging
them to provide a comprehensive comparison score (a minimal computation sketch is given below).
      </p>
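      <p>The following minimal Python sketch makes the metrics concrete on two small tuple sets; the exact implementations in the benchmarks may differ in details such as duplicate handling.</p>
      <preformat>
# Hypothetical sketch of the quality metrics over an expected set R and a returned set R_hat.
from typing import List, Tuple

Row = Tuple[str, ...]

def f1_cell(r: List[Row], r_hat: List[Row]) -> float:
    """F1 over the multisets of cell values, ignoring the tuple structure."""
    expected = [c for row in r for c in row]
    returned = [c for row in r_hat for c in row]
    if not expected or not returned:
        return 0.0
    matched = sum(min(expected.count(v), returned.count(v)) for v in set(expected))
    prec, rec = matched / len(returned), matched / len(expected)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def cardinality(r: List[Row], r_hat: List[Row]) -> float:
    """min(card(R), card(R_hat)) / max(card(R), card(R_hat)), a value in [0, 1]."""
    lo, hi = sorted((len(r), len(r_hat)))
    return lo / hi if hi else 1.0

def tuple_constraint(r: List[Row], r_hat: List[Row]) -> float:
    """Fraction of expected tuples found as whole tuples in the returned set."""
    return sum(1 for row in r if row in r_hat) / len(r) if r else 1.0

R = [("Madrid", "3.2M"), ("Rome", "2.8M")]
R_hat = [("Madrid", "3.2M")]
scores = (f1_cell(R, R_hat), cardinality(R, R_hat), tuple_constraint(R, R_hat))
print(scores, sum(scores) / 3)  # the last value is the AVG-Score
      </preformat>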
      <p>
        To prevent false negatives, we normalize cell values in R and R' before evaluation. We use
string similarity (Edit Distance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]) with a 10% threshold to match similar values, such as “Bill
Clinton” and “Bill J. Clinton”. For numerical values, we allow a 10% difference w.r.t. the expected
value. We use simple and efficient comparisons; a more sophisticated implementation for
matching tuples could resort to Entity Resolution methods [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ] or tuples matching [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
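      <p>A minimal sketch of this normalization, using the 10% thresholds stated above, is given below; the similar_cells helper and the normalization of the edit distance by the expected length are assumptions for illustration.</p>
      <preformat>
# Hypothetical sketch of the cell matching used before computing the metrics.
def similar_cells(expected: str, returned: str, threshold: float = 0.10) -> bool:
    """Numbers match within 10% of the expected value; strings within a 10% normalized edit distance."""
    try:
        e, r = float(expected), float(returned)
        return abs(e - r) &lt;= threshold * abs(e)
    except ValueError:
        pass
    # Plain dynamic-programming edit distance, normalized by the length of the expected string.
    m, n = len(expected), len(returned)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dist[i][j] = i + j
            else:
                cost = 0 if expected[i - 1] == returned[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + cost)
    return dist[m][n] &lt;= threshold * max(m, 1)

print(similar_cells("1000000", "1050000"))           # True: within 10% of the expected number
print(similar_cells("Bill Clinton", "Bil Clinton"))  # True: one edit over twelve characters
      </preformat>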
      <p>[Table 2: AVG-Score and number of tokens (in millions) for the compared approaches.]</p>
      <p>Results are shown in Table 2. NL and SQL perform poorly due to the challenges of complex
queries. The unstructured nature of natural language in NL and the rigid structure of SQL
restrict the LLM’s ability to interpret and respond to intricate queries accurately. Both Galois
versions outperform them, with the full variant scoring higher. Palimpzest achieves the highest
quality, but at a cost 11 times higher than the full Galois variant. Palimpzest’s high costs are due to its
multi-step processing, which involves repeated LLM interactions. In contrast, in Galois only
the scan operator interacts with the LLM (once per query), thus reducing computational costs.</p>
      <p>
        In summary, Galois effectively integrates with in-context learning (like RAG), achieving
comparable quality to the best baseline while reducing token costs. Moreover, Galois only
requires users to write SQL scripts. Experiments with queries executed over the parametric
knowledge of the LLMs confirm these results and are reported in the full paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Conclusions. We study the problem of querying textual documents through SQL queries over
LLMs. Our system acts as an intermediary between the user and the LLM, extending traditional
query optimization techniques to improve the precision and recall of query results from LLMs.
      </p>
      <p>
        Galois adopts a DB-first architecture, integrating the LLM directly within the database
operators. This direction opens several problems, such as the design of mechanisms to simulate
index-like eficiency using LLMs, e.g., through caching techniques based on prior interactions
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Alternatively, an LLM-first architecture poses intriguing possibilities with new challenges,
e.g., whether LLMs can replace DBMSs by ingesting structured data during training or
in context. While research in tabular language models indicates that such a scenario is not yet
feasible, primarily due to context size limitations [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and the issues of LLMs with long inputs [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], recent
advancements are overcoming this constraint [
        <xref ref-type="bibr" rid="ref22 ref23 ref24">22, 23, 24</xref>
        ].
      </p>
      <p>
        Another promising research direction involves support for queries spanning multiple
modalities, such as text, image, and structured data [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. This integration could enable users to extract
insights from diverse data formats in a unified querying framework [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Beyond iterative refinement, another open challenge arises from the inherent biases within
LLMs. These models may not return rare values unless explicitly prompted. For example, when
asking for “private hospitals”, the LLM may return the list of US hospitals first. Instead, if we
are interested in querying EU hospitals, our approach relies on the users specifying precisely
their intent, which may not always be the case in practice [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>
        Our work shows the increasing need to refine LLM confidence estimation mechanisms [
        <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
        ].
Improving confidence estimation can lead to more reliable outputs and inform users of the
certainty associated with query responses, for example by investigating how to incorporate the
confidence estimates available from open LLMs.
      </p>
      <p>Acknowledgments. Veltri was partially supported by the TECH4YOU project.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>T. B. Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Neelakantan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Shyam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Sastry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Askell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Herbert-Voss</surname>
            , G. Krueger,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Henighan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Ramesh</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          <string-name>
            <surname>Ziegler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hesse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            , E. Sigler,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Litwin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Chess</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Berner</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>McCandlish</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in: NeurIPS,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiticariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Danilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Reiss</surname>
          </string-name>
          , H. Zhu,
          <article-title>SystemT: Declarative text understanding for enterprise</article-title>
          , in: NAACL, Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N18</fpage>
          -3010.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Deepdive: declarative knowledge base construction</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>60</volume>
          (
          <year>2017</year>
          )
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          . doi:
          <volume>10</volume>
          .1145/3060586.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>2463</fpage>
          -
          <lpage>2473</lpage>
          . doi:
          <volume>10</volume>
          . 18653/v1/
          <fpage>D19</fpage>
          -1250.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          , G. Badaro,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Data ambiguity profiling for the generation of training examples</article-title>
          ,
          <source>in: 39th IEEE International Conference on Data Engineering, ICDE</source>
          <year>2023</year>
          , Anaheim, CA, USA, April 3-
          <issue>7</issue>
          ,
          <year>2023</year>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>450</fpage>
          -
          <lpage>463</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICDE55515.
          <year>2023</year>
          .
          <volume>00041</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          , G. Badaro,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , Pythia:
          <article-title>Unsupervised generation of ambiguous textual claims from relational data</article-title>
          ,
          <source>in: SIGMOD, ACM</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2409</fpage>
          -
          <lpage>2412</lpage>
          . doi:
          <volume>10</volume>
          .1145/3514221.3520164.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Querying large language models with SQL</article-title>
          ,
          <source>in: Proceedings 27th International Conference on Extending Database Technology, EDBT</source>
          <year>2024</year>
          , Paestum, Italy, March 25 - March 28, OpenProceedings.org,
          <year>2024</year>
          , pp.
          <fpage>365</fpage>
          -
          <lpage>372</lpage>
          . doi:
          <volume>10</volume>
          .48786/EDBT.
          <year>2024</year>
          .
          <volume>32</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Satriani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Varriale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Logical and physical optimizations for sql query execution over large language models</article-title>
          ,
          <source>Proc. ACM Manag. Data</source>
          <volume>3</volume>
          (
          <year>2025</year>
          ). doi:
          <volume>10</volume>
          .1145/3725411.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          , M. Bosma, b. ichter,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          , in: S. Koyejo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Oh (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>35</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2022</year>
          , pp.
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Angle-optimized text embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:2309.12871</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vitagliano</surname>
          </string-name>
          ,
          <article-title>A declarative system for optimizing ai workloads</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv. org/abs/2405.14696. arXiv:
          <volume>2405</volume>
          .
          <fpage>14696</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Papicchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , L. Cagliero, Qatch:
          <article-title>Benchmarking table representation learning models on your data, in: NeurIPS (Datasets</article-title>
          and Benchmarks),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamsetty</surname>
          </string-name>
          , S. Liu,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Zaharia, Text2sql is not enough: Unifying ai and databases with tag</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/ abs/2408.14717. arXiv:
          <volume>2408</volume>
          .
          <fpage>14717</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marzal</surname>
          </string-name>
          , E. Vidal,
          <article-title>Computation of normalized edit distance and applications</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>15</volume>
          (
          <year>1993</year>
          )
          <fpage>926</fpage>
          -
          <lpage>932</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Papadakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ioannou</surname>
          </string-name>
          , E. Thanos, T. Palpanas,
          <article-title>The four generations of entity resolution</article-title>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Simonini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zecchini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          , et al.,
          <source>Entity resolution on-demand, Proceedings of the VLDB Endowment</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1506</fpage>
          -
          <lpage>1518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Buoncristiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          , E. Veltri,
          <article-title>Detective gadget: Generic iterative entity resolution over dirty data</article-title>
          ,
          <source>Data</source>
          <volume>9</volume>
          (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .3390/data9120139.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Glavic</surname>
          </string-name>
          , G. Mecca,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          , E. Veltri,
          <article-title>Similarity measures for incomplete database instances</article-title>
          ,
          <source>in: Proceedings 27th International Conference on Extending Database Technology, EDBT</source>
          <year>2024</year>
          , Paestum, Italy, March 25 - March 28, OpenProceedings.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ray</surname>
          </string-name>
          , Y. Cheng,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Cacheblend:
          <article-title>Fast large language model serving for RAG with cached knowledge fusion</article-title>
          ,
          <source>CoRR abs/2405</source>
          .16444 (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .48550/ARXIV.2405.16444. arXiv:
          <volume>2405</volume>
          .
          <fpage>16444</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Badaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paolo</surname>
          </string-name>
          ,
          <article-title>Transformers for Tabular Data Representation: A Survey of Models and Applications, Transactions of the Association for Computational Linguistics 11 (</article-title>
          <year>2023</year>
          )
          <fpage>227</fpage>
          -
          <lpage>249</lpage>
          . doi:doi.org/10.1162/tacl_a_
          <fpage>00544</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paranjape</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bevilacqua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Lost in the middle: How language models use long contexts</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>12</volume>
          (
          <year>2024</year>
          )
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          . doi:
          <volume>10</volume>
          .1162/tacl_a_
          <fpage>00638</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <article-title>LLMLingua: Compressing prompts for accelerated inference of large language models</article-title>
          ,
          <source>in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>13358</fpage>
          -
          <lpage>13376</lpage>
          . doi:10.18653/v1/2023.emnlp-main.825.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ananthanarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>CacheGen: KV cache compression and streaming for fast large language model serving</article-title>
          ,
          <source>in: Proceedings of the ACM SIGCOMM 2024 Conference</source>
          , ACM SIGCOMM '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>56</lpage>
          . doi:10.1145/3651890.3672274.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>G.</given-names>
            <surname>Corallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>FINCH: prompt-guided key-value cache compression for large language models</article-title>
          ,
          <source>Trans. Assoc. Comput. Linguistics</source>
          <volume>12</volume>
          (
          <year>2024</year>
          )
          <fpage>1517</fpage>
          -
          <lpage>1532</lpage>
          . URL: https://doi.org/10.1162/tacl_a_00716. doi:10.1162/tacl_a_00716.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Urban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Binnig</surname>
          </string-name>
          ,
          <article-title>CAESURA: language models as multi-modal query planners</article-title>
          ,
          <source>in: 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14-17, 2024</source>
          , www.cidrdb.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Floratou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Psallidas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Deep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hagleither</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cahoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alotaibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Henkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Grootel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Emani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pandit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shnayder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Curino</surname>
          </string-name>
          ,
          <article-title>NL2SQL is a solved problem... not!</article-title>
          ,
          <source>in: 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14-17, 2024</source>
          , www.cidrdb.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Koeppl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>A survey of confidence estimation and calibration in large language models</article-title>
          ,
          <source>in: Proceedings of the 2024 Conference of</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>