<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hyperspecialized Compilation for Serverless Data Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonhard Spiegelberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Kraska</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malte Schwarzkopf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Brown University</institution>
          ,
          <addr-line>Providence, Rhode Island</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIT</institution>
          ,
          <addr-line>Cambridge, Massachusetts</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>6</volume>
      <abstract>
        <p>Serverless functions can be spun up in milliseconds and scaled out quickly, forming an ideal platform for quick, interactive parallel queries over large data sets. Modern databases use code generation to produce efficient physical plans, but compiling such a plan on each serverless function is costly: every millisecond spent executing on serverless functions multiplies in cost by the number of functions running. Existing serverless data science frameworks therefore generate and compile code on the client, which precludes specializing this code to patterns that may exist in the input data of individual serverless functions. This paper argues for exploring a trade-off space between one-off code generation on the client, and hyperspecialized compilation that generates bespoke code on each serverless function. Our preliminary experiments show that hyperspecialization outperforms client-based compilation on typical heterogeneous datasets in both cost and performance by 2–4×.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>for example, to specialize the code to schema changes that
occur over time, to constant-fold values that change rarely
Designing an efficient data analytics framework that uti- (e.g., years), or to fit to other patterns in the input data,
lizes serverless functions is challenging, as it must balance such as data sorted by categories. In other words, while
parallelism, communication, and runtime costs. Many compiling the same code on each Lambda is wasteful, our
modern databases and data analytics systems allow end- idea is to generate different specialized code paths on
indiusers to write queries in familiar languages like SQL or vidual Lambda functions to offset compilation overheads
Python, but generate code and compile these queries into by obtaining more efficient code for execution. As every
native machine code for efcfiiency [ 1, 2, 3, 4, 5, 6, 7]. millisecond on a Lambda is expensive and comes at a
Using compiled code in a serverless setting makes sense, premium over longer-running provisioned resources, it
beas more efficient code directly lowers costs and avoids comes critical to hit the right trade-off between
ahead-ofmerely parallelizing overheads [8]. Code generation, and time work on the client and the Lambdas and the runtime
compilation into machine code naturally fit on the client, reductions realized.
which knows the query and can generate code before Our approach, hyperspecialization, demonstrates that
dispatching hundreds or thousands of parallel serverless compilation on individual Lambdas is feasible and
benefunctions (“Lambdas” for short in the rest of this paper) ifcial to craft efcfiient data analytics frameworks on top
that each operate over a part of the input data. Existing of serverless functions. We present preliminary results
serverless frameworks like Starling [9] or Lambada [2] from a prototype hyperspecializing system, Viton, built
therefore employ code generation on the client machine, on top of an existing analytics system for Python
workand invoke Lambdas with the generated plan in form of a loads, Tuplex [1]. Our preliminary findings indicate that
custom runtime executable or shared object, which avoids compilation for subsets on Lambdas can lead to both cost
compilation costs on individual Lambdas. But what if we and efficiency improvements by 2–4 × .
performed code generation and compilation on individual
Lambda functions?</p>
      <p>This fine-grained code generation and compilation allows harnessing additional opportunities for performance optimization: as each Lambda processes a small part of the input data (e.g., a day’s worth) and many datasets have shifting distributions and patterns over time, code generation can produce specialized, more efficient code if it knows the input data distribution. This allows a system, for example, to specialize the code to schema changes that occur over time, to constant-fold values that change rarely (e.g., years), or to fit to other patterns in the input data, such as data sorted by categories. In other words, while compiling the same code on each Lambda is wasteful, our idea is to generate different specialized code paths on individual Lambda functions to offset compilation overheads by obtaining more efficient code for execution. As every millisecond on a Lambda is expensive and comes at a premium over longer-running provisioned resources, it becomes critical to hit the right trade-off between ahead-of-time work on the client and the Lambdas, and the runtime reductions realized.</p>
      <p>Our approach, hyperspecialization, demonstrates that compilation on individual Lambdas is feasible and beneficial to craft efficient data analytics frameworks on top of serverless functions. We present preliminary results from a prototype hyperspecializing system, Viton, built on top of an existing analytics system for Python workloads, Tuplex [1]. Our preliminary findings indicate that compilation for subsets on Lambdas can lead to both cost and efficiency improvements of 2–4×.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Motivation</title>
      <p>Python became the dominant language for writing modern data science pipelines due to its rich universe of packages and popular data processing frameworks like Pandas or PySpark. Similarly, writing serverless functions in Python is attractive for data scientists: the quick launch of a Python runtime [<xref ref-type="bibr" rid="ref10">10</xref>] together with the parallelism of thousands of serverless functions makes Python attractive for large-scale data processing when trying to minimize end-to-end runtime.</p>
      <p>For example, PyWren [11] is a popular framework that combines Python, AWS Lambda serverless functions, and storage via S3, without the need to provision a cluster first, to run simple queries that can be expressed as a sequence of map operations, with each map operation taking a user-defined function (UDF) as a parameter. PyWren’s limited API only allows for simple data analytics workloads that apply a UDF f to each of N input rows stored within S3, but it demonstrates that processing large quantities via serverless functions relying on Python is feasible and scales nearly linearly.</p>
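      <p>To make this programming model concrete, the following is a minimal sketch of such a map-style API; serverless_map and clean_row are our own illustrative stand-ins, and a local process pool substitutes for actual AWS Lambda dispatch:</p>
      <preformat>
# Sketch of the PyWren-style model: apply a UDF f to each of N rows in
# parallel. A local process pool stands in for AWS Lambda here; PyWren
# itself dispatches the calls as serverless function invocations.
from concurrent.futures import ProcessPoolExecutor

def clean_row(row):
    """Example UDF: normalize one input record."""
    return {**row, "delay": float(row.get("delay") or 0.0)}

def serverless_map(f, rows, parallelism=8):
    with ProcessPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(f, rows))

if __name__ == "__main__":
    print(serverless_map(clean_row, [{"delay": "12.5"}, {"delay": None}]))
</preformat>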
      <p>However, this scalability comes at a cost: for increased dataset sizes, the benefit of the Python runtime’s low startup times gets eclipsed by the slow execution speed for the actual processing work in the Python UDFs. A data scientist might be tempted to simply increase the parallelism level to reduce runtime, but this could be an expensive mistake: each millisecond wasted due to slow execution rapidly multiplies by the number of Lambda functions invoked—e.g., spending an extra second on 5,000 Lambdas on AWS with 1 GB memory each translates to an added $0.08 cost. Reducing end-to-end runtime by scaling up the parallelism may therefore end up merely parallelizing Python overhead, hiding a higher-than-necessary total compute cost (in cycles and dollars).</p>
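      <p>The arithmetic behind this example is easy to reproduce; the sketch below assumes AWS Lambda’s published on-demand rate of roughly $0.0000166667 per GB-second (the exact rate varies by region and architecture):</p>
      <preformat>
# Back-of-the-envelope cost of wasted time on Lambdas.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 AWS Lambda rate

def wasted_cost(num_lambdas, wasted_seconds, memory_gb=1.0):
    return num_lambdas * wasted_seconds * memory_gb * PRICE_PER_GB_SECOND

print(f"${wasted_cost(5_000, 1.0):.2f}")  # one extra second -> ~$0.08
</preformat>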
      <p>A possible answer is to instead generate efficient machine code, similar to what an optimizing C/C++ compiler may produce. This is a tried-and-tested approach in a single-machine setting, but making it work for interactive queries on Lambdas poses new challenges.</p>
    </sec>
    <sec id="sec-1b">
      <title>3. Code generation for Lambdas</title>
      <p>Code generation improves runtime efficiency for queries at the expense of a one-time compile cost, which amortizes when running over sufficiently large input data. Indeed, code generation (either fine-grained, or by templating and combining query fragments) and subsequent compilation are a standard way to produce an efficient physical plan. In the serverless setting, this raises the question where and how to generate and compile a query.</p>
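      <p>As a first-order illustration of this amortization argument (our own simplified model, not a formula used by Viton): spending compile time on a Lambda pays off only when it saves more execution time than it costs:</p>
      <preformat>
# Per-Lambda break-even check for specialization: the one-time overhead
# (sampling + codegen + compilation) must be below the runtime it saves.
def worth_specializing(overhead_s, baseline_s, speedup):
    saved_s = baseline_s - baseline_s / speedup
    return saved_s > overhead_s

# E.g., 2.4s of overhead vs. a 10s stage sped up 2x: 5s saved, worth it.
print(worth_specializing(2.4, 10.0, 2.0))  # True
</preformat>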
      <p>Code generation blocks query execution. Compiling on the client machine (or via a dedicated compilation service) is cost-effective, as no Lambda functions are invoked, but it also limits the parallelism to the client machine and blocks query execution until this machine finishes compiling the plan. Generating C/C++ code is a popular choice because it makes code generation easy, but C/C++ compilers like Clang or GCC take a long time to generate code with optimizations enabled. For example, Meta reports that its unified execution engine, Velox, which uses C/C++ templating and code generation, takes tens of seconds to generate code, invoke a C/C++ compiler, and produce a shared library to load into the execution engine [12].</p>
      <p>While ahead-of-time code generation for queries can be cost-effective, as shown in proof-of-concept engines like Starling [9], it can become the dominant cost in query execution as the serverless function parallelism increases and per-function runtime shrinks, making it harder to amortize long compile times.</p>
      <p>Vectorized execution engines that rely on pre-compiled primitives trade off shorter compile time against missed optimization potential for generated code and larger code size, compared to fully-compiling, fine-grained execution engines. Thus, it becomes difficult to provide both efficient code and low, interactive end-to-end query latency by relying on a classic compiler.</p>
      <p>Heterogeneity and marginal optima. In cases where the data distribution varies across subsets of the input data, compiling different code paths may be beneficial. Generating individual code for subsets of the data would allow a system to locally specialize and emit optimized code that may outperform a single, globally optimized code path. By compiling different code paths in parallel on individual Lambdas, the system can also prevent stalling execution when all Lambdas would otherwise need to wait on the physical plan to compile on the client machine. Given the HTTP request model of Lambdas, existing techniques involving multiple code paths—such as on-stack replacement, where an existing code path is replaced on-the-fly with a more performant version [13]—are challenging to realize, as serverless environments allow only for limited communication and synchronization between individual Lambdas (or require extensive effort to overcome network limitations [14]), and provide no bidirectional communication channel to the client. Pre-baking code in the form of specialized primitives, as proposed in micro-adaptivity [15], may benefit long-running queries, but could also lead to high runtime costs when swapping between paths too often, or miss out on optimization potential when relying on primitives that are too coarse-grained.</p>
      <p>Low startup times come from light runtimes. To guarantee fast startup times, images for Lambda functions should be as small as possible.<sup>1</sup> A common optimization is to use warmed-up instances by keeping “hot” containers around, via warmup calls or by paying a premium to the vendor (e.g., AWS Lambda provisioned functions). Caching techniques on the service side [16, 17, 18, 19] or loading only necessary application code during runtime [<xref ref-type="bibr" rid="ref20">20</xref>] can also help to drive down overheads. Frameworks that are able to compile most of the user-supplied logic reduce the image size by including only a minimal runtime and compile logic. This much reduces startup time compared to including a full language interpreter and all dependencies, even though it may require shipping compiled code from the client to individual Lambdas, or compiling code on them.</p>
      <p><sup>1</sup>Image size restrictions (e.g., 250 MB on AWS Lambda) can be overcome using a container registry, at the cost of higher startup time.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Hyperspecialization</title>
      <p>The central idea of hyperspecialization is to generate bespoke, specialized code for each input slice rather than to rely on a single, global specialization. As emitting different code paths benefits only heterogeneous datasets, we focus on such datasets in the following. For homogeneous datasets, a system would automatically disable hyperspecialization, or let users do so explicitly.</p>
      <sec id="sec-2-1">
        <title>4.1. Challenges</title>
        <p>Figure 1: Viton system architecture: on the client, Viton samples the Python program’s input and globally pre-optimizes, then generates a global code path and an interpreter path; each serverless Lambda function further samples and specializes to its particular input.</p>
        <p>The overall challenge of hyperspecialization is that any cost to perform hyperspecialization weighs against the performance benefits of better-fitted code. In particular, a hyperspecializing query compiler must avoid situations where hyperspecialization performs worse than just a single, globally-generated code path.</p>
        <p>Balancing optimization cost. One key challenge is to balance where the system generates, optimizes, and executes code. Typically, the client machine issuing the query to each Lambda executor has limited parallelism and a slow connection to a blob service like S3. However, any compute time spent on the client machine is essentially free, whereas every single millisecond spent on a Lambda multiplies by the parallelism employed. Keeping overheads low on each Lambda is crucial, but spending too much time on the client to generate and optimize code results in a slow query and a bad user experience.</p>
        <p>In Viton, our hyperspecializing query compiler, we find
a compromise: Viton performs a raw global optimization
using a cheap sample on the client that it uses to split
a query into stages, to project an initial set of columns,
and to perform logical optimizations (like pushing filters
through joins). Re-optimization on the Lambdas then
resolves any initial sampling errors Viton may have incurred
on the client and addresses heterogeneity within the input
data. With this design choice, Viton balances the cost of
too much optimization and code generation on a Lambda
versus increased end-to-end time.</p>
        <p>Balancing sampling cost. To generate a new specialized code path, a Lambda must draw an input data sample for its specific input slice from S3. Controlling the sampling cost here is challenging, as the system must avoid issuing too many S3 requests and spending cycles parsing many rows, but it must also ensure that the sample is representative. For example, sorted input data easily provokes sampling errors when using randomized sampling or sampling the first and last rows only.</p>
        <p>Viton issues two S3 requests to get a block of fixed size from the start and the end of a file, on which it bases the initial sample.</p>
        <p>To further reduce sampling cost, Viton uses stratified sampling instead of parsing all available rows in the received data blocks. With stratified sampling, Viton partitions the input data into groups (strata) of equal size, and draws an identical number of random samples from each group. Picking random samples within a group avoids sampling errors. Viton then detects whether the Lambda’s input data distribution differs from the global distribution. If so, Viton triggers re-optimization of the complete stage assigned to the Lambda, which fits both logic and data representation tightly to the concrete input data the Lambda is about to process.</p>
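        <p>The stratified draw can be sketched as follows (a simplified illustration in Python; Viton’s actual implementation operates on the raw byte blocks it fetched from S3):</p>
        <preformat>
# Stratified sampling over the rows parsed from the fetched blocks:
# split into equal-size strata, draw the same number of rows from each.
import random

def stratified_sample(rows, num_strata=8, per_stratum=4, seed=0):
    rng = random.Random(seed)
    stratum_size = max(1, len(rows) // num_strata)
    sample = []
    for start in range(0, len(rows), stratum_size):
        stratum = rows[start:start + stratum_size]
        sample.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sample
</preformat>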
      </sec>
      <sec id="sec-2-4">
        <title>4.2. Design</title>
        <p>We base the design of Viton on a setting in which a single client machine issues AWS Lambda requests for data stored in S3. Viton divides query execution into two steps when it comes to planning, reflected in the overall system architecture (Figure 1). In a first step, which executes on the client, Viton draws a small initial sample from S3 to estimate an initial data distribution for the query and to perform initial query planning steps, like detecting the schema, deciding which stages to generate, and collecting globally helpful statistics to derive a global physical plan.</p>
        <p>Viton intentionally keeps the sampling on the client cheap,
as it expects hyperspecialization to adapt the query during
execution. Viton also generates and compiles a general
code path that is globally optimized and serves as a
fallback on each Lambda executor when subsets are similar
in distribution or hyperspecialization on an executor fails.</p>
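        <p>The general path’s role as a safety net can be sketched as follows (our own simplified illustration; the actual mechanism in Tuplex tracks rows that violate the fast path’s assumptions rather than catching Python exceptions):</p>
        <preformat>
# Dual-path execution (simplified): try the hyperspecialized fast path;
# rows violating its speculative assumptions fall back to the general,
# globally-compiled path (or, ultimately, the interpreter).
def process(rows, fast_path, general_path):
    out = []
    for row in rows:
        try:
            out.append(fast_path(row))
        except (KeyError, TypeError, ValueError):
            out.append(general_path(row))
    return out
</preformat>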
        <p>Viton then executes each stage using parallel Lambda executors. With hyperspecialization mode active, Viton assigns each Lambda a specialization unit. While there may be different strategies for how to identify and assign specialization units, in Viton, each input file serves as a specialization unit. We base this choice on the assumption that datasets are often partitioned by initial attributes, such as time. Thus, individual files marginalize the data distribution, such that marginal distributions have overall lower variance. For historical data, the partitioning attribute is typically the time of collection, but other schemes exist (e.g., categorical grouping or sorted data).</p>
        <p>In the second step, each Lambda draws a new sample and re-optimizes the stage if the data distribution differs from the global sample. In order to re-optimize a stage on a Lambda executor, Viton ships logical operators together with associated UDFs in the form of lightly annotated abstract syntax trees (ASTs).</p>
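        <p>As a rough illustration of this wire format (our own sketch using Python’s standard ast module; Viton’s actual annotations and serialization differ), a client might ship a UDF like this:</p>
        <preformat>
# Ship a UDF as a lightly annotated AST: parse the UDF's source (works
# for functions defined in a file), attach sample-derived type hints,
# and serialize both for the Lambda executor.
import ast, inspect, json

def udf(row):
    return row["dep_delay"] + row["arr_delay"]

def ship_udf(fn, column_types):
    tree = ast.parse(inspect.getsource(fn))
    return json.dumps({
        "ast": ast.dump(tree),   # serialized syntax tree
        "types": column_types,   # annotations derived from the sample
    })

payload = ship_udf(udf, {"dep_delay": "f64", "arr_delay": "f64"})
</preformat>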
        <p>Specializing code on each Lambda on the new sample allows the specialization to combine logical with compiler optimizations, with each potentially benefiting the other. For example, a UDF may require different input columns to be parsed for input data from different years, but a globally optimized pipeline would always parse the union of all required input columns. By re-optimizing the code locally and detecting common branches (a compiler optimization), Viton avoids parsing unnecessary columns in the first place (a logical pushdown optimization). Likewise, Viton could remove operators that become dead code, or reorder filters based on patterns in the data.</p>
          <p>To make hyperspecialization work, the cost of
executing all these steps has to be low enough to be offset by a
performance gain through a more efficient code path.
Viton uses aggressive optimizations, which may work for a
subset of the data, but would likely fail if applied globally.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.3. Optimizations</title>
        <p>Viton adds two additional, aggressively-specializing speculative optimizations to those already in Tuplex [1].</p>
        <p>Constant folding applies when an input data column is constant (e.g., a year or month), and allows Viton to remove deserialization of constant data and eliminate unnecessary code. While constant folding is a well-known compiler optimization, Viton applies it as a logical optimization to avoid deserialization.</p>
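        <p>A toy example of what this buys (our own simplified sketch, not Viton’s generated code): if the sample shows the year column is constant for this Lambda’s input file, the emitted fast path bakes in the constant and never deserializes that column:</p>
        <preformat>
# Specialized code path with `year` constant-folded for this input
# slice: the column is neither parsed nor deserialized per row.
def make_fast_path(constant_year):
    def fast_path(row):
        return {"year": constant_year, "delay": float(row["delay"])}
    return fast_path

fast_2003 = make_fast_path(2003)  # bespoke path for a 2003-only file
</preformat>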
          <p>Filter promotion assumes that a filter condition always
holds or fails, which reduces code complexity by
eliminating any future checks on the filter condition and allows
Viton to base other optimizations only on sample rows
that pass the filter. In the best case, filter promotion fully
collapses individual operators.</p>
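        <p>Filter promotion can be sketched similarly (again our own illustration): the decision is made from the Lambda’s local sample, and rows that later violate the speculation are handled by the general fallback path:</p>
        <preformat>
# Speculative filter promotion based on the local sample.
def promote_filter(sample, predicate):
    if all(predicate(r) for r in sample):
        return "always-true"    # drop the per-row check, keep all rows
    if not any(predicate(r) for r in sample):
        return "always-false"   # operator collapses; emit no rows
    return "keep-check"         # speculation unsafe: keep the filter
</preformat>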
          <p>These two optimizations are examples of a broader
class of speculative optimizations that may be effective
locally on subsets of a dataset. They also benefit logical
optimizations when, e.g., they reduce the set of input
columns required.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4.4. Implementation</title>
        <p>We implemented our Viton prototype on top of Tuplex [1]. Creating Viton required adding support for more aggressive optimizations that can exploit properties of marginal distributions, and extending the early-stage Lambda backend of Tuplex to support shipping stages in the form of abstract syntax trees (ASTs) to Lambda executors. For this, we implemented a custom AWS Lambda runtime, as this was more efficient in micro-benchmarks than building on top of existing runtimes in AWS Lambda. In addition to implementing per-Lambda, per-input-file sampling and hyperspecialized code generation, Viton also adds support for semi-structured JSON files with a parser built on top of simdjson [<xref ref-type="bibr" rid="ref21">21</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Preliminary Results</title>
      <p>We configure each Lambda to run a single Viton executor that uses up to 10 GB of memory and a maximum of three threads. As of June 2023, a Lambda instance with 10 GB of memory has six vCPUs, three of which we use for processing and three for S3 I/O. We run the client on a single r5d.xlarge EC2 instance. For our preliminary evaluation, we evaluate two queries.</p>
      <p>Flights query. This query performs data cleaning over the flights dataset [22], but imputes missing values for delay factors prior to 06/2003, and retrieves a cleaned result for the years 2002–2005. Due to a schema change, delay information prior to 06/2003 was collected only as a single, aggregate delay factor in one column; afterwards, delays were collected in detail, broken down into several delay factors using additional columns. The input data consists of 410 files (83.51 GB total) with sizes from 177–284 MB, each containing data for one month between 10/1987 and 11/2021.</p>
      <p>Github query. The second query analyzes historical data in the Github Archive dataset, collected from Github since February 2011 [23], which contains raw information about 20+ event types. Within this dataset, data is organized as one newline-delimited JSON file per day. Schema changes due to the introduction of new fields are frequent (e.g., there are 3,748 changes over 417 days [24]). In addition, the schema of each row varies depending on the event type and time of collection, as data collection used multiple APIs with different response schemas over time. Due to resource constraints, we limit our experiment to a subset of eleven files, one for October 15th of each year (35.5 GB total). We run a query that, for each fork event, extracts the number of commits, the original repository ID, and when the fork happened.</p>
      <p>Results. We evaluate the potential of hyperspecialization by measuring the runtime improvements that specialized code paths provide. We keep files in each dataset partitioned as they were in the original dataset, including heterogeneous input file sizes, and measure performance without hyperspecialization (i.e., vanilla Tuplex [1]), with hyperspecialization using only Tuplex’s existing optimizations (e.g., speculating on NULL values), and with aggressive hyperspecialization, which adds the two new optimizations from §4.3. A good result would show hyperspecialization reducing the queries’ end-to-end runtime and monetary cost.</p>
      <p>Figure 2: End-to-end time in seconds.</p>
      <p>Figure 2 shows the results. Hyperspecialization both makes existing optimizations more impactful (“hyperspecialization”), reducing runtime by 1.25–2×, and enables extra, aggressive optimizations (“aggr. hyperspecialization”) that further reduce runtime, for a total runtime gain of 2.3–3×. These reduced runtimes translate into 2.3–2.8× lower cost per query. The Github query sees larger improvements from the extra optimizations, as it benefits from both filter promotion and constant folding, while the flights query only benefits from constant folding. These early results indicate that generating hyperspecialized code paths for sufficiently large specialization units can yield overall improvements in cost and performance, amortizing any overheads incurred.</p>
      <p>Breakdown. We now break down an individual run of the flights query to understand the overheads of hyperspecialization. Figure 3 shows a timeline of the query, with panel (b) showing cumulative time spent on Lambda executors. Viton spends 6.9 seconds on the client retrieving data from S3, sampling globally, and generating the global code path. Afterwards, Viton’s Lambdas spend about 44% of their 7–10 second execution time on hyperspecialization (Figure 3a), and the remainder of execution time processing data. Figures 3b and 4 further break down the time spent on Lambdas. Sampling takes about one second, and code generation and compilation take about 1.2 seconds per Lambda. Importantly, it was necessary to restrict ourselves to a set of cheap LLVM optimizations in order to achieve this quick optimization time. Combined with other overheads, the total overheads of hyperspecialization come to 2.43 seconds, while 2.86 seconds are spent running the specialized fast path, and 0.15 seconds on the general compiled code path or in the interpreter. The initial time spent on the client could be reduced by caching information about files stored in S3, as the client spends most of the time accessing S3.</p>
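      <p>These per-Lambda numbers can be cross-checked with a line of arithmetic (our own calculation): the 2.43 s of overhead is indeed roughly 44% of the 5.44 s of per-Lambda time accounted for above:</p>
      <preformat>
# Hyperspecialization overhead as a share of accounted per-Lambda time.
overhead, fast_path, general = 2.43, 2.86, 0.15
total = overhead + fast_path + general
print(f"{overhead / total:.1%} of {total:.2f}s")  # 44.7% of 5.44s
</preformat>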
        <p>These results indicate that hyperspecialization is
effective and can amortize its overheads sufficiently to provide
end-to-end runtime reduction and cost savings.</p>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion and Outlook</title>
      <p>In this paper, we introduced the idea of hyperspecialization. Our preliminary results indicate that hyperspecialization is a promising direction. Further work will need to investigate several research questions.</p>
      <p>What specialization unit size to pick? We want to quickly identify large, distinct subsets of input data and compile efficient code for them. However, optimizing too narrowly may fail to amortize the overheads of hyperspecialization despite improvements in performance. New techniques are needed to identify regions for which hyperspecialization is a good idea, and for a query optimizer to utilize this information.</p>
      <p>How to handle scenarios where compilation cost is high? Interpreters with JIT-compilation support typically compile only small code regions like individual loops or functions, but query compilation for a full query can become prohibitively expensive. Automating the process of detecting when to perform costly compilation within a serverless setting, and which optimizations are affordable, is part of the set of research questions we are just starting to understand better.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by a Meta PhD fellowship. We thank Ben Givertz, Yunzhi Shao, Andrew Wei, Rhea Goyal, Shreeyash Gotmare, Khemarat (March) Boonyapaluk, and Rahul Yesantharao for their contributions to Viton’s implementation. This research was also supported by NSF awards DGE-2039354 and IIS-1453171, and by funding from Google and VMware.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>L. Spiegelberg, R. Yesantharao, M. Schwarzkopf, T. Kraska, Tuplex: Data science in Python at native code speed, in: Proceedings of the 2021 International Conference on Management of Data, SIGMOD ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 1718–1731. URL: https://doi.org/10.1145/3448016.3457244.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>I. Müller, R. Marroquín, G. Alonso, Lambada: Interactive data analytics on cold data using serverless cloud infrastructure, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 115–130. URL: https://doi.org/10.1145/3318464.3389758.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>J. Sompolski, M. Zukowski, P. Boncz, Vectorization vs. compilation in query execution, in: Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN ’11, Association for Computing Machinery, New York, NY, USA, 2011, p. 33–40. URL: https://doi.org/10.1145/1995441.1995446.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>T. Neumann, Efficiently compiling efficient query plans for modern hardware, Proc. VLDB Endow. 4 (2011) 539–550. URL: https://doi.org/10.14778/2002938.2002940.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>K. Krikellas, S. D. Viglas, M. Cintra, Generating code for holistic query evaluation, in: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), IEEE, 2010, pp. 613–624.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>R. Y. Tahboub, G. M. Essertel, T. Rompf, How to architect a query compiler, revisited, in: Proceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, Association for Computing Machinery, New York, NY, USA, 2018, p. 307–322. URL: https://doi.org/10.1145/3183713.3196893.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>W. Zhang, J. Kim, K. A. Ross, E. Sedlar, L. Stadler, Adaptive code generation for data-intensive analytics, Proc. VLDB Endow. 14 (2021) 929–942. URL: https://doi.org/10.14778/3447689.3447697.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>F. McSherry, M. Isard, D. G. Murray, Scalability! But at what COST?, in: Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS), 2015.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>M. Perron, R. Castro Fernandez, D. DeWitt, S. Madden, Starling: A scalable query engine on cloud functions, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 131–141. URL: https://doi.org/10.1145/3318464.3380609.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>D. Jackson, G. Clynch, An investigation of the impact of language runtime on the performance and cost of serverless functions, in: 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), 2018, pp. 154–160. doi:10.1109/UCC-Companion.2018.00050.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>E. Jonas, Q. Pu, S. Venkataraman, I. Stoica, B. Recht, Occupy the cloud: Distributed computing for the 99%, in: Proceedings of the 2017 Symposium on Cloud Computing, 2017, pp. 445–451.</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, L. Sakka, K. Pai, W. He, B. Chattopadhyay, Velox: Meta’s unified execution engine, Proceedings of the VLDB Endowment 15 (2022) 3372–3384.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>G. M. Essertel, R. Y. Tahboub, T. Rompf, On-stack replacement for program generators and source-to-source compilers, in: Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2021, Association for Computing Machinery, New York, NY, USA, 2021, p. 156–169. URL: https://doi.org/10.1145/3486609.3487207.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>M. Wawrzoniak, I. Müller, R. Fraga Barcelos Paulus Bruno, G. Alonso, Boxer: Data analytics on network-enabled serverless platforms, in: 11th Annual Conference on Innovative Data Systems Research (CIDR 2021), 2021.</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>15</label>
        <mixed-citation>B. Răducanu, P. Boncz, M. Zukowski, Micro adaptivity in Vectorwise, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, Association for Computing Machinery, New York, NY, USA, 2013, p. 1231–1242. URL: https://doi.org/10.1145/2463676.2465292.</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>16</label>
        <mixed-citation>D. Du, T. Yu, Y. Xia, B. Zang, G. Yan, C. Qin, Q. Wu, H. Chen, Catalyzer: Sub-millisecond startup for serverless computing with initialization-less booting, in: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 467–481. URL: https://doi.org/10.1145/3373376.3378512.</mixed-citation>
      </ref>
      <ref id="ref17">
        <label>17</label>
        <mixed-citation>A. Singhvi, A. Balasubramanian, K. Houck, M. D. Shaikh, S. Venkataraman, A. Akella, Atoll: A scalable low-latency serverless platform, in: Proceedings of the ACM Symposium on Cloud Computing, SoCC ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 138–152. URL: https://doi.org/10.1145/3472883.3486981.</mixed-citation>
      </ref>
      <ref id="ref18">
        <label>18</label>
        <mixed-citation>D. Mvondo, M. Bacou, K. Nguetchouang, L. Ngale, et al., OFC: An opportunistic caching system for FaaS platforms, in: Proceedings of the Sixteenth European Conference on Computer Systems, EuroSys ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 228–244. URL: https://doi.org/10.1145/3447786.3456239.</mixed-citation>
      </ref>
      <ref id="ref19">
        <label>19</label>
        <mixed-citation>L. Ao, G. Porter, G. M. Voelker, FaaSnap: FaaS made fast using snapshot-based VMs, in: Proceedings of the Seventeenth European Conference on Computer Systems, EuroSys ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 730–746. URL: https://doi.org/10.1145/3492321.3524270.</mixed-citation>
      </ref>
      <ref id="ref20">
        <label>20</label>
        <mixed-citation>X. Liu, J. Wen, Z. Chen, D. Li, J. Chen, Y. Liu, et al., FaaSLight: General application-level cold-start latency optimization for Function-as-a-Service in serverless computing, ACM Trans. Softw. Eng. Methodol. (2023). URL: https://doi.org/10.1145/3585007.</mixed-citation>
      </ref>
      <ref id="ref21">
        <label>21</label>
        <mixed-citation>G. Langdale, D. Lemire, Parsing gigabytes of JSON per second, The VLDB Journal 28 (2019) 941–960.</mixed-citation>
      </ref>
      <ref id="ref22">
        <label>22</label>
        <mixed-citation>Bureau of Transportation Statistics, United States Department of Transportation, On-time performance (1987–present), 2020. URL: https://www.transtats.bts.gov/Fields.asp?Table_ID=236.</mixed-citation>
      </ref>
      <ref id="ref23">
        <label>23</label>
        <mixed-citation>I. Grigorik, Github Archive, https://www.gharchive.org/, 2023.</mixed-citation>
      </ref>
      <ref id="ref24">
        <label>24</label>
        <mixed-citation>Github, Changelog - Github Docs, 2022.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>