<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hyperspecialized Compilation for Serverless Data Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonhard Spiegelberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Kraska</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malte Schwarzkopf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Brown University</institution>
          ,
          <addr-line>Providence, Rhode Island</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIT</institution>
          ,
          <addr-line>Cambridge, Massachusetts</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>6</volume>
      <abstract>
        <p>Serverless functions can be spun up in milliseconds and scaled out quickly, forming an ideal platform for quick, interactive parallel queries over large data sets. Modern databases use code generation to produce efficient physical plans, but compiling such a plan on each serverless function is costly: every millisecond spent executing on serverless functions multiplies in cost by the number of functions running. Existing serverless data science frameworks therefore generate and compile code on the client, which precludes specializing this code to patterns that may exist in the input data of individual serverless functions. This paper argues for exploring a trade-off space between one-off code generation on the client, and hyperspecialized compilation that generates bespoke code on each serverless function. Our preliminary experiments show that hyperspecialization outperforms client-based compilation on typical heterogeneous datasets in both cost and performance by 2–4×.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>for example, to specialize the code to schema changes that
occur over time, to constant-fold values that change rarely
Designing an efficient data analytics framework that uti- (e.g., years), or to fit to other patterns in the input data,
lizes serverless functions is challenging, as it must balance such as data sorted by categories. In other words, while
parallelism, communication, and runtime costs. Many compiling the same code on each Lambda is wasteful, our
modern databases and data analytics systems allow end- idea is to generate different specialized code paths on
indiusers to write queries in familiar languages like SQL or vidual Lambda functions to offset compilation overheads
Python, but generate code and compile these queries into by obtaining more efficient code for execution. As every
native machine code for efcfiiency [ 1, 2, 3, 4, 5, 6, 7]. millisecond on a Lambda is expensive and comes at a
Using compiled code in a serverless setting makes sense, premium over longer-running provisioned resources, it
beas more efficient code directly lowers costs and avoids comes critical to hit the right trade-off between
ahead-ofmerely parallelizing overheads [8]. Code generation, and time work on the client and the Lambdas and the runtime
compilation into machine code naturally fit on the client, reductions realized.
which knows the query and can generate code before Our approach, hyperspecialization, demonstrates that
dispatching hundreds or thousands of parallel serverless compilation on individual Lambdas is feasible and
benefunctions (“Lambdas” for short in the rest of this paper) ifcial to craft efcfiient data analytics frameworks on top
that each operate over a part of the input data. Existing of serverless functions. We present preliminary results
serverless frameworks like Starling [9] or Lambada [2] from a prototype hyperspecializing system, Viton, built
therefore employ code generation on the client machine, on top of an existing analytics system for Python
workand invoke Lambdas with the generated plan in form of a loads, Tuplex [1]. Our preliminary findings indicate that
custom runtime executable or shared object, which avoids compilation for subsets on Lambdas can lead to both cost
compilation costs on individual Lambdas. But what if we and efficiency improvements by 2–4 × .
performed code generation and compilation on individual
Lambda functions?</p>
      <p>This fine-grained code generation and compilation allows harnessing additional opportunities for performance optimization: as each Lambda processes a small part of the input data (e.g., a day’s worth) and many datasets have shifting distributions and patterns over time, code generation can produce specialized, more efficient code if it knows the input data distribution. This allows a system, for example, to specialize the code to schema changes that occur over time, to constant-fold values that change rarely (e.g., years), or to fit to other patterns in the input data, such as data sorted by categories. In other words, while compiling the same code on each Lambda is wasteful, our idea is to generate different specialized code paths on individual Lambda functions to offset compilation overheads by obtaining more efficient code for execution. As every millisecond on a Lambda is expensive and comes at a premium over longer-running provisioned resources, it becomes critical to hit the right trade-off between ahead-of-time work on the client and the Lambdas, and the runtime reductions realized.</p>
      <p>Our approach, hyperspecialization, demonstrates that compilation on individual Lambdas is feasible and beneficial to craft efficient data analytics frameworks on top of serverless functions. We present preliminary results from a prototype hyperspecializing system, Viton, built on top of an existing analytics system for Python workloads, Tuplex [1]. Our preliminary findings indicate that compilation for subsets on Lambdas can lead to both cost and efficiency improvements of 2–4×.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Motivation</title>
      <p>Python became the dominant language for writing modern data science pipelines due to its rich universe of packages and popular data processing frameworks like Pandas or PySpark. Similarly, writing serverless functions in Python is attractive for data scientists: the quick launch of a Python runtime [<xref ref-type="bibr" rid="ref10">10</xref>] together with the parallelism of thousands of serverless functions makes Python attractive for large-scale data processing when trying to minimize end-to-end runtime.</p>
      <p>For example, PyWren [11] is a popular framework that combines Python, AWS Lambda serverless functions, and storage via S3, without the need to provision a cluster first, to run simple queries that can be expressed as a sequence of map operations, with each map operation taking a user-defined function (UDF) as a parameter. PyWren’s limited API only allows for simple data analytics workloads that apply a UDF f to each of N input rows stored within S3, but it demonstrates that processing large quantities via serverless functions relying on Python is feasible and scales nearly linearly.</p>
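      <p>To make this programming model concrete, the following is a minimal sketch of such a map-style API; serverless_map and clean_row are our own illustrative stand-ins, and a local process pool substitutes for actual AWS Lambda dispatch:</p>
      <preformat>
# Sketch of the PyWren-style model: apply a UDF f to each of N rows in
# parallel. A local process pool stands in for AWS Lambda here; PyWren
# itself dispatches the calls as serverless function invocations.
from concurrent.futures import ProcessPoolExecutor

def clean_row(row):
    """Example UDF: normalize one input record."""
    return {**row, "delay": float(row.get("delay") or 0.0)}

def serverless_map(f, rows, parallelism=8):
    with ProcessPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(f, rows))

if __name__ == "__main__":
    print(serverless_map(clean_row, [{"delay": "12.5"}, {"delay": None}]))
</preformat>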
      <p>However, this scalability comes at a cost: for increased dataset sizes, the benefit of the Python runtime’s low startup times gets eclipsed by the slow execution speed for the actual processing work in the Python UDFs. A data scientist might be tempted to simply increase the parallelism level to reduce runtime, but this could be an expensive mistake: each millisecond wasted due to slow execution rapidly multiplies by the number of Lambda functions invoked—e.g., spending an extra second on 5,000 Lambdas on AWS with 1 GB memory each translates to an added $0.08 cost. Reducing end-to-end runtime by scaling up the parallelism may therefore end up merely parallelizing Python overhead, hiding a higher-than-necessary total compute cost (in cycles and dollars).</p>
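      <p>The arithmetic behind this example is easy to reproduce; the sketch below assumes AWS Lambda’s published on-demand rate of roughly $0.0000166667 per GB-second (the exact rate varies by region and architecture):</p>
      <preformat>
# Back-of-the-envelope cost of wasted time on Lambdas.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 AWS Lambda rate

def wasted_cost(num_lambdas, wasted_seconds, memory_gb=1.0):
    return num_lambdas * wasted_seconds * memory_gb * PRICE_PER_GB_SECOND

print(f"${wasted_cost(5_000, 1.0):.2f}")  # one extra second -> ~$0.08
</preformat>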
      <p>A possible answer is to instead generate efficient machine code, similar to what an optimizing C/C++ compiler may produce. This is a tried-and-tested approach in a single-machine setting, but making it work for interactive queries on Lambdas poses new challenges.</p>
    </sec>
    <sec id="sec-1b">
      <title>3. Code generation for Lambdas</title>
      <p>Code generation improves runtime efficiency for queries at the expense of a one-time compile cost, which amortizes when running over sufficiently large input data. Indeed, code generation (either fine-grained, or by templating and combining query fragments) and subsequent compilation are a standard way to produce an efficient physical plan. In the serverless setting, this raises the question where and how to generate and compile a query.</p>
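      <p>As a first-order illustration of this amortization argument (our own simplified model, not a formula used by Viton): spending compile time on a Lambda pays off only when it saves more execution time than it costs:</p>
      <preformat>
# Per-Lambda break-even check for specialization: the one-time overhead
# (sampling + codegen + compilation) must be below the runtime it saves.
def worth_specializing(overhead_s, baseline_s, speedup):
    saved_s = baseline_s - baseline_s / speedup
    return saved_s > overhead_s

# E.g., 2.4s of overhead vs. a 10s stage sped up 2x: 5s saved, worth it.
print(worth_specializing(2.4, 10.0, 2.0))  # True
</preformat>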
      <p>Code generation blocks query execution. Compiling on the client machine (or via a dedicated compilation service) is cost-effective, as no Lambda functions are invoked, but it also limits the parallelism to the client machine and blocks query execution until this machine finishes compiling the plan. Generating C/C++ code is a popular choice because it makes code generation easy, but C/C++ compilers like Clang or GCC take a long time to generate code with optimizations enabled. For example, Meta reports that its unified execution engine, Velox, which uses C/C++ templating and code generation, takes tens of seconds to generate code, invoke a C/C++ compiler, and produce a shared library to load into the execution engine [12].</p>
      <p>While ahead-of-time code generation for queries can be cost-effective, as shown in proof-of-concept engines like Starling [9], it can become the dominant cost in query execution as the serverless function parallelism increases and per-function runtime shrinks, making it harder to amortize long compile times.</p>
      <p>Vectorized execution engines that rely on pre-compiled primitives trade off shorter compile time against missed optimization potential for generated code and larger code size, compared to fully-compiling, fine-grained execution engines. Thus, it becomes difficult to provide both efficient code and low, interactive end-to-end query latency by relying on a classic compiler.</p>
      <p>Heterogeneity and marginal optima. In cases where the data distribution varies across subsets of the input data, compiling different code paths may be beneficial. Generating individual code for subsets of the data would allow a system to locally specialize and emit optimized code that may outperform a single, globally optimized code path. By compiling different code paths in parallel on individual Lambdas, the system can also prevent stalling execution when all Lambdas would otherwise need to wait on the physical plan to compile on the client machine. Given the HTTP request model of Lambdas, existing techniques involving multiple code paths—such as on-stack replacement, where an existing code path is replaced on-the-fly with a more performant version [13]—are challenging to realize, as serverless environments allow only for limited communication and synchronization between individual Lambdas (or require extensive effort to overcome network limitations [14]), and provide no bidirectional communication channel to the client. Pre-baking code in the form of specialized primitives, as proposed in micro-adaptivity [15], may benefit long-running queries, but could also lead to high runtime costs when swapping between paths too often, or miss out on optimization potential when relying on primitives that are too coarse-grained.</p>
      <p>Low startup times come from light runtimes. To guarantee fast startup times, images for Lambda functions should be as small as possible.<sup>1</sup> A common optimization is to use warmed-up instances by keeping “hot” containers around, via warmup calls or by paying a premium to the vendor (e.g., AWS Lambda provisioned functions). Caching techniques on the service side [16, 17, 18, 19] or loading only necessary application code during runtime [<xref ref-type="bibr" rid="ref20">20</xref>] can also help to drive down overheads. Frameworks that are able to compile most of the user-supplied logic reduce the image size by including only a minimal runtime and compile logic. This much reduces startup time compared to including a full language interpreter and all dependencies, even though it may require shipping compiled code from the client to individual Lambdas, or compiling code on them.</p>
      <p><sup>1</sup>Image size restrictions (e.g., 250 MB on AWS Lambda) can be overcome using a container registry, at the cost of higher startup time.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Hyperspecialization</title>
      <p>The central idea of hyperspecialization is to generate bespoke, specialized code for each input slice rather than to rely on a single, global specialization. As emitting different code paths benefits only heterogeneous datasets, we focus on such datasets in the following. For homogeneous datasets, a system would automatically disable hyperspecialization, or let users do so explicitly.</p>
      <sec id="sec-2-1">
        <title>4.1. Challenges</title>
        <p>Figure 1: Viton system architecture: on the client, Viton samples the Python program’s input and globally pre-optimizes, then generates a global code path and an interpreter path; each serverless Lambda function further samples and specializes to its particular input.</p>
        <p>The overall challenge of hyperspecialization is that any cost to perform hyperspecialization weighs against the performance benefits of better-fitted code. In particular, a hyperspecializing query compiler must avoid situations where hyperspecialization performs worse than just a single, globally-generated code path.</p>
        <p>Balancing optimization cost. One key challenge is to balance where the system generates, optimizes, and executes code. Typically, the client machine issuing the query to each Lambda executor has limited parallelism and a slow connection to a blob service like S3. However, any compute time spent on the client machine is essentially free, whereas every single millisecond spent on a Lambda multiplies by the parallelism employed. Keeping overheads low on each Lambda is crucial, but spending too much time on the client to generate and optimize code results in a slow query and a bad user experience.</p>
        <p>In Viton, our hyperspecializing query compiler, we find
a compromise: Viton performs a raw global optimization
using a cheap sample on the client that it uses to split
a query into stages, to project an initial set of columns,
and to perform logical optimizations (like pushing filters
through joins). Re-optimization on the Lambdas then
resolves any initial sampling errors Viton may have incurred
on the client and addresses heterogeneity within the input
data. With this design choice, Viton balances the cost of
too much optimization and code generation on a Lambda
versus increased end-to-end time.</p>
        <p>Balancing sampling cost. To generate a new specialized code path, a Lambda must draw an input data sample for its specific input slice from S3. Controlling the sampling cost here is challenging, as the system must avoid issuing too many S3 requests and spending cycles parsing many rows, but it must also ensure that the sample is representative. For example, sorted input data easily provokes sampling errors when using randomized sampling or sampling the first and last rows only.</p>
        <p>Viton issues two S3 requests to get a block of fixed size from the start and the end of a file, on which it bases the initial sample.</p>
        <p>To further reduce sampling cost, Viton uses stratified sampling instead of parsing all available rows in the received data blocks. With stratified sampling, Viton partitions the input data into groups (strata) of equal size, and draws an identical number of random samples from each group. Picking random samples within a group avoids sampling errors. Viton then detects whether the Lambda’s input data distribution differs from the global distribution. If so, Viton triggers re-optimization of the complete stage assigned to the Lambda, which fits both logic and data representation tightly to the concrete input data the Lambda is about to process.</p>
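        <p>The stratified draw can be sketched as follows (a simplified illustration in Python; Viton’s actual implementation operates on the raw byte blocks it fetched from S3):</p>
        <preformat>
# Stratified sampling over the rows parsed from the fetched blocks:
# split into equal-size strata, draw the same number of rows from each.
import random

def stratified_sample(rows, num_strata=8, per_stratum=4, seed=0):
    rng = random.Random(seed)
    stratum_size = max(1, len(rows) // num_strata)
    sample = []
    for start in range(0, len(rows), stratum_size):
        stratum = rows[start:start + stratum_size]
        sample.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sample
</preformat>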
      </sec>
      <sec id="sec-2-4">
        <title>4.2. Design</title>
        <p>We base the design of Viton on a setting in which a single client machine issues AWS Lambda requests for data stored in S3. Viton divides query execution into two steps when it comes to planning, reflected in the overall system architecture (Figure 1). In a first step, which executes on the client, Viton draws a small initial sample from S3 to estimate an initial data distribution for the query and to perform initial query planning steps, like detecting the schema, deciding which stages to generate, and collecting globally helpful statistics to derive a global physical plan.</p>
        <p>Viton intentionally keeps the sampling on the client cheap,
as it expects hyperspecialization to adapt the query during
execution. Viton also generates and compiles a general
code path that is globally optimized and serves as a
fallback on each Lambda executor when subsets are similar
in distribution or hyperspecialization on an executor fails.</p>
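        <p>The general path’s role as a safety net can be sketched as follows (our own simplified illustration; the actual mechanism in Tuplex tracks rows that violate the fast path’s assumptions rather than catching Python exceptions):</p>
        <preformat>
# Dual-path execution (simplified): try the hyperspecialized fast path;
# rows violating its speculative assumptions fall back to the general,
# globally-compiled path (or, ultimately, the interpreter).
def process(rows, fast_path, general_path):
    out = []
    for row in rows:
        try:
            out.append(fast_path(row))
        except (KeyError, TypeError, ValueError):
            out.append(general_path(row))
    return out
</preformat>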
        <p>Viton then executes each stage using parallel Lambda executors. With hyperspecialization mode active, Viton assigns each Lambda a specialization unit. While there may be different strategies for how to identify and assign specialization units, in Viton, each input file serves as a specialization unit. We base this choice on the assumption that datasets are often partitioned by initial attributes, such as time. Thus, individual files marginalize the data distribution, such that marginal distributions have overall lower variance. For historical data, the partitioning attribute is typically the time of collection, but other schemes exist (e.g., categorical grouping or sorted data).</p>
        <p>In the second step, each Lambda draws a new sample and re-optimizes the stage if the data distribution differs from the global sample. In order to re-optimize a stage on a Lambda executor, Viton ships logical operators together with associated UDFs in the form of lightly annotated abstract syntax trees (ASTs).</p>
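        <p>As a rough illustration of this wire format (our own sketch using Python’s standard ast module; Viton’s actual annotations and serialization differ), a client might ship a UDF like this:</p>
        <preformat>
# Ship a UDF as a lightly annotated AST: parse the UDF's source (works
# for functions defined in a file), attach sample-derived type hints,
# and serialize both for the Lambda executor.
import ast, inspect, json

def udf(row):
    return row["dep_delay"] + row["arr_delay"]

def ship_udf(fn, column_types):
    tree = ast.parse(inspect.getsource(fn))
    return json.dumps({
        "ast": ast.dump(tree),   # serialized syntax tree
        "types": column_types,   # annotations derived from the sample
    })

payload = ship_udf(udf, {"dep_delay": "f64", "arr_delay": "f64"})
</preformat>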
        <p>Specializing code on each Lambda on the new sample allows the specialization to combine logical with compiler optimizations, with each potentially benefiting the other. For example, a UDF may require different input columns to be parsed for input data from different years, but a globally optimized pipeline would always parse the union of all required input columns. By re-optimizing the code locally and detecting common branches (a compiler optimization), Viton avoids parsing unnecessary columns in the first place (a logical pushdown optimization). Likewise, Viton could remove operators that become dead code, or reorder filters based on patterns in the data.</p>
          <p>To make hyperspecialization work, the cost of
executing all these steps has to be low enough to be offset by a
performance gain through a more efficient code path.
Viton uses aggressive optimizations, which may work for a
subset of the data, but would likely fail if applied globally.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.3. Optimizations</title>
        <p>Viton adds two additional, aggressively-specializing speculative optimizations to those already in Tuplex [1].</p>
        <p>Constant folding applies when an input data column is constant (e.g., a year or month), and allows Viton to remove deserialization of constant data and eliminate unnecessary code. While constant folding is a well-known compiler optimization, Viton applies it as a logical optimization to avoid deserialization.</p>
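        <p>A toy example of what this buys (our own simplified sketch, not Viton’s generated code): if the sample shows the year column is constant for this Lambda’s input file, the emitted fast path bakes in the constant and never deserializes that column:</p>
        <preformat>
# Specialized code path with `year` constant-folded for this input
# slice: the column is neither parsed nor deserialized per row.
def make_fast_path(constant_year):
    def fast_path(row):
        return {"year": constant_year, "delay": float(row["delay"])}
    return fast_path

fast_2003 = make_fast_path(2003)  # bespoke path for a 2003-only file
</preformat>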
          <p>Filter promotion assumes that a filter condition always
holds or fails, which reduces code complexity by
eliminating any future checks on the filter condition and allows
Viton to base other optimizations only on sample rows
that pass the filter. In the best case, filter promotion fully
collapses individual operators.</p>
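        <p>Filter promotion can be sketched similarly (again our own illustration): the decision is made from the Lambda’s local sample, and rows that later violate the speculation are handled by the general fallback path:</p>
        <preformat>
# Speculative filter promotion based on the local sample.
def promote_filter(sample, predicate):
    if all(predicate(r) for r in sample):
        return "always-true"    # drop the per-row check, keep all rows
    if not any(predicate(r) for r in sample):
        return "always-false"   # operator collapses; emit no rows
    return "keep-check"         # speculation unsafe: keep the filter
</preformat>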
          <p>These two optimizations are examples of a broader
class of speculative optimizations that may be effective
locally on subsets of a dataset. They also benefit logical
optimizations when, e.g., they reduce the set of input
columns required.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4.4. Implementation</title>
        <p>We implemented our Viton prototype on top of Tuplex [1]. Creating Viton required adding support for more aggressive optimizations that can exploit properties of marginal distributions, and extending the early-stage Lambda backend of Tuplex to support shipping stages in the form of abstract syntax trees (ASTs) to Lambda executors. For this, we implemented a custom AWS Lambda runtime, as this was more efficient in micro-benchmarks than building on top of existing runtimes in AWS Lambda. In addition to implementing per-Lambda, per-input-file sampling and hyperspecialized code generation, Viton also adds support for semi-structured JSON files with a parser built on top of simdjson [<xref ref-type="bibr" rid="ref21">21</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Preliminary Results</title>
      <p>We configure each Lambda to run a single Viton executor that uses up to 10 GB of memory and a maximum of three threads. As of June 2023, a Lambda instance with 10 GB of memory has six vCPUs, three of which we use for processing and three for S3 I/O. We run the client on a single r5d.xlarge EC2 instance. For our preliminary evaluation, we evaluate two queries.</p>
      <p>Flights query. This query performs data cleaning over the flights dataset [22], but imputes missing values for delay factors prior to 06/2003, and retrieves a cleaned result for the years 2002–2005. Due to a schema change, delay information prior to 06/2003 was collected only as a single, aggregate delay factor in one column; afterwards, delays were collected in detail, broken down into several delay factors using additional columns. The input data consists of 410 files (83.51 GB total) with sizes from 177–284 MB, each containing data for one month between 10/1987 and 11/2021.</p>
      <p>Github query. The second query analyzes historical data in the Github Archive dataset, collected from Github since February 2011 [23], which contains raw information about 20+ event types. Within this dataset, data is organized as one newline-delimited JSON file per day. Schema changes due to the introduction of new fields are frequent (e.g., there are 3,748 changes over 417 days [24]). In addition, the schema of each row varies depending on the event type and time of collection, as data collection used multiple APIs with different response schemas over time. Due to resource constraints, we limit our experiment to a subset of eleven files, one for October 15th of each year (35.5 GB total). We run a query that, for each fork event, extracts the number of commits, the original repository ID, and when the fork happened.</p>
      <p>Results. We evaluate the potential of hyperspecialization by measuring the runtime improvements that specialized code paths provide. We keep files in each dataset partitioned as they were in the original dataset, including heterogeneous input file sizes, and measure performance without hyperspecialization (i.e., vanilla Tuplex [1]), with hyperspecialization using only Tuplex’s existing optimizations (e.g., speculating on NULL values), and with aggressive hyperspecialization, which adds the two new optimizations from §4.3. A good result would show hyperspecialization reducing the queries’ end-to-end runtime and monetary cost.</p>
      <p>Figure 2: End-to-end time in seconds.</p>
      <p>Figure 2 shows the results. Hyperspecialization both makes existing optimizations more impactful (“hyperspecialization”), reducing runtime by 1.25–2×, and enables extra, aggressive optimizations (“aggr. hyperspecialization”) that further reduce runtime, for a total runtime gain of 2.3–3×. These reduced runtimes translate into 2.3–2.8× lower cost per query. The Github query sees larger improvements from the extra optimizations, as it benefits from both filter promotion and constant folding, while the flights query only benefits from constant folding. These early results indicate that generating hyperspecialized code paths for sufficiently large specialization units can yield overall improvements in cost and performance, amortizing any overheads incurred.</p>
      <p>Breakdown. We now break down an individual run of the flights query to understand the overheads of hyperspecialization. Figure 3 shows a timeline of the query, with panel (b) showing cumulative time spent on Lambda executors. Viton spends 6.9 seconds on the client retrieving data from S3, sampling globally, and generating the global code path. Afterwards, Viton’s Lambdas spend about 44% of their 7–10 second execution time on hyperspecialization (Figure 3a), and the remainder of execution time processing data. Figures 3b and 4 further break down the time spent on Lambdas. Sampling takes about one second, and code generation and compilation take about 1.2 seconds per Lambda. Importantly, it was necessary to restrict ourselves to a set of cheap LLVM optimizations in order to achieve this quick optimization time. Combined with other overheads, the total overheads of hyperspecialization come to 2.43 seconds, while 2.86 seconds are spent running the specialized fast path, and 0.15 seconds on the general compiled code path or in the interpreter. The initial time spent on the client could be reduced by caching information about files stored in S3, as the client spends most of the time accessing S3.</p>
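      <p>These per-Lambda numbers can be cross-checked with a line of arithmetic (our own calculation): the 2.43 s of overhead is indeed roughly 44% of the 5.44 s of per-Lambda time accounted for above:</p>
      <preformat>
# Hyperspecialization overhead as a share of accounted per-Lambda time.
overhead, fast_path, general = 2.43, 2.86, 0.15
total = overhead + fast_path + general
print(f"{overhead / total:.1%} of {total:.2f}s")  # 44.7% of 5.44s
</preformat>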
        <p>These results indicate that hyperspecialization is
effective and can amortize its overheads sufficiently to provide
end-to-end runtime reduction and cost savings.</p>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion and Outlook</title>
      <p>In this paper, we introduced the idea of hyperspecialization. Our preliminary results indicate that hyperspecialization is a promising direction. Further work will need to investigate several research questions.</p>
      <p>What specialization unit size to pick? We want to quickly identify large, distinct subsets of input data and compile efficient code for them. However, optimizing too narrowly may fail to amortize the overheads of hyperspecialization despite improvements in performance. New techniques are needed to identify regions for which hyperspecialization is a good idea, and for a query optimizer to utilize this information.</p>
      <p>How to handle scenarios where compilation cost is high? Interpreters with JIT-compilation support typically compile only small code regions like individual loops or functions, but query compilation for a full query can become prohibitively expensive. Automating the process of detecting when to perform costly compilation within a serverless setting, and which optimizations are affordable, is part of the set of research questions we are just starting to understand better.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by a Meta PhD fellowship. We thank Ben Givertz, Yunzhi Shao, Andrew Wei, Rhea Goyal, Shreeyash Gotmare, Khemarat (March) Boonyapaluk, and Rahul Yesantharao for their contributions to Viton’s implementation. This research was also supported by NSF awards DGE-2039354 and IIS-1453171, and by funding from Google and VMware.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>L. Spiegelberg, R. Yesantharao, M. Schwarzkopf, T. Kraska, Tuplex: Data science in Python at native code speed, in: Proceedings of the 2021 International Conference on Management of Data, SIGMOD ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 1718–1731. URL: https://doi.org/10.1145/3448016.3457244.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>I. Müller, R. Marroquín, G. Alonso, Lambada: Interactive data analytics on cold data using serverless cloud infrastructure, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 115–130. URL: https://doi.org/10.1145/3318464.3389758.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>J. Sompolski, M. Zukowski, P. Boncz, Vectorization vs. compilation in query execution, in: Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN ’11, Association for Computing Machinery, New York, NY, USA, 2011, p. 33–40. URL: https://doi.org/10.1145/1995441.1995446.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>T. Neumann, Efficiently compiling efficient query plans for modern hardware, Proc. VLDB Endow. 4 (2011) 539–550. URL: https://doi.org/10.14778/2002938.2002940.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>K. Krikellas, S. D. Viglas, M. Cintra, Generating code for holistic query evaluation, in: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), IEEE, 2010, pp. 613–624.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>R. Y. Tahboub, G. M. Essertel, T. Rompf, How to architect a query compiler, revisited, in: Proceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, Association for Computing Machinery, New York, NY, USA, 2018, p. 307–322. URL: https://doi.org/10.1145/3183713.3196893.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>W. Zhang, J. Kim, K. A. Ross, E. Sedlar, L. Stadler, Adaptive code generation for data-intensive analytics, Proc. VLDB Endow. 14 (2021) 929–942. URL: https://doi.org/10.14778/3447689.3447697.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>F. McSherry, M. Isard, D. G. Murray, Scalability! But at what COST?, in: Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS), 2015.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>M. Perron, R. Castro Fernandez, D. DeWitt, S. Madden, Starling: A scalable query engine on cloud functions, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 131–141. URL: https://doi.org/10.1145/3318464.3380609.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>D. Jackson, G. Clynch, An investigation of the impact of language runtime on the performance and cost of serverless functions, in: 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), 2018, pp. 154–160. doi:10.1109/UCC-Companion.2018.00050.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>E. Jonas, Q. Pu, S. Venkataraman, I. Stoica, B. Recht, Occupy the cloud: Distributed computing for the 99%, in: Proceedings of the 2017 Symposium on Cloud Computing, 2017, pp. 445–451.</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>P. Pedreira, O. Erling, M. Basmanova, K. Wilfong, L. Sakka, K. Pai, W. He, B. Chattopadhyay, Velox: Meta’s unified execution engine, Proceedings of the VLDB Endowment 15 (2022) 3372–3384.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>G. M. Essertel, R. Y. Tahboub, T. Rompf, On-stack replacement for program generators and source-to-source compilers, in: Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2021, Association for Computing Machinery, New York, NY, USA, 2021, p. 156–169. URL: https://doi.org/10.1145/3486609.3487207.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>M. Wawrzoniak, I. Müller, R. Fraga Barcelos Paulus Bruno, G. Alonso, Boxer: Data analytics on network-enabled serverless platforms, in: 11th Annual Conference on Innovative Data Systems Research (CIDR 2021), 2021.</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>15</label>
        <mixed-citation>B. Răducanu, P. Boncz, M. Zukowski, Micro adaptivity in Vectorwise, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, Association for Computing Machinery, New York, NY, USA, 2013, p. 1231–1242. URL: https://doi.org/10.1145/2463676.2465292.</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>16</label>
        <mixed-citation>D. Du, T. Yu, Y. Xia, B. Zang, G. Yan, C. Qin, Q. Wu, H. Chen, Catalyzer: Sub-millisecond startup for serverless computing with initialization-less booting, in: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 467–481. URL: https://doi.org/10.1145/3373376.3378512.</mixed-citation>
      </ref>
      <ref id="ref17">
        <label>17</label>
        <mixed-citation>A. Singhvi, A. Balasubramanian, K. Houck, M. D. Shaikh, S. Venkataraman, A. Akella, Atoll: A scalable low-latency serverless platform, in: Proceedings of the ACM Symposium on Cloud Computing, SoCC ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 138–152. URL: https://doi.org/10.1145/3472883.3486981.</mixed-citation>
      </ref>
      <ref id="ref18">
        <label>18</label>
        <mixed-citation>D. Mvondo, M. Bacou, K. Nguetchouang, L. Ngale, et al., OFC: An opportunistic caching system for FaaS platforms, in: Proceedings of the Sixteenth European Conference on Computer Systems, EuroSys ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 228–244. URL: https://doi.org/10.1145/3447786.3456239.</mixed-citation>
      </ref>
      <ref id="ref19">
        <label>19</label>
        <mixed-citation>L. Ao, G. Porter, G. M. Voelker, FaaSnap: FaaS made fast using snapshot-based VMs, in: Proceedings of the Seventeenth European Conference on Computer Systems, EuroSys ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 730–746. URL: https://doi.org/10.1145/3492321.3524270.</mixed-citation>
      </ref>
      <ref id="ref20">
        <label>20</label>
        <mixed-citation>X. Liu, J. Wen, Z. Chen, D. Li, J. Chen, Y. Liu, et al., FaaSLight: General application-level cold-start latency optimization for Function-as-a-Service in serverless computing, ACM Trans. Softw. Eng. Methodol. (2023). URL: https://doi.org/10.1145/3585007.</mixed-citation>
      </ref>
      <ref id="ref21">
        <label>21</label>
        <mixed-citation>G. Langdale, D. Lemire, Parsing gigabytes of JSON per second, The VLDB Journal 28 (2019) 941–960.</mixed-citation>
      </ref>
      <ref id="ref22">
        <label>22</label>
        <mixed-citation>Bureau of Transportation Statistics, United States Department of Transportation, On-time performance (1987–present), 2020. URL: https://www.transtats.bts.gov/Fields.asp?Table_ID=236.</mixed-citation>
      </ref>
      <ref id="ref23">
        <label>23</label>
        <mixed-citation>I. Grigorik, Github Archive, https://www.gharchive.org/, 2023.</mixed-citation>
      </ref>
      <ref id="ref24">
        <label>24</label>
        <mixed-citation>Github, Changelog - Github Docs, 2022.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>