Efficient External Sorting in DuckDB
Laurens Kuiper, Mark Raasveldt and Hannes Mühleisen
CWI, Amsterdam


Abstract

Interactive data analysis is often conveniently done on personal computers that have limited memory. Current analytical data management systems rely almost exclusively on main memory for computation. When the data size exceeds the memory limit, many systems cannot complete queries or resort to an external execution strategy that assumes a high I/O cost. These strategies are often much slower than the in-memory strategy. However, I/O cost has gone down: Most modern laptops have fast NVMe storage. We believe that the difference between in-memory and external does not have to be this big. We implement a parallel external sorting operator in DuckDB that demonstrates this. Experimental results with our implementation show that even when the data size far exceeds the memory size, the performance loss is negligible. From this result, we conclude that it is possible to have a graceful degradation from in-memory to external sorting.

Keywords

Sorting, Parallel Sorting, In-Memory Sorting, Disk-Based External Sorting, Relational Databases



1. Introduction

It is not uncommon for database systems to have hundreds or even thousands of gigabytes of RAM at their disposal. High-performance systems such as HyPer [1] and ClickHouse [2] fully utilize the available memory and perform much better on analytical workloads than their traditional disk-based counterparts. Because these systems usually run on machines with such large memory capacities, the assumption is often that the workload fits in memory.

Figure 1: DuckDB’s row layout and row heap.
   While laptops have also enjoyed increased memory capacity, their physical design has limited space. Therefore, they typically have only 16GB of memory. Laptops are often used in interactive data analysis with tools like Pandas [3] and dplyr [4], showing that there is a need for analytical data management technology that runs on a laptop. However, these tools operate only in memory. As a result, users cannot process datasets that are even slightly larger than memory on their own machine.
   Disk-based database systems, on the other hand, have long solved the problem of processing larger-than-memory datasets. These systems are generally much slower than in-memory systems on analytical workloads. When a user wants to process a larger-than-memory dataset using an in-memory system, usually one of two things happens: 1) the system throws an error stating it is out of memory, or 2) the system switches to an external strategy that is much less efficient than the in-memory strategy, resulting in slow execution even when, for example, the input is only 10% larger than memory. Fast queries may become slow or run into an error when a table grows in size, creating a frustrating experience for users.
   We can mitigate this problem by implementing operators such that they optimally use the amount of available memory and only write data to disk when this is necessary. I/O quickly becomes the bottleneck on machines with low-bandwidth storage devices. However, most modern laptops have NVMe storage with high write speeds, making I/O less of a limiting factor.
   We have implemented a parallel, external sorting operator in DuckDB [5] that demonstrates this. Our implementation seamlessly transitions from in-memory to external sorting by storing data in buffer-managed blocks that are offloaded to disk using a least-recently-used queue, similar to LeanStore [6].
   Transitioning from in-memory to disk is made possible by DuckDB’s unified internal row layout, shown in Figure 1, which can be spilled to disk using pointer swizzling [7].
   We compare our implementation against four other systems using an improvised relational sorting benchmark on two tables from TPC-DS [8]. Our implementation achieves excellent performance when data fits in memory and shows a graceful degradation in performance as we go over the limit of available memory.

BICOD’21: British International Conference on Databases, December 9–10, 2021, London, UK
laurens.kuiper@cwi.nl (L. Kuiper); raasveld@cwi.nl (M. Raasveldt); hannes.muehleisen@cwi.nl (H. Mühleisen)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
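The buffer management described above (sorted data stored in buffer-managed blocks, offloaded to disk in least-recently-used order) can be illustrated with a minimal sketch. The class and method names below are hypothetical and do not reflect DuckDB's internals:

```python
import os
import tempfile
from collections import OrderedDict

class BlockManager:
    """Keep at most `budget` blocks in memory; evict least-recently-used blocks to disk."""

    def __init__(self, budget):
        self.budget = budget
        self.in_memory = OrderedDict()  # block_id -> bytes, kept in LRU order
        self.spill_dir = tempfile.mkdtemp()

    def _spill_path(self, block_id):
        return os.path.join(self.spill_dir, f"block_{block_id}.bin")

    def put(self, block_id, data):
        self.in_memory[block_id] = data
        self.in_memory.move_to_end(block_id)       # mark as most-recently-used
        while len(self.in_memory) > self.budget:   # over budget: spill the LRU block
            victim, victim_data = self.in_memory.popitem(last=False)
            with open(self._spill_path(victim), "wb") as f:
                f.write(victim_data)

    def pin(self, block_id):
        """Return block contents, transparently reloading from disk if evicted."""
        if block_id not in self.in_memory:
            with open(self._spill_path(block_id), "rb") as f:
                self.put(block_id, f.read())
        self.in_memory.move_to_end(block_id)
        return self.in_memory[block_id]
```

With a budget of two blocks, writing a third block spills the least-recently-used one; pinning it later reloads it from disk without the caller noticing.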
2. Sorting Relational Data

Sorting is one of the most well-studied problems in computer science. Research in this area forms the basis of sorting in database systems but focuses mostly on sorting large arrays or key/value pairs. Sorting is more complex for relational data, as many different types need to be supported, as well as NULL values. There can also be multiple order clauses. Besides sorting the order clause (key) columns, all other selected (payload) columns need to be re-ordered as well.
   In 2006, Goetz Graefe surveyed sorting in database systems [9]. The most important takeaway from this survey when it comes to performance is that the cost of sorting is dominated by comparing values and re-ordering data. Anything that makes either of these two operations cheaper will have an impact on the overall speed.
   There are two obvious ways to implement a comparator in a column-store when there are multiple ORDER BY clauses:

   1. Iterate through the clauses: Compare columns until we find one that is not equal, or until we have compared all columns. This comparator jumps between columns, causing random access in memory.
   2. Sort the data entirely by the first clause, then sort by the second clause, but only where the first clause was equal, and so on. This approach requires multiple passes over the data.

   The binary string comparison technique [10] improves sorting performance by simplifying the comparator. It encodes all columns in the ORDER BY clause into a single binary sequence that, when compared using memcmp, yields the correct overall sorting order. This encoding also yields the correct order with a byte-by-byte radix sort. Although this technique has existed since the days of System R, not many systems use it today, opting instead for one of the ways listed above.
   We implement this comparator and opt for fixed-size encodings, which can be more easily re-ordered. For variable-size types such as strings, we can therefore only encode a prefix. We compare the whole string only when prefixes are equal. The encoding is shown in Figure 2.

Figure 2: Binary string encoding. The original data in (a) is represented as (b) on a little-endian machine. It is encoded as (c) when ordered for a query with order clauses c_birth_country DESC, c_birth_year ASC. Descending order flips the bits. The string “GERMANY” is padded to ensure fixed size.

   Not shown in the figure are NULL values and collations. NULL values are handled by prefixing each value with an additional byte denoting whether the value is NULL. Collations are handled by evaluating the collation function before encoding the prefix of the string.
   The other high cost of sorting is re-ordering data. Systems that use columnar storage must re-order all selected columns, which causes a random access pattern for each. Row-based systems only have to pay the cost of this random access pattern once. When we select many columns, this becomes a considerable advantage.
   A few relational operators are inherently row-based, such as joins and aggregations. For vectorized execution engines, it is common practice to physically convert vectors to and from a row layout for these operators using scatter-gather. We argue that this layout should be used for sorting as well, as sorting is also a row-based operator.
   We show DuckDB’s row layout in Figure 1. Rows have a fixed size and can be easily re-ordered while sorting. We represent variable-sized columns with pointers into a string heap where the data resides. The heap data does not have to be re-ordered while sorting in memory. Each fixed-size row has an additional pointer field that points to its “heap row”. This pointer is not used for in-memory sorting but is crucial for external sorting, which we explain in Section 2.2.

2.1. Parallel Sorting

DuckDB uses Morsel-Driven Parallelism [11], a framework for parallel query execution. For the sorting operator, this means that multiple threads collect a roughly equal amount of data, in parallel, from the input table. We use this parallelism by letting each thread sort its collected data using radix sort. After this initial sorting phase, each thread has one or more sorted blocks of data, which must be combined into the final sorted result using merge sort.
   There are two main ways of implementing merge sort: K-way merge and cascade merge. K-way merge merges K lists into one sorted list in one pass and is traditionally used for external sorting because it minimizes I/O [12]. Cascade merge is used for in-memory sorting because it is more efficient than K-way merge. It merges two lists of sorted data at a time until only one sorted list remains.
   Recent work on K-way external merge sort [13] on devices with flash memory reduces execution time by 20% to 35% compared to standard external merge sort. Salah et al. [14] show that K-way merge can achieve better performance than cascade merge when it comes to in-place sorting.
This work on K-way merging may seem attractive for our implementation, but cascade merge is still a more attractive option when it comes to in-memory sorting. We also need to deal with variable-size data, which complicates in-place sorting. We do not want to compromise in-memory sorting and therefore choose cascade merge.
   Cascade merge is embarrassingly parallel when there are many more sorted blocks than available threads. As the blocks get merged, there will not be enough blocks to keep all threads busy. In the final round, when we merge two blocks to create the final sorted result, there is no parallelism: One thread processes all the data. We parallelize this using Merge Path [15]. Merge Path precomputes where blocks intersect, creating partitions that can be merged independently of each other. Binary search efficiently computes these partitions.

2.2. External Sorting

To sort more data than fits in memory, we write blocks of sorted data to disk. Rather than actively doing this in our sorting implementation, DuckDB’s buffer manager decides when to do this: When memory is full.
   Writing data to disk is trivial for fixed-size types but non-trivial for variable-size types, as the pointers in our row layout would be invalidated. To be able to offload variable-sized types as well, we use pointer swizzling. When we are sorting externally, we convert the pointers in the row layout to offsets, as shown in Figure 3.

Figure 3: DuckDB’s internal row layout (swizzled).

   We replace the 8-byte pointer field with an 8-byte offset, which denotes where the strings of this row reside in the heap block. We also replace the pointers to the string values within the row with an 8-byte relative offset. This offset denotes how far this particular string is located from the start of this row’s heap row. Using relative offsets within rows rather than absolute offsets is very useful during sorting: These relative offsets stay constant and do not need to be updated when we copy the row.
   With this dual-purpose row-wise representation, we achieve an almost seamless transition between in-memory and external sorting. The only difference between the two is swizzling and unswizzling each pointer once and re-ordering the heap during sorting.

3. Evaluation

In this section, we evaluate DuckDB’s sorting implementation. We compare against ClickHouse [2], HyPer [1], Pandas [3] and SQLite [16]. HyPer and ClickHouse are full-blown analytical database systems that focus on parallel in-memory computation. Pandas is single-threaded and in-memory, and SQLite is a more traditional (single-threaded) disk-based system.
   To demonstrate how the systems perform in an environment with limited RAM, we run our experiments on a 2020 MacBook Pro. It has 16GB of memory and a fast SSD with a write speed of over 3GB/s. HyPer does not yet run natively on ARM CPUs, so we emulate it using Rosetta 2. See our blog [17] for an experiment on the x86 CPU architecture.
   Benchmarking sorting in database systems is not straightforward. We would like to measure only the time it takes to sort, with as little noise as possible. Therefore, we cannot use SELECT queries, as the client-server protocol would quickly dominate query runtime [18].
   To approach a fair comparison, we measure the end-to-end time of queries that sort the data and write the result to a temporary table, i.e., CREATE TEMPORARY TABLE output AS SELECT ... FROM ... ORDER BY ...;. For Pandas we use sort_values with inplace=False to mimic this query. To measure stable end-to-end query time, we run each query 5 times and report the median run time. Scripts for our experiments are available on GitHub¹.
   We have created a relational sorting benchmark on the customer and catalog_sales tables from TPC-DS [8]. The row counts at different scale factors are shown in Table 1.

   scale factor   catalog_sales   customer
   10             14,401,261      500,000
   100            143,997,065     2,000,000
   300            260,014,080     5,000,000

Table 1: TPC-DS table row count for customer and catalog_sales at scale factor 10, 100, and 300.

   TPC-DS tables are challenging for sorting implementations because they are wide (many columns, unlike the tables in TPC-H) and have a mix of fixed- and variable-sized types. catalog_sales has 34 columns, all fixed-size types: integer and double. customer has 18 columns, fixed- and variable-size: 10 integers, 8 strings.
   We sort the catalog_sales table on cs_quantity and cs_item_sk, and select an increasing number of payload columns. This experiment tests both the system’s ability to sort and its ability to re-order the payload. We show the results of this experiment in Figure 4.

¹ https://github.com/lnkuiper/experiments/tree/master/sorting
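The measurement procedure above (end-to-end timing of a CREATE TEMPORARY TABLE ... AS SELECT ... ORDER BY query, median of 5 runs) can be sketched for SQLite, one of the systems under test, using Python's built-in sqlite3 module. The table and data below are stand-ins, not the TPC-DS tables:

```python
import sqlite3
import statistics
import time

def median_sort_time(con, query, runs=5):
    """Run the sorting query `runs` times and report the median wall-clock time."""
    times = []
    for i in range(runs):
        start = time.perf_counter()
        con.execute(f"CREATE TEMPORARY TABLE output_{i} AS {query}")
        times.append(time.perf_counter() - start)
        con.execute(f"DROP TABLE output_{i}")  # clean up between runs
    return statistics.median(times)

# Stand-in table with a low-cardinality sort key and a string payload.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER, payload TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i * 7919 % 1000, f"row{i}") for i in range(10000)])
t = median_sort_time(con, "SELECT * FROM t ORDER BY k")
print(f"median sort time: {t:.4f}s")
```

Materializing the result into a temporary table keeps the client-server protocol out of the measurement, as argued above.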
Figure 4: catalog_sales ordered by cs_quantity, cs_item_sk, with an increasing number of payload columns. These columns have few unique values, creating a difficult challenge for sorting algorithms. The grey vertical lines in the SF100 plot indicate at which points the payload columns take up 50% and 100% of the amount of available memory.

Figure 5: customer ordered by different column types: Integer columns (c_birth_year, c_birth_month, c_birth_day), and string columns (c_first_name, c_last_name).

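The Merge Path partitioning used in Section 2.1 to parallelize the final merge rounds can be made concrete: for a target output position, a binary search along the corresponding diagonal finds where the two sorted blocks intersect. A standalone sketch (illustrative Python, not DuckDB's implementation):

```python
import heapq

def merge_path_split(a, b, d):
    """Find (i, j) with i + j == d such that merging a[:i] with b[:j]
    yields the first d elements of the full merge of a and b."""
    lo, hi = max(0, d - len(b)), min(d, len(a))
    while lo < hi:                 # binary search along the diagonal
        i = (lo + hi) // 2
        j = d - i
        if a[i] < b[j - 1]:        # a[i] still belongs among the first d outputs
            lo = i + 1
        else:
            hi = i
    return lo, d - lo

# Two threads can now merge their partitions independently:
a, b = [1, 3, 5, 7], [2, 4, 6, 8]
half = (len(a) + len(b)) // 2
i, j = merge_path_split(a, b, half)
lower = list(heapq.merge(a[:i], b[:j]))   # first half of the sorted result
upper = list(heapq.merge(a[i:], b[j:]))   # second half of the sorted result
assert lower + upper == sorted(a + b)
```

In the actual operator, each thread is assigned one such diagonal, after which all partitions merge without synchronization.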
   The table fits in memory at SF10, and the systems’ performances are in the same ballpark, except for SQLite; DuckDB is the clear winner. At SF100, around 14 selected payload columns, the input table takes up 50% of memory. It is common for systems not to sort data in place, but to copy it to a new location, which requires double the amount of memory. The figure clearly shows this: All systems except DuckDB and SQLite run into an error due to running out of memory and are unable to complete the benchmark.
   ClickHouse switches to an external sorting strategy, which is much slower than its in-memory strategy. Therefore, adding a few payload columns results in a runtime that is orders of magnitude higher. Despite switching strategy, ClickHouse still runs into an out-of-memory error. HyPer uses the mmap system call, which creates a mapping between a block of memory and a file. This allows HyPer to continue for a while when the data no longer fits in memory, before running into an error as well. As we can see, the runtime becomes very slow before HyPer runs out of memory: Random memory access becomes random disk access.
   Surprisingly, Pandas can load the dataset at SF100 because macOS dynamically increases swap size. Most operating systems do not do this, and Pandas would not load the dataset at all. Pandas relies on NumPy’s [19] single-threaded quicksort implementation. Pandas shows impressive performance, partly because it already has the input data fully materialized in memory and does not have to stream data through an execution pipeline from the input to the output table like most DBMSes.
   Meanwhile, DuckDB and SQLite do not show a visible difference in performance when the data no longer fits in memory. SQLite always opts for a traditional external sorting strategy, resulting in robust, but overall slower performance than DuckDB.
   In our next benchmark, on the customer table, we test how well the systems can sort by strings and by integers. Both comparing and re-ordering strings are much more expensive than comparing and re-ordering numeric types. We show the result of this experiment in Figure 5.
   As expected, sorting by strings is slower than sorting by integers for most systems. In this experiment, the payload also includes string columns. Pandas has an advantage here because it already has the strings in memory, and most likely only needs to re-order pointers to these strings. The database systems need to copy strings twice: Once when reading the input table, and again when creating the output table. Profiling in DuckDB reveals that the actual sorting takes less than a second at SF300; most time is spent on (de)serializing strings. See [17] for more details on the difference between integers and strings.

4. Conclusion and Future Work

In this paper, we presented our parallel external sorting implementation in DuckDB. We compared it against four data management systems using a relational sorting benchmark based on TPC-DS. Three of the four systems perform well in memory, but crash as the data grows beyond the amount of available memory. DuckDB is the only system under benchmark that performs well both in memory and externally. These results demonstrate that it is possible to implement a sorting operator that is efficient in memory and degrades gracefully in performance as the input size exceeds the memory limit.

4.1. Future Work

It is unclear how much each technique contributed to end-to-end performance. Quantifying these contributions, e.g., through simulation, is an area of future research.
   Our sorting implementation uses a row layout that can be offloaded to storage using pointer swizzling. Other blocking operators, such as join, aggregation, and window, could benefit from this layout. Incorporating it would enable external computation for these operators as well. Implementing these operators such that their performance degrades gracefully as the input size exceeds the memory limit is an area of future research.
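The swizzling that makes this row layout spillable (Section 2.2) amounts to rewriting absolute heap pointers as offsets before a block goes to disk, and restoring them after the block is reloaded, possibly at a different address. A minimal sketch in which Python dicts stand in for rows; the field names are illustrative, not DuckDB's:

```python
def swizzle(rows, heap_base):
    """Convert absolute heap pointers to offsets so the block can spill or move."""
    for row in rows:
        row["heap_offset"] = row.pop("heap_pointer") - heap_base

def unswizzle(rows, heap_base):
    """Restore absolute pointers after the block is loaded at a (possibly new) base."""
    for row in rows:
        row["heap_pointer"] = row.pop("heap_offset") + heap_base

# A spilled block remains valid even if its heap is reloaded elsewhere:
rows = [{"intA": 37, "heap_pointer": 0x1000},
        {"intA": 42, "heap_pointer": 0x100C}]
swizzle(rows, heap_base=0x1000)      # offsets survive the round-trip to disk
unswizzle(rows, heap_base=0x2000)    # heap reloaded at a different address
```

Because the offsets are relative, nothing in the rows needs patching while the block sits on disk, which is what makes the transition between in-memory and external sorting nearly seamless.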
References

[1] A. Kemper, T. Neumann, HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots, in: S. Abiteboul, K. Böhm, C. Koch, K. Tan (Eds.), Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, IEEE Computer Society, 2011, pp. 195–206. doi:10.1109/ICDE.2011.5767867.
[2] B. Imasheva, A. Nakispekov, A. Sidelkovskaya, A. Sidelkovskiy, The Practice of Moving to Big Data on the Case of the NoSQL Database, ClickHouse, in: H. A. L. Thi, H. M. Le, T. P. Dinh (Eds.), Optimization of Complex Systems: Theory, Models, Algorithms and Applications, WCGO 2019, World Congress on Global Optimization, Metz, France, 8-10 July, 2019, volume 991 of Advances in Intelligent Systems and Computing, Springer, 2019, pp. 820–828. doi:10.1007/978-3-030-21803-4_82.
[3] W. McKinney, et al., Data structures for statistical computing in Python, in: Proceedings of the 9th Python in Science Conference, volume 445, Austin, TX, 2010, pp. 51–56.
[4] H. Wickham, R. François, L. Henry, K. Müller, dplyr: A Grammar of Data Manipulation, 2021. URL: https://CRAN.R-project.org/package=dplyr, R package version 1.0.7.
[5] M. Raasveldt, H. Mühleisen, DuckDB: An Embeddable Analytical Database, in: Proceedings of the 2019 International Conference on Management of Data, SIGMOD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1981–1984. doi:10.1145/3299869.3320212.
[6] V. Leis, M. Haubenschild, A. Kemper, T. Neumann, LeanStore: In-Memory Data Management beyond Main Memory, in: 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, IEEE Computer Society, 2018, pp. 185–196. doi:10.1109/ICDE.2018.00026.
[7] A. Kemper, D. Kossmann, Adaptable Pointer Swizzling Strategies in Object Bases: Design, Realization, and Quantitative Analysis, VLDB J. 4 (1995) 519–566. URL: http://www.vldb.org/journal/VLDBJ4/P519.pdf.
[8] M. Poess, B. Smith, L. Kollar, P. Larson, TPC-DS, Taking Decision Support Benchmarking to the next Level, in: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, Association for Computing Machinery, New York, NY, USA, 2002, pp. 582–587. doi:10.1145/564691.564759.
[9] G. Graefe, Implementing Sorting in Database Systems, ACM Comput. Surv. 38 (2006) 10–es. doi:10.1145/1132960.1132964.
[10] M. W. Blasgen, R. G. Casey, K. P. Eswaran, An Encoding Method for Multifield Sorting and Indexing, Commun. ACM 20 (1977) 874–878. doi:10.1145/359863.359892.
[11] V. Leis, P. Boncz, A. Kemper, T. Neumann, Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age, in: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, Association for Computing Machinery, New York, NY, USA, 2014, pp. 743–754. doi:10.1145/2588555.2610507.
[12] D. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching, Addison-Wesley, 1973.
[13] R. Jackson, J. Gresl, R. Lawrence, Efficient External Sorting for Memory-Constrained Embedded Devices with Flash Memory, ACM Trans. Embed. Comput. Syst. 20 (2021). doi:10.1145/3446976.
[14] A. Salah, K. Li, Q. Liao, M. Hashem, Z. Li, A. T. Chronopoulos, A. Y. Zomaya, A Time-Space Efficient Algorithm for Parallel k-Way In-Place Merging Based on Sequence Partitioning and Perfect Shuffle, ACM Trans. Parallel Comput. 7 (2020). doi:10.1145/3391443.
[15] O. Green, S. Odeh, Y. Birk, Merge Path - A Visually Intuitive Approach to Parallel Merging, CoRR abs/1406.2628 (2014). URL: http://arxiv.org/abs/1406.2628. arXiv:1406.2628.
[16] R. D. Hipp, SQLite, 2021. URL: https://www.sqlite.org/index.html.
[17] L. Kuiper, Fastest table sort in the West - Redesigning DuckDB’s sort, 2021. URL: https://duckdb.org/2021/08/27/external-sorting.html.
[18] M. Raasveldt, H. Mühleisen, Don’t Hold My Data Hostage: A Case for Client Protocol Redesign, Proc. VLDB Endow. 10 (2017) 1022–1033. doi:10.14778/3115404.3115408.
[19] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357–362. doi:10.1038/s41586-020-2649-2.