Experimental Index Evaluation for Partial Indexes in Horizontally Partitioned In-Memory Databases

Marcel Weisgut
Hasso Plattner Institute, University of Potsdam, Germany
marcel.weisgut@student.hpi.de

ABSTRACT
A horizontally partitioned storage layout for column-oriented relational in-memory databases is a popular design choice. One of the reasons for horizontal table partitioning is the increased degree of flexibility in data placement, multiprocessing, and physical design decisions. For example, auxiliary data structures such as indexes or filters can be created partition-wise with varying implementations. However, creating indexes per partition can result in a significant computational overhead for index lookup operations. In this work, we present a partial indexing idea to overcome this overhead. Our proposed approach creates indexes table-wise, but only tuples of frequently accessed partitions are indexed. Our research is still in progress. For the realization of our partial index, the maintenance efficiency of the underlying index structure is particularly relevant. Thus, in this work, we evaluate different index implementations in their lookup speed, maintenance cost, and memory consumption to identify suitable implementations for realizing partial indexes. The hash maps Robin Hood (RH) Flat Map and Tessil's (TSL) Sparse Map achieve the best overall evaluation results. Whereas the former is comparatively faster, the latter has a lower memory footprint.

32nd GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), September 01-03, 2021, Munich, Germany.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION
Horizontal partitioning allows indexes to be created partition-wise with varying implementations [15, 16]. However, when those indexes are utilized to find qualifying tuples for a given search condition, one lookup operation has to be performed for each partition index of the corresponding table.

Table 1 shows two exemplary joins that we executed with our research database Hyrise1. Tables in Hyrise are horizontally partitioned into partitions of 65 535 tuples. The depicted joins utilize indexes that are created partition-wise. Both joins have the same predicate, the same indexed base table as the left input, and a filtered table as the right input. The right, non-indexed table is referred to as the probe table.

1 Source code at https://github.com/hyrise/hyrise

TPC-H      Join predicate           Number of tuples
Query ID                            Indexed table    Probe table
17         l_partkey = p_partkey    ∼60 M            1 986
19         l_partkey = p_partkey    ∼60 M            4 757

Table 1: Example index (semi) joins generated in Hyrise with the TPC-H queries 17 and 19 using scale factor 10.

In query 19, the number of tuples in the probe table is 2 771 higher than in query 17. Although this number is small compared to the almost 60 million tuples in the indexed table, it results in a crucial additional effort for the index probing compared to query 17. With the aforementioned partition size, the indexed table consists of 916 partitions. Since indexes are created partition-wise, one equality lookup on each of the 916 created indexes must be executed for each tuple of the probe table to retrieve matching tuples of the indexed table. In total, this results in 1 819 176 equality lookups in query 17 and 4 357 412 in query 19. Consequently, the performance of such index joins degrades as the partition count of the indexed table grows: as the number of partitions increases, so does the number of indexes and hence the number of required lookup operations.
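To make this overhead concrete, the following minimal C++ sketch shows the probing pattern of such a partition-wise index join. It is an illustration under our own type and function names, not Hyrise's actual join implementation, and it uses a plain hash map as a stand-in for an arbitrary partition index.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Key = uint64_t;
using TID = uint64_t;
// One index per partition, as in a partition-wise physical design.
using PartitionIndex = std::unordered_map<Key, TID>;

// Probe every partition index with every probe-side key.
std::vector<TID> index_join_probe(const std::vector<PartitionIndex>& partition_indexes,
                                  const std::vector<Key>& probe_keys) {
  std::vector<TID> matches;
  for (const Key key : probe_keys) {
    // Without a table-wise index, each probe tuple requires one lookup per
    // partition: 916 partitions x 1 986 probe tuples = 1 819 176 lookups (Q17).
    for (const PartitionIndex& index : partition_indexes) {
      if (auto it = index.find(key); it != index.end()) {
        matches.push_back(it->second);
      }
    }
  }
  return matches;
}
```

The inner loop over all partition indexes is exactly the cost that a table-wise partial index is meant to remove.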
Our ongoing research focuses on the design and exemplary implementation of a partial indexing strategy to overcome this computational index probing overhead. The contributions of this work are as follows:

• Partial Indexes. We introduce an idea of partial indexes to reduce the shown computational index probing overhead. Our partial indexes are to be created table-wise instead of partition-wise, but they store index entries only for frequently accessed partitions.

• Index Benchmark Framework. We developed an open-source C++ benchmark framework, which allows researchers to measure the performance of read and write operations of secondary indexes. The evaluation aspects are the latency of insert, delete, lookup, and bulk operations as well as the index memory consumption.

• Evaluation of In-Memory Secondary Indexes. We present an experimental performance evaluation of different index implementations, including trees, radix trees, and hash maps. We evaluated their read and write efficiency and identified index candidates that are specifically well suited for our partial index.

2. PARTIAL INDEXES
An advantage of partition indexes is their high flexibility: For each partition, it can be decided individually whether an index is to be created. This makes it possible to create indexes only for partitions whose data is accessed frequently. Such flexibility is not provided by a traditional full table index, since it must always be updated when a tuple is inserted into or deleted from the indexed table, or when the search-key value of an already indexed tuple is updated. Thus, partition indexes are more efficient in terms of memory consumption, as they can selectively index data of a table, whereas a full table index contains index entries for all tuples of a table.

Our proposed partial index is a secondary index that breaks the full table index's requirement of indexing a table entirely and can index one or more partitions. Thus, an arbitrary subset of partitions can be indexed. In contrast to the tuple-wise indexing of traditional indexes, our partial index's maintenance operations are executed partition-wise. Consequently, index entries are inserted into or deleted from the partial index for the entire data of a given partition. In-place updates are not considered in this work since our partial index is specifically designed for insert-only in-memory DBMSs that use multiversion concurrency control.
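The following sketch illustrates this partition-wise maintenance granularity. It is a simplified illustration under our own assumptions (a single hash map as the underlying structure; hypothetical names such as add_partition and RowID), not the implementation this paper works toward.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Key = uint64_t;
using PartitionID = uint32_t;

struct RowID {
  PartitionID partition_id;
  uint32_t offset;  // Tuple position within the partition.
};

// A table-wise secondary index that covers only selected partitions.
class PartialIndex {
 public:
  // Index the entire data of one (frequently accessed) partition.
  void add_partition(PartitionID partition_id, const std::vector<Key>& keys) {
    for (uint32_t offset = 0; offset < keys.size(); ++offset) {
      _map.emplace(keys[offset], RowID{partition_id, offset});
    }
  }

  // Drop all entries of a partition, e.g., once it is accessed only rarely.
  void remove_partition(PartitionID partition_id) {
    for (auto it = _map.begin(); it != _map.end();) {
      if (it->second.partition_id == partition_id) {
        it = _map.erase(it);
      } else {
        ++it;
      }
    }
  }

  // A single lookup serves the whole table instead of one lookup per partition.
  const RowID* lookup(Key key) const {
    const auto it = _map.find(key);
    return it == _map.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<Key, RowID> _map;  // Unique search keys, as in Section 4.
};
```

Because whole partitions enter and leave the index as the workload shifts, the bulk efficiency of the underlying structure dominates the maintenance cost, which motivates the evaluation below.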
In- deleted) and buckets overflow (or underflow), the structure place updates are not considered in this work since our par- is rebuilt to achieve a balance that is beneficial regarding tial index is specifically designed for insert-only in-memory the depths of leaves.” [30]. DBMSs that use multiversion concurrency control. RH Flat Map. The Robin Hood (RH) Flat Map [5] is a fast The Need for Maintenance Efficiency. Our partial index and memory efficient hash map. For collision resolution, it is only indexes partitions whose data is accessed frequently to based on robin hood hashing. The hash map stores its data reduce the memory footprint. As the workload of a DBMS in a flat array, which results in very fast access operations. can change over time [24], so does the frequency with which RH Node Map. Similar to the Robin Hood Flat Map, the data is accessed. With a changing set of partitions who- Robin Hood Node Map [5] is a fast and memory efficient hash se data is accessed frequently, existing partial indexes must map, based on robin hood hashing. Unlike the RH Flat Map, continuously be adjusted. the RH Node Map stores its data with node indirection. Consequently, the indexing algorithm required for our par- PG Skip List. A skip list is a linked list extended by ad- tial index is adaptive. Depending on the workload, the algo- ditional pointers to skip nodes when searching for certain rithm must continuously insert index entries partition-wise stored elements [27]. The skip list included in the bench- into or delete them from partial indexes. The index main- mark framework is an implementation of P. Goodliffe [7]. tenance effort of a partial index depends on factors such TSL Robin Map. Similar to the above listed RH Maps, as the size of the indexed table, the details of the indexing Tesssil’s (TSL) Robin Map [6] is a fast hash map based on algorithm (e.g., the access classification of data as frequent- robin hood hashing. ly or rarely accessed), the executed database workload, and TSL Sparse Map. Tessil’s (TSL) Sparse Map [9] is a me- the underlying data structure of the partial index. In this mory efficient hash map that uses sparse quadratic probing. work, we focus on the latter aspect and identify an index The design goal of this map is to be the most memory effi- implementation suitable for realizing our partial index. cient possible while keeping reasonable performance. STD Hash Map. This map is provided by the C++ standard 3. INDEX BENCHMARK library as unordered_map. It “is an associative container that We developed an open-source benchmark framework2 to contains key-value pairs with unique keys” [10]. evaluate single-attribute in-memory secondary indexes in their lookup speed, maintenance and memory performance. 3.2 Index Operations It allows to run certain benchmark cases with different index Insert (I). The Insert operation creates a new key-value implementations on different datasets. The benchmark cases entry in the index. Therefore, the key and the value have measure the required execution times of the index operati- to be provided as the operation’s input parameters. If an ons. In selected benchmark cases, the index’s memory con- entry with the given key exists in the index, the entry is not sumption is additionally measured. The index implementa- added. An exception is the PG Skip List, which overwrites tions included in the framework as well as the framework the existing entry. itself are written in C++. Delete (D). 
3.2 Index Operations
Insert (I). The Insert operation creates a new key-value entry in the index. Therefore, the key and the value have to be provided as the operation's input parameters. If an entry with the given key already exists in the index, the entry is not added. An exception is the PG Skip List, which overwrites the existing entry.
Delete (D). The Delete operation removes the stored entry that contains a given key from the index.
Equality Lookup (EL). The Equality Lookup expects a key as an input parameter and returns the associated value if the index contains a key-value entry with the given key.
Range Lookup (RL). The Range Lookup operation expects two keys as input parameters, representing a key range, and returns the values of all stored entries whose keys are within the given key range.
Bulk Insert (BI). Inserting a large number of entries is referred to as bulk loading. Since the TLX B+ Tree provides two different functionalities in this regard, we distinguish between a bulk insert and a bulk load. The Bulk Insert operation inserts multiple key-value entries into the index without the requirement that the index be empty beforehand.
Bulk Load (BL). Similar to the Bulk Insert, the Bulk Load inserts multiple key-value entries into the index. Unlike the Bulk Insert, the index must be empty beforehand. For example, a bulk load of the TLX B+ Tree means that for a given sorted sequence of index entries, the leaves are created first, and the overlying levels of the B+ tree are constructed afterward.

Index Implementation     I  D  EL  RL  BI  BL
Unsync ART               ✓  ✓  ✓   ✗   ✗   ✗
TLX B+ Tree              ✓  ✓  ✓   ✓   ✓   ✓*
Abseil B-Tree            ✓  ✓  ✓   ✓   ✓   ✗
BB-Tree                  ✓  ✓  ✓   ✓   ✗   ✓
MP Judy                  ✓  ✓  ✓   ✗   ✗   ✗
RH Flat/Node Map         ✓  ✓  ✓   ✗   ✓   ✗
PG Skip List             ✓  ✓  ✓   ✓   ✓   ✗
TSL Robin/Sparse Map     ✓  ✓  ✓   ✗   ✓   ✗
STD Hash Map             ✓  ✓  ✓   ✗   ✓   ✗

Table 2: Index operations supported by the evaluated index implementations. * The entries to be inserted must be sorted.
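Taken together, Section 3.2 and Table 2 suggest the shape of a common interface against which all implementations can be benchmarked. The sketch below is our illustration of such an interface, not the framework's actual API; operations an implementation does not support (✗ in Table 2) simply keep the throwing default.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

using Key = uint64_t;
using TID = uint64_t;
using Entry = std::pair<Key, TID>;

// Unified view of the index operations of Section 3.2.
class SecondaryIndex {
 public:
  virtual ~SecondaryIndex() = default;
  // Insert (I): no-op if the key already exists (the PG Skip List overwrites).
  virtual void insert(Key key, TID tid) { throw std::logic_error("unsupported"); }
  // Delete (D): remove the entry with the given key.
  virtual void erase(Key key) { throw std::logic_error("unsupported"); }
  // Equality Lookup (EL): the value for the given key, if present.
  virtual std::optional<TID> equality_lookup(Key key) const {
    throw std::logic_error("unsupported");
  }
  // Range Lookup (RL): values of all entries with lower <= key <= upper.
  virtual std::vector<TID> range_lookup(Key lower, Key upper) const {
    throw std::logic_error("unsupported");
  }
  // Bulk Insert (BI): insert many entries; the index may be non-empty.
  virtual void bulk_insert(const std::vector<Entry>& entries) {
    throw std::logic_error("unsupported");
  }
  // Bulk Load (BL): build from entries; requires an empty index (and, for the
  // TLX B+ Tree, sorted entries).
  virtual void bulk_load(const std::vector<Entry>& sorted_entries) {
    throw std::logic_error("unsupported");
  }
};
```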
3.3 Benchmark Cases
The benchmark framework contains the benchmark cases Insert, Delete, Equality Lookup, Range Lookup, Bulk Insert, and Bulk Load. These are designed to evaluate the index operations listed in Section 3.2. Due to space limitations, we only present the Equality Lookup, Delete, and Bulk Insert cases. The descriptions of all benchmark cases can be found in the code repository of our benchmark [3].
Equality Lookup. The time required to execute a given sequence of equality lookups is measured. Before the equality lookups are executed and measured, the index is initialized and a given sequence of index entries is inserted using the insert operation.
Delete. The execution time required to delete a given sequence of index entries using the delete operation is measured. Before the delete operations are executed and measured, the index is initialized and the entire sequence of index entries to be deleted is inserted using the insert operation. The order in which the entries are deleted corresponds to the insertion order.
Bulk Insert. The execution time required to insert a given sequence of index entries using the bulk insert operation is measured. Before the execution and measurement, the index is initialized. In addition, the index's memory footprint is measured: the allocated memory is recorded before the index's initialization and after inserting the index entries, and the difference between the two values is considered the index's memory consumption.
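In C++-like pseudocode, the measurement protocol of these cases looks roughly as follows, reusing the Key, TID, and Entry aliases from the interface sketch above. The wall-clock timing and the global allocation counter are simplifications with hypothetical helper names, not the framework's actual mechanics.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical hook: bytes currently allocated, e.g., tracked by counting
// replacements of operator new and operator delete.
uint64_t currently_allocated_bytes();

template <typename Index>
std::chrono::nanoseconds run_equality_lookup_case(const std::vector<Entry>& entries,
                                                  const std::vector<Key>& lookup_keys) {
  Index index;                              // 1. Initialize the index.
  for (const auto& [key, tid] : entries) {  // 2. Insert all entries (not measured).
    index.insert(key, tid);
  }
  const auto start = std::chrono::steady_clock::now();
  uint64_t found = 0;                       // 3. Time the lookups only; the
  for (const Key key : lookup_keys) {       //    counter keeps the loop observable.
    found += index.equality_lookup(key).has_value();
  }
  const auto stop = std::chrono::steady_clock::now();
  (void)found;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

template <typename Index>
uint64_t run_bulk_insert_memory_case(const std::vector<Entry>& entries) {
  const uint64_t before = currently_allocated_bytes();  // Before initialization.
  Index index;
  index.bulk_insert(entries);                           // Timed analogously.
  return currently_allocated_bytes() - before;          // Memory consumption.
}
```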
3.4 Datasets
To execute the different benchmark cases for the index implementations, different data is needed. For all benchmark cases, a sequence of keys is required as input for the benchmark. This input represents a sequence of index entries, where a single key represents the search-key value and the position of the key, starting at one, represents the tuple identifier (TID). We refer to this input data as entry keys.
We use dense and sparse unsigned integer values as the entry keys to evaluate the index implementations. Additionally, we consider both ascending sorted and unsorted keys. The sorted dense dataset contains sequentially increasing, consecutive integer values, starting at one. For the unsorted dense dataset, the sorted dense keys are randomly shuffled. The unsorted sparse data sequence contains randomly picked, unique unsigned integer values. This sequence is sorted in ascending order for the sorted sparse data.
Additionally, search-key values and ranges of search-key values are required as input data for the Equality Lookup and the Range Lookup experiments, respectively. From a sequence of entry keys, a number of keys is randomly selected as the equality lookup data sequence. For the range lookup key ranges, a number of key ranges is selected from a given sequence of entry keys. For each of the selected ranges, the number of qualifying keys is the same.
For our evaluation, we use five different sizes per combination of dense/sparse and ascending/random entry keys. The datasets have sizes of two, four, six, eight, and ten million entry keys. Furthermore, we use one million lookup operations in the respective benchmark cases. For the range lookups, we used a selectivity of 0.01%.
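The four entry-key variants can be generated as follows. This is one possible generator matching the descriptions above, not necessarily the framework's; in particular, the sampling domain for sparse keys is our assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

// Dense keys: consecutive values 1..n; shuffled for the unsorted variant.
std::vector<uint64_t> dense_keys(size_t n, bool sorted, std::mt19937_64& rng) {
  std::vector<uint64_t> keys(n);
  std::iota(keys.begin(), keys.end(), uint64_t{1});
  if (!sorted) std::shuffle(keys.begin(), keys.end(), rng);
  return keys;
}

// Sparse keys: n randomly picked, unique values (here: drawn from the full
// 64-bit domain); sorted ascending for the sorted variant.
std::vector<uint64_t> sparse_keys(size_t n, bool sorted, std::mt19937_64& rng) {
  std::uniform_int_distribution<uint64_t> dist;
  std::unordered_set<uint64_t> unique;
  while (unique.size() < n) unique.insert(dist(rng));
  std::vector<uint64_t> keys(unique.begin(), unique.end());
  if (sorted) {
    std::sort(keys.begin(), keys.end());
  } else {
    std::shuffle(keys.begin(), keys.end(), rng);
  }
  return keys;
}
```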
4. EVALUATION
Since all of the index implementations listed in Section 3.1 support unique keys but not non-unique keys, we only use datasets with unique entry keys in our evaluation. For a given dataset, we ran the benchmark cases using 64-bit unsigned integers as search-keys and TIDs. The reported measurements are the median of 12 runs.
Join operations between foreign and primary keys are an exemplary database use case in which an index for a column with unique unsigned integer values can be utilized. Primary keys are often IDs, which in turn are often represented as unsigned integer values. This applies, for example, when primary keys are surrogate keys.
The index benchmark cases were executed on a machine with four Intel(R) Xeon(R) E7-4880 v2 CPUs, on which each socket serves as a non-uniform memory access (NUMA) node. Each NUMA node has 512 GB of DDR3-1600 memory. Benchmarks were bound to a single NUMA node using numactl -N 2 -m 2. The index benchmark was compiled with clang 9.0.1-12 and the options -O3 -march=native. We used the C++ standard library version libstdc++.so.6.0.28.

4.1 Results
Due to space limitations, we only show the benchmark results for the Equality Lookup, Delete, and Bulk Insert benchmark cases. The remaining benchmark results can be found in our code repository [3].
Equality Lookups. Figure 1 shows the cumulative execution time of one million equality lookups performed with various entry key distributions, entry key orderings, and numbers of entries stored in the index. In each quadrant, the measurements of one distribution and ordering combination are shown for different numbers of stored index entries.

[Figure 1 omitted. Four quadrants: dense/sparse data in ascending/random order; x-axis: number of index entries in million; y-axis: duration in seconds; series: Abseil B-Tree, MP Judy, RH Flat Map, RH Node Map, STD Hash Map, TLX B+ Tree, TSL Robin Map, TSL Sparse Map, Unsync ART, Sorted Vector.]
Figure 1: Cumulative duration of one million equality lookups for different numbers of stored index entries.

As a reference, we included the lookup duration of the Sorted Vector, which is a dynamic array (std::vector) that stores its values in sorted order. To keep the dynamic array sorted at all times, new values are inserted at the proper position. To find the position of certain values during a lookup operation, a binary search is performed.
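A minimal sketch of such a sorted-vector baseline, rendering the described behavior for keys only:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reference structure: a std::vector kept sorted at all times.
class SortedVector {
 public:
  // Insert at the proper position so the vector stays sorted.
  void insert(uint64_t key) {
    _keys.insert(std::lower_bound(_keys.begin(), _keys.end(), key), key);
  }

  // Equality lookup via binary search.
  bool contains(uint64_t key) const {
    return std::binary_search(_keys.begin(), _keys.end(), key);
  }

 private:
  std::vector<uint64_t> _keys;
};
```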
Delete. Figure 2 shows the cumulative execution time of deleting various numbers of index entries using the delete operation. The data characteristics of the entry keys vary across the four quadrants.

[Figure 2 omitted. Four quadrants: dense/sparse data in ascending/random order; x-axis: number of index entries in million; y-axis: duration in seconds; series: Abseil B-Tree, MP Judy, RH Flat Map, RH Node Map, STD Hash Map, TLX B+ Tree, TSL Robin Map, TSL Sparse Map.]
Figure 2: Cumulative duration of the Delete operation for different sized sets of index entries.

Bulk Insert. Figure 3 shows the execution time required to insert various numbers of index entries using the bulk insert operation. The data characteristics of the entry keys vary across the four quadrants. The Simple Vector and the Sorted Vector are included as references. The bulk insert of the Simple Vector consecutively appends the given entries to a dynamic array (std::vector). The Sorted Vector additionally sorts the dynamic array after finishing the insertion.

[Figure 3 omitted. Four quadrants: dense/sparse data in ascending/random order; x-axis: number of index entries in million; y-axis: duration in seconds; series: Abseil B-Tree, RH Flat Map, RH Node Map, STD Hash Map, TLX B+ Tree, TSL Robin Map, TSL Sparse Map, Simple Vector, Sorted Vector.]
Figure 3: Duration of the Bulk Insert operation for different sized sets of index entries.

Based on the evaluation measurements, we subjectively categorized the measured values into the categories low (L), moderate (M), and high (H), as shown in Table 3. Since the measurements are execution times and memory consumptions, lower categories are better. We did not categorize the execution times of range lookups and bulk loads since the former is supported only by the B-tree implementations and the latter only by the TLX B+ Tree.

Aspect       Data              Abseil  Judy  RH-F  RH-N  STD  TLX  TSL-R  TSL-S  ART
Equality     dense ascending   H       M     L     M     M    H    L      L      M
Lookup       dense random      H       M     L     M     M    H    L      L      M
duration     sparse ascending  M       M     L     M     M    H    L      L      M
             sparse random     M       M     L     M     M    H    L      L      M
Insert       dense ascending   L       M     M     H     L    M    M      L      M
duration     dense random      H       M     L     M     M    H    M      M      L
             sparse ascending  L       L     M     M     H    M    M      M      M
             sparse random     H       M     L     M     H    H    M      M      M
Delete       dense ascending   L       H     L     M     L    M    L      L      -
duration     dense random      H       M     L     M     M    H    L      M      -
             sparse ascending  L       H     L     M     H    M    L      M      -
             sparse random     H       H     L     M     H    H    L      M      -
Bulk Insert  dense ascending   L       -     M     H     L    M    L      L      -
duration     dense random      H       -     L     M     L    H    L      M      -
             sparse ascending  L       -     M     H     H    M    M      M      -
             sparse random     H       -     L     M     M    H    L      L      -
Memory       dense ascending   L       L     M     M     M    M    H      L      L
footprint    dense random      L       L     M     M     M    M    H      L      L
             sparse ascending  L       M     M     M     M    M    H      L      M
             sparse random     L       M     M     M     M    M    H      L      M

Table 3: Categorization of the measured values into low (L), moderate (M), and high (H); lower is better. Column abbreviations: Abseil = Abseil B-Tree, Judy = MP Judy, RH-F = RH Flat Map, RH-N = RH Node Map, STD = STD Hash Map, TLX = TLX B+ Tree, TSL-R = TSL Robin Map, TSL-S = TSL Sparse Map, ART = Unsync ART. A dash marks operations that are unsupported or were not categorized.

Non-Competitive Implementations. Executions of different benchmark cases showed that the index operations could not be executed in a reasonable time for the BB-Tree and the PG Skip List. Therefore, we excluded these implementations from further evaluation experiments.
Key Findings. As can be seen in Table 3, the RH Flat Map, TSL Sparse Map, and Unsync ART are the only implementations that achieved good or moderate performance on all datasets in the evaluated aspects. Whereas the RH Flat Map shows overall lower operation execution times compared to the TSL Sparse Map, the latter has a lower memory footprint. The Unsync ART achieves an overall moderate performance in the evaluated aspects. Unfortunately, it does not support bulk inserts and provides a faulty delete operation.
Apart from its high memory consumption on all datasets, the TSL Robin Map achieves overall low operation execution times. In fact, it mostly requires lower equality lookup, delete, and bulk insert execution times than the RH Flat Map and the TSL Sparse Map.
Furthermore, the B-trees, which are the only remaining implementations supporting range queries, generally perform better on sorted data, as shown in Table 3. The Abseil B-Tree generally performs better than the TLX B+ Tree. However, even on sorted data, both implementations achieved the highest equality lookup execution times.
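That only the ordered structures support range lookups follows directly from their sorted storage order: a range lookup degenerates to one binary search plus a scan, which hash maps cannot offer. The sketch below illustrates this with std::map as a stand-in for a B-tree:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Range lookup over [lower, upper] on an ordered key-TID container.
std::vector<uint64_t> range_lookup(const std::map<uint64_t, uint64_t>& index,
                                   uint64_t lower, uint64_t upper) {
  std::vector<uint64_t> tids;
  // Find the first entry with key >= lower, then scan while key <= upper.
  for (auto it = index.lower_bound(lower); it != index.end() && it->first <= upper; ++it) {
    tids.push_back(it->second);
  }
  return tids;
}
```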
5. RELATED WORK
Partial Indexes. Stonebraker presented the first concept of partial indexes and how they can be used in a DBMS [31]. According to his specification, a single partial index is created for a subset of a table's data depending on a given qualification condition. Index entries are only created for those tuples that satisfy the given condition.
Seshadri and Swami extended Stonebraker's idea and proposed a generalized partial indexing concept, which includes six different strategies that can be used to create partial indexes [29]. These strategies use various statistical information, such as the distribution of column accesses by queries, query predicate values, and data values, to decide which indexes to create.
Database Cracking [19] is an adaptive indexing technique that partially indexes columns. Using this technique, indexes are created "adaptively and incrementally as a side-product of query processing" [28]. The core cracking algorithm creates a copy of a column (a so-called cracker column) and incrementally sorts it during the execution of range queries [19]. The incremental sorting is realized by splitting the cracker column into multiple partitions (so-called column slices) with certain value ranges. During the execution of a range query, the cracker column or already existing column slices, respectively, are split based on the range predicate's boundaries. The partially sorted cracker column is utilized by database operations to speed up the execution time.
Olma et al. recently proposed an online adaptive partitioning and indexing algorithm for in situ query processing engines [26], which they integrated into their query processing system Slalom. Their approach reduces data access costs by partitioning a raw dataset into partitions and applying indexing strategies for each partition individually. Based on continuously collected statistics, their algorithm performs the partitioning and indexing in a gradual manner as a side effect of query processing. Thus, a priori workload knowledge is not required. Based on access frequencies, the algorithm decides individually for each partition whether and which in-memory partition index is to be created. Thus, instead of indexing the entire dataset, indexes exist only for a subset of frequently accessed partitions.
Olma et al.'s algorithm shares similarities with our partial index approach: Both approaches refine the set of existing indexes automatically and dynamically based on data access frequencies. In addition, both approaches choose the set of data to be indexed at partition granularity. However, while their algorithm creates indexes partition-wise, our approach creates indexes table-wise across selected partitions. Additionally, whereas Olma et al.'s work refers to in situ query processing systems, our approach is designed for in-memory DBMSs. Thus, their indexes are created for partitions of raw data files, whereas our indexes are to be created for table data stored in main memory.
Index Evaluations and Benchmarks. Xie et al. conducted an experimental performance evaluation of five modern in-memory database indexes [32]. Their aspects of investigation were the throughput, latency, scalability, memory consumption, and cache miss rate. They executed equality lookups, inserts, and updates with varying experiment configurations in different dimensions. These dimensions are the size of the dataset, the number of execution threads, the ratios of equality lookup, insert, and update operations, and the skewness in the workload and dataset.
Alvarez and his fellow researchers presented an experimental performance comparison of the ART, the Judy Array, and different hash table variants [12]. Their evaluation aspects are the insertion and equality lookup throughput and the memory footprint.
Kipf, Marcus, and their co-authors developed an open-source index benchmark framework called Search On Sorted Data Benchmark (SOSD) [8] that allows researchers to compare in-memory search algorithms, including (learned) indexes, on equality lookup performance over sorted data [20]. According to the SOSD authors, their framework contains diverse synthetic and real-world datasets, optimized baseline implementations, and the "first performant and publicly available implementation of the [learned] Recursive Model Index (RMI)" [20], which was proposed by Kraska et al. [21]. Marcus, Kipf et al. extended their initial work on SOSD in a follow-up work [25], in which they presented their framework in more detail and conducted a performance analysis of three recent learned index structures.

6. CONCLUSION AND FUTURE WORK
In this work, we presented the idea of a partial index that is created table-wise but only indexes data of frequently accessed partitions. Using our index benchmark framework, we evaluated the lookup speed, maintenance cost, and memory consumption of various index implementations to identify a suitable implementation for partial indexes.
Since our partial indexes are designed to execute maintenance operations on a partition level, the efficiency of bulk operations is particularly relevant. Regarding the equality lookup, delete, and bulk insert performance, the TSL Robin Map achieved the best overall results.
As a next step, we plan to implement the proposed partial index in our research database Hyrise so that we can perform end-to-end DBMS performance evaluations afterward.

7. ACKNOWLEDGMENTS
I sincerely thank my supervisors Jan Kossmann and Markus Dreseler, who supported me intensively and always gave me constructive impulses and detailed, critical feedback. Furthermore, I would like to thank the anonymous reviewers for their valuable feedback.

8. REFERENCES
[1] ARTSynchronized. https://github.com/flode/ARTSynchronized, Accessed: 2021-03-01.
[2] BB-Tree. https://github.com/flippingbits/bb-tree, Accessed: 2021-03-01.
[3] Index Benchmark Evaluation Results. https://github.com/mweisgut/IMIB/tree/gvdb21/evaluation_results, Accessed: 2021-03-02.
[4] Judy Template. https://github.com/mpictor/judy-template, Accessed: 2021-03-01.
[5] Robin Hood Map. https://github.com/martinus/robin-hood-hashing, Accessed: 2021-03-01.
[6] Robin Map. https://github.com/Tessil/robin-map, Accessed: 2021-03-01.
[7] Skip List. https://github.com/petegoodliffe/skip_list, Accessed: 2021-03-01.
[8] SOSD. https://github.com/learnedsystems/SOSD, Accessed: 2021-03-01.
[9] Sparse Map. https://github.com/Tessil/sparse-map, Accessed: 2021-03-01.
[10] std::unordered_map. https://en.cppreference.com/w/cpp/container/unordered_map, Accessed: 2021-03-01.
[11] TLX. https://github.com/tlx/tlx, Accessed: 2021-03-01.
[12] V. Alvarez, S. Richter, X. Chen, and J. Dittrich. A comparison of adaptive radix trees and hash tables. In Proceedings of the International Conference on Data Engineering (ICDE), pages 1227–1238, 2015.
[13] T. Bingmann. STX B+ Tree C++ Template Classes, 2013. https://panthema.net/2007/stx-btree/, Accessed: 2021-03-01.
[14] T. Bingmann. TLX: Collection of sophisticated C++ data structures, algorithms, and miscellaneous helpers, 2018. https://panthema.net/tlx, Accessed: 2021-03-01.
[15] C. Chasseur and J. M. Patel. Design and Evaluation of Storage Organizations for Read-Optimized Main Memory Databases. Proceedings of the International Conference on Very Large Databases (VLDB), 6(13):1474–1485, 2013.
[16] M. Dreseler, J. Kossmann, M. Boissier, S. Klauck, M. Uflacker, and H. Plattner. Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 313–324, 2019.
[17] Google Inc. Abseil. https://abseil.io, Accessed: 2021-03-01.
[18] Google Inc. Abseil - C++ Common Libraries. https://github.com/abseil/abseil-cpp, Accessed: 2021-03-01.
[19] S. Idreos, M. L. Kersten, and S. Manegold. Database Cracking. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), pages 68–78, 2007.
[20] A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems, 2019.
[21] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 489–504, 2018.
[22] V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In Proceedings of the International Conference on Data Engineering (ICDE), pages 38–49, 2013.
[23] V. Leis, F. Scheibner, A. Kemper, and T. Neumann. The ART of practical synchronization. In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN), 2016.
[24] L. Ma, D. V. Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon. Query-based Workload Forecasting for Self-Driving Database Management Systems. In Proceedings of the International Conference on Management of Data (SIGMOD), 2018.
[25] R. Marcus, A. Kipf, A. van Renen, M. Stoian, S. Misra, A. Kemper, T. Neumann, and T. Kraska. Benchmarking Learned Indexes. PVLDB, 2021.
[26] M. Olma, M. Karpathiotakis, I. Alagiannis, M. Athanassoulis, and A. Ailamaki. Adaptive partitioning and indexing for in situ query processing. The VLDB Journal, 29(1):569–591, 2020.
[27] W. Pugh. Skip Lists: A Probabilistic Alternative to Balanced Trees. Commun. ACM, 33(6):668–676, 1990.
[28] F. M. Schuhknecht, A. Jindal, and J. Dittrich. The Uncracked Pieces in Database Cracking. Proceedings of the International Conference on Very Large Databases (VLDB), 7(2):97–108, 2013.
[29] P. Seshadri and A. N. Swami. Generalized partial indexes. In Proceedings of the International Conference on Data Engineering (ICDE), pages 420–427, 1995.
[30] S. Sprenger, P. Schäfer, and U. Leser. BB-Tree: A practical and efficient main-memory index structure for multidimensional workloads. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 169–180, 2019.
[31] M. Stonebraker. The Case for Partial Indexes. SIGMOD Record, 18(4):4–11, 1989.
[32] Z. Xie, Q. Cai, G. Chen, R. Mao, and M. Zhang. A Comprehensive Performance Evaluation of Modern In-Memory Indices. In Proceedings of the International Conference on Data Engineering (ICDE), 2018.