<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experimental Index Evaluation for Partial Indexes in Horizontally Partitioned In-Memory Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcel Weisgut</string-name>
          <email>marcel.weisgut@student.hpi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso Plattner Institute, University of Potsdam</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A horizontally partitioned storage layout for columnoriented relational in-memory databases is a popular design choice. One of the reasons for horizontal table partitioning is the increased degree of exibility in data placement, multiprocessing, and physical design decisions. For example, auxiliary data structures such as indexes or lters can be created partition-wise with varying implementations. However, creating indexes per partition can result in a signi cant computational overhead for index lookup operations. In this work, we present a partial indexing idea to overcome this overhead. Our proposed approach creates indexes table-wise, but only tuples of frequently accessed partitions are indexed. Our research is still in progress. For the realization of our partial index, the maintenance e ciency of the underlying index structure is particularly relevant. Thus, in this work, we evaluate di erent index implementations in their lookup speed, maintenance cost, and memory consumption to identify suitable implementations to realize partial indexes. The hash maps Robin Hood (RH) Flat Map and Tessil's (TSL) Sparse Map achieve overall the best evaluation results. Whereas the former is comparatively faster, the latter has a lower memory footprint.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Horizontal partitioning allows to create indexes
partitionwise with varying implementations [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. However, when
those indexes are utilized to nd qualifying tuples for a given
search condition, one lookup operation for each partition's
index of the corresponding table has to be performed.
      </p>
      <p>Table 1 shows two exemplary joins that we executed with
our research database Hyrise1. Tables in Hyrise are
horizontally partitioned into partitions of 65 535 tuples. The
depicted joins utilize indexes which are created partition-wise.
Both joins have the same predicate, the same indexed base
table as left input, and a ltered table as the right input.
The right, non-indexed table is referred to as probe table.</p>
      <p>In query 19, the number of tuples in the probe table is
2 771 higher than in query 17. Although this number is small
compared to the number of tuples in the indexed table of
almost 60 million, it results in a crucial additional e ort for
1Source code at https://github.com/hyrise/hyrise
32nd GI-Workshop on Foundations of Databases (Grundlagen von
Datenbanken), September 01-03, 2021, Munich, Germany.</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
TPC-H
Query ID</p>
      <p>Join predicate
l partkey = p partkey
l partkey = p partkey</p>
      <p>Number of tuples
Indexed
table
the index probing compared to query 17. With the
aforementioned partition size, the indexed table consists of 916
partitions. Since indexes are created partition-wise, one equality
lookup for each of the 916 created indexes must be
executed for each tuple of the probe table to retrieve matching
tuples of the indexed table. In total, this results in 1 819 176
equality lookups in query 17 and 4 357 412 in query 19.</p>
      <p>Consequently, the index joins' performance depends on
the partition count of the indexed data table and scales
inversely proportional with it: As the number of partitions
increases, so does the number of indexes and the number of
lookup operations required.</p>
      <p>Our ongoing research focuses on the design and exemplary
implementation of a partial indexing strategy to overcome
this computational index probing overhead. The
contributions of this work are as follows:</p>
      <p>Partial Indexes. We introduce an idea of partial
indexes to reduce the shown computational index probing
overhead. Our partial indexes are to be created
tablewise instead of partition-wise, but they store only
index entries for frequently accessed partitions.</p>
      <p>Index Benchmark Framework. We developed an
opensource C++ benchmark framework, which allows
researchers to measure the performance of read and
write operations of secondary indexes. The evaluation
aspects are the latency of insert, delete, lookup and
bulk operations and the index memory consumption.
Evaluation of In-Memory Secondary Indexes. We
present an experimental performance evaluation of
different index implementations, including trees, radix
trees, and hash maps. We evaluated their read and
write e ciency and identi ed index candidates that
suite speci cally well for our partial index.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>PARTIAL INDEXES</title>
      <p>An advantage of partition indexes is the high exibility:
For each partition, it can be decided individually whether
an index is to be created. This makes it possible to create
indexes only for partitions whose data is accessed frequently.
Such exibility is not provided with a traditional full table
index since it must always be updated when a tuple is
inserted into or deleted from the indexed table, or the search-key
value of an already indexed tuple is updated. Thus, partition
indexes are more e cient in terms of memory consumption
as they can selectively index data of a table whereas a full
table index contains index entries for all tuples of a table.</p>
      <p>Our proposed partial index is a secondary index that
breaks the full table index's requirement of indexing a table
entirely and can index one or more partitions. Thus, an
arbitrary subset of partitions can be indexed. In contrast to
tuple-wise indexing of traditional indexes, our partial index's
maintenance operations are executed partition-wise.
Consequently, index entries are inserted into or deleted from the
partial index for the entire data of a given partition.
Inplace updates are not considered in this work since our
partial index is speci cally designed for insert-only in-memory
DBMSs that use multiversion concurrency control.</p>
      <p>
        The Need for Maintenance E ciency. Our partial index
only indexes partitions whose data is accessed frequently to
reduce the memory footprint. As the workload of a DBMS
can change over time [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], so does the frequency with which
data is accessed. With a changing set of partitions
whose data is accessed frequently, existing partial indexes must
continuously be adjusted.
      </p>
      <p>Consequently, the indexing algorithm required for our
partial index is adaptive. Depending on the workload, the
algorithm must continuously insert index entries partition-wise
into or delete them from partial indexes. The index
maintenance e ort of a partial index depends on factors such
as the size of the indexed table, the details of the indexing
algorithm (e.g., the access classi cation of data as
frequently or rarely accessed), the executed database workload, and
the underlying data structure of the partial index. In this
work, we focus on the latter aspect and identify an index
implementation suitable for realizing our partial index.</p>
    </sec>
    <sec id="sec-3">
      <title>3. INDEX BENCHMARK</title>
      <p>We developed an open-source benchmark framework2 to
evaluate single-attribute in-memory secondary indexes in
their lookup speed, maintenance and memory performance.
It allows to run certain benchmark cases with di erent index
implementations on di erent datasets. The benchmark cases
measure the required execution times of the index
operations. In selected benchmark cases, the index's memory
consumption is additionally measured. The index
implementations included in the framework as well as the framework
itself are written in C++.
3.1</p>
    </sec>
    <sec id="sec-4">
      <title>Index Implementations</title>
      <p>
        Unsync ART. The Unsync ART [
        <xref ref-type="bibr" rid="ref1 ref23">1, 23</xref>
        ] is an
implementation of the ART (Adaptive Radix Tree), which is \a fast and
space-e cient in-memory indexing structure speci cally
tuned for modern hardware" [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The ART is a specialized
radix tree that adaptively and dynamically chooses a compact
data structure for each individual internal node, depending
on the number of child nodes.
      </p>
      <p>
        MP Judy. Similar to the ART, the Judy Array is an
adaptive radix tree [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It is a variant of a 256-way radix tree
that is designed to reduce the number of cache misses by
using over 20 di erent compression techniques. We use the
Judy Array implementation of M. Pictor in this work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
2Source code at https://github.com/mweisgut/IMIB
TLX B+ Tree. This B+ tree, which is part of the TLX
collection [
        <xref ref-type="bibr" rid="ref11 ref14">14, 11</xref>
        ], is an improved version of the STX B+
Tree [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. According to the description of the predecessor,
this data structure is an in-memory B+ tree that packs
multiple value pairs into each node of the tree and thus reduces
heap fragmentation and utilizes cache-line e ects [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Abseil B-Tree. This B-tree, which is part of Google's
opensource collection of C++ libraries called Abseil [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], is a
cache-friendly and space-e cient B-tree implementation.
      </p>
      <p>
        BB-Tree. The BB-Tree [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is an \almost-balanced k-ary
search tree, where inner nodes recursively split the data
space into k partitions according to a delimiter dimension
and k 1 delimiter values. Data objects are stored in leaf
nodes (buckets). When too many data points are inserted (or
deleted) and buckets over ow (or under ow), the structure
is rebuilt to achieve a balance that is bene cial regarding
the depths of leaves." [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>
        RH Flat Map. The Robin Hood (RH) Flat Map [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a fast
and memory e cient hash map. For collision resolution, it is
based on robin hood hashing. The hash map stores its data
in a at array, which results in very fast access operations.
      </p>
      <p>
        RH Node Map. Similar to the Robin Hood Flat Map, the
Robin Hood Node Map [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a fast and memory e cient hash
map, based on robin hood hashing. Unlike the RH Flat Map,
the RH Node Map stores its data with node indirection.
      </p>
      <p>
        PG Skip List. A skip list is a linked list extended by
additional pointers to skip nodes when searching for certain
stored elements [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. The skip list included in the
benchmark framework is an implementation of P. Goodli e [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        TSL Robin Map. Similar to the above listed RH Maps,
Tesssil's (TSL) Robin Map [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a fast hash map based on
robin hood hashing.
      </p>
      <p>
        TSL Sparse Map. Tessil's (TSL) Sparse Map [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a
memory e cient hash map that uses sparse quadratic probing.
The design goal of this map is to be the most memory e
cient possible while keeping reasonable performance.
      </p>
      <p>
        STD Hash Map. This map is provided by the C++ standard
library as unordered_map. It \is an associative container that
contains key-value pairs with unique keys" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
3.2
      </p>
    </sec>
    <sec id="sec-5">
      <title>Index Operations</title>
      <p>Insert (I). The Insert operation creates a new key-value
entry in the index. Therefore, the key and the value have
to be provided as the operation's input parameters. If an
entry with the given key exists in the index, the entry is not
added. An exception is the PG Skip List, which overwrites
the existing entry.</p>
      <p>Delete (D). The Delete operation removes the stored entry
that contains a given key from the index.</p>
      <p>Equality Lookup (EL). The Equality Lookup expects a key
as an input parameter and returns the associated value if
the index contains a key-value entry with the given key.</p>
      <p>Range Lookup (RL). The Range Lookup operation expects
two keys as input parameters, representing a key range, and
returns the values of all the stored entries whose key is within
the given key range.</p>
      <p>Bulk Insert (BI). Inserting a large number of entries is
referred to as bulk loading. Since the TLX B+ Tree provides
two di erent functionalities in this regard, we distinguish
between a bulk insert and a bulk load. The Bulk Insert
operation inserts multiple key-value entries into the index
without the requirement of the index being empty before.</p>
      <p>Bulk Load (BL). Similar to the Bulk Insert, the Bulk Load
inserts multiple key-value entries into the index. Unlike the
Bulk Insert, the index must be empty before. For example, a
bulk load of the TLX B+ Tree means that for a given sorted
sequence of index entries, the leaves are created rst, and the
overlying levels of the B+ tree are constructed afterward.</p>
    </sec>
    <sec id="sec-6">
      <title>Benchmark Cases</title>
      <p>
        The benchmark framework contains the benchmark cases
Insert, Delete, Equality Lookup, Range Lookup, Bulk Insert
and Bulk Load. These are designed to evaluate the index
operations listed in Section 3.2. Due to space limitations,
we only present the Equality Lookup, Delete, and Bulk Insert
cases. The descriptions of all benchmark cases can be found
in the code repository of our benchmark [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Equality Lookup. The time required to execute a given
sequence of equality lookups is measured. Before executing
and measuring the equality lookups, the index is initialized
and a given sequence of index entries is inserted using the
insert operation.</p>
      <p>Delete. The execution time required to delete a given
sequence of index entries using the delete operation is
measured. Before the delete operations are executed and measured,
the index is initialized and the entire sequence of index
entries to be deleted is inserted using the insert operation. The
order in which the entries are deleted corresponds to the
insertion order.</p>
      <p>Bulk Insert. The execution time required to insert a given
sequence of index entries using the bulk insert operation is
measured. Before the execution and measurement, the
index is initialized. Besides, the index's memory footprint is
measured. Therefore, the allocated memory is measured
before the index's initialization and after inserting the index
entries. The di erence of the allocated memory values is
considered the index's memory consumption.
3.4</p>
    </sec>
    <sec id="sec-7">
      <title>Datasets</title>
      <p>To execute the di erent benchmark cases for the index
implementations, di erent data is needed. For all the
benchmark cases, a sequence of keys is required as input for the
benchmark. This input represents a sequence of index
entries, where a single key represents the search-key value and
the position of the key, starting at one, represents the tuple
identi er (TID). We refer to this input data as entry keys.</p>
      <p>We use dense and sparse unsigned integer values as the
entry keys to evaluate the index implementations.
Additionally, we consider both ascending sorted and unsorted
keys. The sorted dense dataset contains sequentially
increasing, consecutive integer values, starting at one. For the
unsorted dense dataset, the sorted dense keys are randomly
shu ed. The unsorted sparse data sequence contains
randomly picked, unique unsigned integer values. This sequence
is sorted in ascending order for the sorted sparse data.</p>
      <p>Additionally, search-key values and ranges of search-key
values are required as input data for the Equality Lookup
and the Range Lookup experiments, respectively. From a
sequence of entry keys, a number of keys is randomly selected
as equality lookups data sequence. For the range lookup key
ranges, a number of key ranges is selected from a given
sequence of entry keys. For each of the selected ranges, the
number of qualifying keys is the same.</p>
      <p>For our evaluation, we use ve di erent sizes per
combination of dense/sparse and ascending/random entry keys.
The datasets have the sizes of two, four, six, eight, and ten
million entry keys. Furthermore, we use one million lookup
operations in the respective benchmark cases. For the range
lookups, we used a selectivity of 0.01%.
4.</p>
    </sec>
    <sec id="sec-8">
      <title>EVALUATION</title>
      <p>Since all of the index implementations listed in Section 3.1
support unique keys but not non-unique keys, we only use
datasets with unique entry keys in our evaluation. For a
given dataset, we ran the benchmark cases using 64-bit
unsigned integers as search-keys and TIDs. The measurements
reported are the median of 12 runs.</p>
      <p>Join operations between foreign and primary keys are an
exemplary database use case in which an index for a column
with unique unsigned integer values can be utilized. Primary
keys are often IDs, which are often represented as unsigned
integer values. This applies, for example, when primary keys
are surrogate keys.</p>
      <p>The index benchmark cases were executed on a
machine with four Intel(R) Xeon(R) CPU E7-4880 v2 CPUs on
which each socket serves as a non-uniform memory
access (NUMA) node. Each NUMA node has 512 GB of
DDR3-1600 memory. Benchmarks were bound to a single
NUMA node using numactl -N 2 -m 2. The index
benchmark was compiled with clang 9.0.1-12 and the options
-O3 -march=native. We used the C++ standard library
version libstdc++.so.6.0.28.
4.1</p>
    </sec>
    <sec id="sec-9">
      <title>Results</title>
      <p>
        Due to space limitations, we only show the benchmark
results for the Equality Lookup, Delete, and Bulk Insert
benchmark cases. The remaining benchmark results can be found
in our code repository [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Equality Lookups. Figure 1 shows the cumulative
execution time of one million equality lookups performed with
various entry key distributions, entry key orderings, and
numbers of entries stored in the index. In each quadrant, the
measurements of one data distribution and ordering
combination is shown for di erent numbers of stored index entries.</p>
      <p>As a reference, we included the lookup duration of the
Sorted Vector, which is a dynamic array (std::vector) that
stores its values in sorted order. To keep the dynamic array
sorted all the time, new values are inserted into the
proper position. To nd the position of certain values during a
lookup operation, a binary search is performed.</p>
      <p>Delete. Figure 2 shows the cumulative execution time of
deleting various numbers of index entries using the delete
operation. The data characteristics of the entry keys vary in
the four di erent quadrants.</p>
      <p>Bulk Insert. Figure 3 shows the execution time required to
insert various numbers of index entries using the bulk insert
operation. The data characteristics of the entry keys vary in
the four di erent quadrants.</p>
      <p>Simple Vector and Sorted Vector are included as
reference. The bulk insert of the Simple Vector consecutively
inserts the sequence of given entries to a dynamic array
(std::vector). The Sorted Vector additionally sorts the
dynamic array after nishing the insertion.</p>
      <p>Based on the evaluation measurements, we subjectively
categorized these measurements into the categories low (L),
moderate (M) and high (H) as shown in Table 3. Since the
measurements are execution times and memory
consumptions, respectively, lower categories are better. We did not
categorize the execution times of range lookups and bulk
loads since the former is supported only by the B-tree
implementations and the latter only by the TLX B+ Tree.</p>
      <p>Non-Competitive Implementations. Executions of di erent
benchmark cases showed that the index operations could not
be executed in a reasonable time for the BB-Tree and the
PG Skip List. Therefore, we excluded these implementations
from further evaluation experiments.</p>
      <p>Key Findings. As can be seen in Table 3, the RH Flat Map,
TSL Sparse Map, and Unsync ART are the only
implementations that achieved good or moderate performance on all
datasets in the evaluated aspects. Whereas the RH Flat Map
shows overall lower operation execution times compared to
the TSL Sparse Map, the latter has a lower memory
footprint. The Unsync ART achieves an overall moderate
performance in the evaluated aspects. Unfortunately, it does not
support bulk inserts and provides a faulty delete operation.
Apart from the high memory consumption of the TSL
Robin Map on all datasets, this hash map achieves overall low
operation execution times. In fact, it mostly requires lower
equality lookup, delete and bulk insert execution times
compared to the RH Flat Map and TSL Sparse Map.</p>
      <p>Furthermore, the B-trees, which are the only
implementations supporting range queries, generally perform better
on sorted data as shown in Table 3. The Abseil B-Tree
performs generally better than the TLX B+ Tree. However,
even on sorted data, both implementations achieved the
highest equality lookup execution time.</p>
    </sec>
    <sec id="sec-10">
      <title>RELATED WORK</title>
      <p>
        Partial Indexes. Stonebraker presented the rst concept of
partial indexes and how they can be used in a DBMS [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
According to his speci cation, a single partial index is
created for a subset of a table's data depending on a given
quali cation condition. Index entries are only created for those
tuples that satisfy the given condition.
      </p>
      <p>
        Seshadri and Swami extended Stonebraker's idea and
proposed a generalized partial indexing concept, which includes
six di erent strategies that can be used to create partial
indexes [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. These strategies use various statistical
information such as the distribution of column accesses by queries,
query predicate values, and data values to decide which
indexes to be created.
      </p>
      <p>
        Database Cracking [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] is an adaptive indexing technique
that partially indexes columns. Using this technique, indexes
are created \adaptively and incrementally as a side-product
of query processing" [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The core cracking algorithm
creates a copy of a column { a so-called cracker column { and
incrementally sorts it during the execution of range
queries [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The incremental sorting is realized by splitting the
cracker column into multiple partitions { so-called column
slices { with certain value ranges. During the execution of a
range query, the cracker column or already existing column
slices, respectively, are split based on the range predicate's
boundaries. The partially sorted cracker column is utilized
by database operations to speed up the execution time.
      </p>
      <p>
        Olma et al. recently proposed an online adaptive
partitioning and indexing algorithm for in situ query processing
engines [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], which they integrated into their query
processing system Slalom. Their approach reduces data access costs
by partitioning a raw dataset into partitions and applying
indexing strategies for each partition individually. Based on
continuously collected statistics, their algorithm performs
the partitioning and indexing in a gradual manner as a side
e ect of query processing. Thus, a priori workload knowledge
is not required. Based on access frequencies, the algorithm
decides individually for each partition whether and which
in-memory partition index is to be created. Thus, instead of
indexing the entire dataset, indexes exist only for subset of
frequently accessed partitions.
      </p>
      <p>Olma et al.'s algorithm shares similarities with our partial
index approach: Both approaches re ne the set of existing
indexes automatically and dynamically based on data access
frequencies. In addition, both approaches choose the set of
data to be indexed on partition granularity. However, while
their algorithm creates indexes partition-wise, our approach
creates indexes table-wise across selected partitions.
Additionally, whereas Olma et al.'s work refers to in situ query
processing systems, our approach is designed for in-memory
DBMS. Thus, their indexes are created for partitions of raw
data les whereas our indexes are to be created for table
data stored in main memory.</p>
      <p>
        Index Evaluations and Benchmarks. Xie et al. conducted
an experimental performance evaluation of ve modern
inmemory database indexes [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Their aspects of investigation
were the throughput, latency, scalability, memory
consumption and cache miss rate. They executed equality lookups,
inserts and updates with varying experiment con gurations
in di erent dimensions. These dimensions are the size of the
dataset, the number of execution threads, the ratios of
equality lookup, insert, and update operations, and the skewness
in the workload and dataset.
      </p>
      <p>
        Alvarez and his fellow researchers presented an
experimental performance evaluation between the ART and the Judy
Array and di erent hash table variants [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Their
evaluation aspects are the insertion and equality lookup throughput
and the memory footprint.
      </p>
      <p>
        Kipf, Marcus and their co-authors developed an
opensource index benchmark framework called Search On Sorted
Data Benchmark (SOSD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] that allows researchers to
compare in-memory search algorithms, including (learned)
indexes, on equality lookup performance over sorted data [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
According to the SOSD authors, their framework contains
diverse synthetic and real-world datasets, optimized
baseline implementations, and the \ rst performant and publicly
available implementation of the [learned] Recursive Model
Index (RMI)" [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which is proposed by Kraska et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
Marcus, Kipf et al. extended their initial work about the
SOSD in a follow-up work [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. In this work, they presented
their framework in more detail and conducted a performance
analysis of three recent learned index structures.
6.
      </p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>In this work, we presented the idea of a partial index that
is created table-wise but only indexes data of frequently
accessed partitions. Using our index benchmark framework, we
evaluated the lookup speed, maintenance cost and memory
consumption of various index implementations to identify a
suitable implementation for partial indexes.</p>
      <p>Since our partial indexes are designed to execute
maintenance operations on a partition level, the e ciency of bulk
operations is particularly relevant. Regarding the equality
lookup, delete and bulk insert performance, the TSL Robin
Map achieved the best overall results.</p>
      <p>As a next step, we plan to implement the proposed partial
index in our research database Hyrise so that we can perform
end-to-end DBMS performance evaluations afterward.</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>I sincerely thank my supervisors Jan Kossmann and
Markus Dreseler, who supported me intensively and always
gave me constructive impulses and detailed, critical feedback.
Furthermore, I would like to thank the anonymous reviewers
for their valuable feedback.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] ARTSynchronized. https://github.com/flode/ARTSynchronized, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>BB-Tree</surname>
          </string-name>
          . https://github.com/flippingbits/bb-tree, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Index</given-names>
            <surname>Benchmark Evaluation Results</surname>
          </string-name>
          . https://github.com/mweisgut/IMIB/tree/gvdb21/ evaluation_results, Accessed:
          <fpage>2021</fpage>
          -03-02.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Judy</given-names>
            <surname>Template</surname>
          </string-name>
          . https://github.com/mpictor/judy-template, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Robin</given-names>
            <surname>Hood</surname>
          </string-name>
          <article-title>Map</article-title>
          . https://github.com/martinus/robin-hood-hashing, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Robin</given-names>
            <surname>Map</surname>
          </string-name>
          . https://github.com/Tessil/robin-map, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Skip</given-names>
            <surname>List</surname>
          </string-name>
          . https://github.com/petegoodliffe/skip_list, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] SOSD. https://github.com/learnedsystems/SOSD, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Sparse</given-names>
            <surname>Map</surname>
          </string-name>
          . https://github.com/Tessil/sparse-mapp, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>std::unordered map</article-title>
          . https://en.cppreference.com/ w/cpp/container/unordered_map, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] TLX. https://github.com/tlx/tlx, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Alvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          .
          <article-title>A comparison of adaptive radix trees and hash tables</article-title>
          .
          <source>In Proceedings of the International Conference on Data Engineering (ICDE)</source>
          , pages
          <fpage>1227</fpage>
          {
          <fpage>1238</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bingmann. STX B+ Tree</surname>
          </string-name>
          <string-name>
            <surname>C+</surname>
          </string-name>
          + Template Classes,
          <year>2013</year>
          . https://panthema.net/2007/stx-btree/, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bingmann</surname>
          </string-name>
          . TLX: Collection of sophisticated C+
          <article-title>+ data structures, algorithms</article-title>
          , and miscellaneous helpers,
          <year>2018</year>
          . https://panthema.net/tlx, Accessed 2021-
          <volume>03</volume>
          -01.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chasseur</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Patel</surname>
          </string-name>
          .
          <article-title>Design and Evaluation of Storage Organizations for Read-Optimized Main Memory Databases</article-title>
          .
          <source>Proceedings of the International Conference on Very Large Databases (VLDB)</source>
          ,
          <volume>6</volume>
          (
          <issue>13</issue>
          ):
          <volume>1474</volume>
          {
          <fpage>1485</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dreseler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boissier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Klauck</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>U acker, and</article-title>
          <string-name>
            <given-names>H.</given-names>
            <surname>Plattner. Hyrise</surname>
          </string-name>
          Re-engineered:
          <article-title>An Extensible Database System for Research in Relational In-Memory Data Management</article-title>
          .
          <source>In Proceedings of the International Conference on Extending Database Technology (EDBT)</source>
          , pages
          <fpage>313</fpage>
          {
          <fpage>324</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Google</given-names>
            <surname>Inc</surname>
          </string-name>
          . Abseil. https://abseil.io, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Google</given-names>
            <surname>Inc. Abseil - C+</surname>
          </string-name>
          <article-title>+ Common Libraries</article-title>
          . https://github.com/abseil/abseil-cpp, Accessed:
          <fpage>2021</fpage>
          -03-01.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kersten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. Manegold. Database</given-names>
            <surname>Cracking</surname>
          </string-name>
          .
          <source>In Proceedings of the Conference on Innovative Data Systems Research (CIDR)</source>
          , pages
          <fpage>68</fpage>
          {
          <fpage>78</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. van Renen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <article-title>SOSD: A Benchmark for Learned Indexes</article-title>
          .
          <source>NeurIPS Workshop on Machine Learning for Systems</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          .
          <article-title>The Case for Learned Index Structures</article-title>
          .
          <source>In Proceedings of the International Conference on Management of Data (SIGMOD)</source>
          , pages
          <fpage>489</fpage>
          {
          <fpage>504</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Leis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <article-title>The adaptive radix tree: ARTful indexing for main-memory databases</article-title>
          .
          <source>In Proceedings of the International Conference on Data Engineering (ICDE)</source>
          , pages
          <fpage>38</fpage>
          {
          <fpage>49</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>V.</given-names>
            <surname>Leis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scheibner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <article-title>The ART of practical synchronization</article-title>
          .
          <source>In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Aken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hefny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mezerhane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pavlo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Gordon</surname>
          </string-name>
          .
          <article-title>Query-based Workload Forecasting for Self-Driving Database Management Systems</article-title>
          .
          <source>In Proceedings of the International Conference on Management of Data (SIGMOD)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. van Renen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          .
          <source>Benchmarking Learned Indexes. PVLDB</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Olma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karpathiotakis</surname>
          </string-name>
          , I. Alagiannis,
          <string-name>
            <given-names>M.</given-names>
            <surname>Athanassoulis</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          .
          <article-title>Adaptive partitioning and indexing for in situ query processing</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <volume>569</volume>
          {
          <fpage>591</fpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>W.</given-names>
            <surname>Pugh</surname>
          </string-name>
          .
          <article-title>Skip Lists: A Probabilistic Alternative to Balanced Trees</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>33</volume>
          (
          <issue>6</issue>
          ):
          <volume>668</volume>
          {
          <fpage>676</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Schuhknecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jindal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          .
          <article-title>The Uncracked Pieces in Database Cracking</article-title>
          .
          <source>Proceedings of the International Conference on Very Large Databases (VLDB)</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <volume>97</volume>
          {
          <fpage>108</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Seshadri</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Generalized partial indexes</article-title>
          .
          <source>In Proceedings of the International Conference on Data Engineering (ICDE)</source>
          , pages
          <fpage>420</fpage>
          {
          <fpage>427</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sprenger</surname>
          </string-name>
          , P. Schafer, and
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          .
          <article-title>Bb-tree: A practical and e cient main-memory index structure for multidimensional workloads</article-title>
          .
          <source>In Proceedings of the International Conference on Extending Database Technology (EDBT)</source>
          , pages
          <fpage>169</fpage>
          {
          <fpage>180</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          .
          <article-title>The Case for Partial Indexes</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):4{
          <fpage>11</fpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cai</surname>
          </string-name>
          , G. Chen,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>A Comprehensive Performance Evaluation of Modern In-Memory Indices</article-title>
          .
          <source>In Proceedings of the International Conference on Data Engineering (ICDE)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>