<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>PhD Workshop, August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Query Processing Based on Compressed Intermediates</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>28</volume>
      <issue>2017</issue>
      <abstract>
        <p>Modern in-memory column-stores employ lightweight data compression to tackle the growing gap between processor speed and main memory bandwidth. However, the compression of intermediate results has not been investigated sufficiently, although accessing intermediates is as expensive as accessing the base data in these systems. Therefore, we introduce our vision of a balanced query processing based on compressed intermediates to improve query performance. In this paper, we provide an overview of the important research challenges on the way to this goal, present our contributions so far, and give an outlook on our remaining steps.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>With increasingly large amounts of data being collected in
numerous areas ranging from science to industry, the
importance of online analytical processing (OLAP) workloads
increases constantly. OLAP queries typically address a small
number of columns, but a high number of rows, and are,
thus, most efficiently processed by column-stores. The
significant developments in the main memory domain in recent
years have made it possible to keep even large datasets
entirely in main memory. Consequently, modern
column-stores follow a main memory-centric architecture. These
systems have to face some new architectural challenges.</p>
      <p>
        Firstly, they suffer from the new bottleneck between main
memory and the CPU caused by the contrast between
increasingly fast multi-core processors and the comparably low
main memory bandwidth. To address this problem,
column-stores make extensive use of data compression. The reduced
data sizes achievable through compression result in lower
transfer times, a better utilization of the cache hierarchy,
and fewer TLB misses. However, classical heavyweight
compression algorithms such as Huffman [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or Lempel Ziv [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
are too slow for in-memory systems. Therefore, numerous
lightweight compression algorithms such as differential
coding [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ] and null suppression [
        <xref ref-type="bibr" rid="ref1 ref16">1, 16</xref>
        ] have been proposed,
which are much faster and, thus, suitable for in-memory
column-stores. Furthermore, especially for lightweight
compression algorithms, many operators can directly process the
compressed data without prior decompression.
      </p>
      <p>
        Secondly, in main memory-centric column-stores,
accessing the intermediate results during query processing is as
expensive as accessing the base data, since both reside in
main memory. Thus, intermediates offer a great potential
for performance improvement, which can be exploited in two
orthogonal ways: (1) intermediates should be avoided
whenever possible [
        <xref ref-type="bibr" rid="ref12 ref15">12, 15</xref>
        ], or (2) intermediates should be
represented in a way that facilitates efficient query processing.
      </p>
      <p>In this thesis, we focus on the second approach by
investigating lightweight compression of intermediates in main
memory-centric column-stores. This direction has not been
investigated sufficiently in the literature so far. Existing
systems usually keep the data compressed only until an
operator cannot process the compressed data directly, whereupon
the data is decompressed, but not recompressed, due to the
resulting computational overhead. However, using modern
hardware and state-of-the-art lightweight compression
algorithms, this computational overhead can be outweighed by
the benefits of compressed data. Thus, our vision is a
balanced query processing based on compressed intermediates.
That is, in a query execution plan of compression-aware
physical operators, every intermediate result shall be
represented using a suitable lightweight compression algorithm
which is selected in a compression-aware query optimization
such that the benefits of compression outweigh its costs. To
achieve this goal, this thesis addresses three aspects of the
problem: the structural aspect on the basics of lightweight
compression (Section 2), the operational aspect on physical
operators for compressed data (Section 3), and the
optimization aspect on compression-aware optimization
strategies (Section 4). These aspects are designed to build upon
each other, as will become clear in the following sections.</p>
    </sec>
    <sec id="sec-2">
      <title>2. STRUCTURAL ASPECT</title>
      <p>The structural aspect lays the foundations of this thesis by
focusing on the basics of lightweight compression algorithms
and on efficient transformations between the compressed
formats of different algorithms. Thereby, our primary focus is
on integer sequences due to their outstanding importance in
column-stores: Other fixed-width data types, such as
decimals, can be stored as integers, and variable-width data
types, such as strings, usually need to be represented by
fixed-width integer codes from a dictionary to enable
efficient processing. We have already completed our planned
research in this aspect of the thesis.</p>
      <p>[Figure 1: (a) compression time [sec], (b) decompression time [sec], (c) compression rate [bits/int], and (d) SUM operator time [sec], each for datasets DS 1, DS 2, and DS 3.]</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Lightweight Data Compression</title>
      <p>
        In the eld of lossless lightweight data compression, we
distinguish between techniques, i.e., the abstract ideas of
how compression works conceptually, and algorithms, i.e.,
concrete instanciations of one or more techniques. So far,
we consider ve lightweight compression techniques for
sequences of integers: frame-of-reference (FOR) [
        <xref ref-type="bibr" rid="ref21 ref8">8, 21</xref>
        ], differential coding (DELTA) [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ], dictionary coding (DICT)
[
        <xref ref-type="bibr" rid="ref1 ref21">1, 21</xref>
        ], run-length encoding (RLE) [
        <xref ref-type="bibr" rid="ref1 ref16">1, 16</xref>
        ], and null
suppression (NS) [
        <xref ref-type="bibr" rid="ref1 ref16">1, 16</xref>
        ]. FOR and DELTA represent each data
element as the difference to either a certain given reference
value (FOR) or to its predecessor (DELTA). DICT replaces
each value by its unique key in a dictionary. The objective of
these three well-known techniques is to represent the
original data as a sequence of small integers, which is then suited
for the actual compression using the NS technique. NS is
the most studied lightweight compression technique. Its
basic idea is the omission of leading zero bits in small integers.
Finally, RLE tackles uninterrupted sequences of occurrences
of the same value, so-called runs. Each run is represented
by its value and length. Obviously, these techniques exploit
different data characteristics, such as the value range, the
number of distinct values, and repeated values.
      </p>
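      <p>To make these techniques concrete, the following minimal Python sketches (ours, for illustration only; they are not the vectorized algorithms from the cited works) apply each idea to a small integer sequence:</p>
      <preformat>
```python
# Minimal sketches of the five techniques; function names are ours.

def for_compress(values, ref):
    # frame-of-reference: store each element as its offset from ref
    return [v - ref for v in values]

def delta_compress(values):
    # differential coding: store each element as the difference
    # to its predecessor
    out = [values[0]]
    for i in range(1, len(values)):
        out.append(values[i] - values[i - 1])
    return out

def dict_compress(values):
    # dictionary coding: replace each value by its dictionary key
    keys, codes = {}, []
    for v in values:
        if v not in keys:
            keys[v] = len(keys)
        codes.append(keys[v])
    return codes, keys

def rle_compress(values):
    # run-length encoding: one (value, length) pair per run
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def ns_bit_width(values):
    # null suppression: bits needed per element after dropping
    # leading zero bits (here determined for a whole block)
    return max(v.bit_length() for v in values)

data = [100, 100, 100, 103, 104]
print(for_compress(data, 100))  # [0, 0, 0, 3, 4]
print(delta_compress(data))     # [100, 0, 0, 3, 1]
print(rle_compress(data))       # [[100, 3], [103, 1], [104, 1]]
print(ns_bit_width(for_compress(data, 100)))  # 3 bits instead of 32
```
      </preformat>
      <p>Note how FOR and DELTA shrink the value range so that NS can then store each element in 3 bits; this is exactly the kind of technique combination evaluated in our survey discussed below.</p>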
      <p>
        In the literature, numerous algorithms have been proposed
for these techniques, e.g., [
        <xref ref-type="bibr" rid="ref1 ref14 ref16 ref17 ref19 ref21 ref8">1, 8, 14, 16, 17, 19, 21</xref>
        ], to name
just a few examples. For our purposes of applying
decompression and recompression during query execution, we
depend on highly efficient implementations of these existing
algorithms. One way to achieve these is to use the single
instruction multiple data (SIMD) extensions of modern processors,
such as Intel's SSE and AVX, which allow the application
of one operation to multiple data elements at once. In fact,
the employment of SIMD instructions has been the major
driver of the research in this field in recent years [
        <xref ref-type="bibr" rid="ref14 ref17 ref19">14, 17,
19</xref>
        ]. We have contributed to the corpus of proposed efficient
implementations, e.g., through our vectorized algorithm for
RLE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which is based on vectorized comparisons.
      </p>
      <p>
        As lightweight compression algorithms are always tailored
to certain data characteristics, their behavior in terms of
performance and compression rate depends strongly on the
data. Selecting the best algorithm for a given base column
or intermediate requires a thorough understanding of the
algorithms' behaviors subject to the data properties.
Unfortunately, a sufficient comparative analysis had been
missing in the literature. Thus, we conducted an experimental
survey of several vectorized state-of-the-art compression
algorithms from all five techniques as well as combinations
thereof on numerous datasets, whereby we systematically
varied the data characteristics [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Figures 1a-c provide a
sample of our results (the code was compiled using g++ -O3
and the evaluation system was equipped with an Intel Core
i7-4710MQ and 16 GB RAM). Our comparative analysis
revealed several new insights. For instance, we could show
how different data distributions affect the algorithms. We
found that especially outliers in the distributions lead to a
significant degradation in the performance and/or
compression rate of certain algorithms. Furthermore, for fixed data
characteristics, the best algorithm regarding performance
is not necessarily the best regarding compression rate.
Finally, we could show that combinations of different
techniques can heavily improve the compression rate and even
the (de)compression speed depending on the data. Summing
up our findings, we can state that there is no single-best
compression algorithm; rather, the choice depends on the data
properties and is non-trivial. Our extensive experimental
survey was made feasible by our benchmark framework for
compression algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which facilitates an efficient and
organized evaluation process.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Direct Data Transformation</title>
      <p>Assuming that the optimal compression algorithm was
selected for a column, this algorithm might become
suboptimal if the data properties change after the decision. The
properties of the base data might change over time through
DML operations. While this case might be handled offline,
the problem is more urgent for intermediates, whose
properties can change dramatically through the application of an
operator. For instance, a column containing outliers might
be stored in a format that can tolerate these, perhaps at the
price of a slow decompression. A selection operator might
remove the outliers, making a faster non-outlier-tolerant
algorithm a better choice than the original one. This motivates
the need for a transformation of the compressed
representation of the data in some source format to the compressed
representation in some destination format.</p>
      <p>A nave approach would take two steps: (1) Apply the
decompression algorithm of the source format to the data,
thereby materialize the entire uncompressed data in main
memory. (2) Apply the compression algorithm of the
destination format to that uncompressed data. The
advantage of this approach is that it builds only upon existing
(de)compression algorithms. However, since it materializes
the uncompressed data in main memory, it is prohibitively
expensive from our point of view, since we need to transform
intermediates during query execution.</p>
      <p>
        To address this issue, we introduced direct transformation
algorithms in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This novel class of algorithms is capable
of accomplishing the transformation in one step, i.e.,
without the materialization of the uncompressed data. To
provide an example, we proposed a direct transformation from
RLE to 4-Wise NS. In 4-Wise NS [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the compressed data
is a sequence of compressed blocks of four data elements
each. The direct transformation algorithm RLE-2-4WiseNS
roughly works as follows: For each pair of run value and run
length in the RLE-compressed input, it creates one block of
four copies of the run value, compresses it once, and writes
it out multiple times until it reaches the run length. That
way, it saves the intermediate store and load as well as the
repeated block compression performed by the naïve approach.
Our experiments showed that this and other direct
transformations yield significant speed-ups over the naïve approach,
if the data characteristics are suitable.
      </p>
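      <p>The contrast between the two approaches can be sketched as follows (a simplified illustration under our own assumptions: run lengths are multiples of four, and a compressed 4-element block is represented as a (bit width, values) pair rather than the real packed 4-Wise NS byte layout):</p>
      <preformat>
```python
# One-step (direct) vs. two-step (naive) transformation from RLE to a
# blockwise NS-style format; simplified sketch, not the real algorithms.

def compress_block(block):
    # stand-in for 4-wise null suppression of one 4-element block
    width = max(v.bit_length() for v in block)
    return (width, tuple(block))

def naive_transform(runs):
    # step 1: materialize the uncompressed data; step 2: recompress it
    flat = []
    for value, length in runs:
        flat.extend([value] * length)
    return [compress_block(flat[i:i + 4]) for i in range(0, len(flat), 4)]

def direct_transform(runs):
    # one step: per run, compress a block of four copies once and
    # emit it repeatedly; the uncompressed data never materializes
    out = []
    for value, length in runs:
        block = compress_block([value] * 4)
        out.extend([block] * (length // 4))
    return out

runs = [(7, 8), (1000, 4)]
assert direct_transform(runs) == naive_transform(runs)
```
      </preformat>
      <p>Both produce the same compressed output, but the direct variant compresses each run's block only once and skips the intermediate store and load.</p>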
    </sec>
    <sec id="sec-5">
      <title>3. OPERATIONAL ASPECT</title>
      <p>
        In our currently ongoing work in the operational aspect,
we investigate how to integrate lightweight data
compression into the query execution. Thereby, we assume that
a multitude of compression algorithms is available to the
system. We addressed the challenge of easily fulfilling this
prerequisite in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p><bold>3.1 Processing Model for Compressed Data</bold></p>
      <p>Our vision of a query processing based on compressed
intermediates can best be investigated using a processing
model that actually materializes all intermediates.
Furthermore, since we focus on column-stores and since lightweight
compression algorithms are designed for sequences of
values, all intermediates should use a columnar representation.
Hence, we chose column-at-a-time as the processing model.</p>
      <p>
        One example of a system that uses this processing model
is MonetDB [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which internally expresses queries in the
Monet Algebraic Language (MAL) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The central data
structure of MAL is the binary association table (BAT),
which is used to represent both base data and
intermediates. Conceptually, a BAT consists of a head containing
record ids and a tail containing the actual data. However,
since the head always contains a dense sequence of integers,
it can be omitted. Thus, a BAT is essentially just an array
of data elements, making it a perfect target for lightweight
compression. MAL formally defines a set of operators that
consume and produce BATs, such as selection, join, and
projection. We decided to use MAL as the foundation of
our work, but intend to adapt MAL operators to multiple
compressed formats, which we discuss in the next section.
      </p>
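      <p>The omission of the dense head can be illustrated by a small sketch (our simplification for exposition; MonetDB's actual BAT implementation differs):</p>
      <preformat>
```python
# A BAT with a dense head reduces to a plain array of tail values;
# record ids are implicit positions. Simplified illustration only.

class BAT:
    def __init__(self, tail):
        self.tail = list(tail)  # the actual data; a perfect target
                                # for lightweight compression
    def head(self, i):
        # the dense head 0, 1, 2, ... is never stored
        return i
    def select_gt(self, threshold):
        # a MAL-style selection returning a BAT of qualifying ids
        return BAT([i for i, v in enumerate(self.tail) if v > threshold])

b = BAT([3, 9, 4])
print(b.select_gt(3).tail)  # [1, 2]
```
      </preformat>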
      <p><bold>3.2 Physical Operators for Compressed Data</bold></p>
      <p>When adapting MAL operators to compressed data,
different degrees of integration are possible. Figure 2 presents
the cases we plan to investigate. In general, an operator
might consume i inputs and produce o outputs, each of
which might be represented in its individual compressed
format. Figure 2a shows the baseline case of processing only
uncompressed data. In the following, we assume we want to
support n compressed formats for one operator.</p>
      <p>A first approach to support compressed intermediates is
shown in Figure 2b. The original operator for uncompressed
data is surrounded by a wrapper, which temporarily
decompresses the inputs and recompresses the outputs.</p>
      <p>[Figure 2: degrees of integration: (a) baseline operator, (b) wrapper with (de)compression, (c) wrapper with transformation, (d) specialized operator.]</p>
      <p>
        This
approach is called transient decompression and was
proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but to the best of our knowledge, it has never
been investigated in practice. For efficiency, the
decompression (recompression) should not work on the entire
inputs (outputs), but on small chunks fitting into the L1 cache.
Changing the compressed format of the intermediates is
possible by configuring the wrapper's input and output formats
accordingly. The advantage of this approach is its simplicity:
It reuses the existing operator and relies only on the n already
existing (de)compression algorithms. However, it does not
exploit the benefits of working directly on compressed data.
      </p>
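      <p>The wrapper idea can be sketched as follows (our illustration, assuming RLE-compressed inputs and outputs and a unary operator working on uncompressed chunks; for simplicity, output runs are not merged across chunk borders):</p>
      <preformat>
```python
# Transient decompression: (de)compress chunk-wise around an
# operator for uncompressed data. Simplified illustration.

CHUNK = 1024  # elements per chunk; chosen to fit the L1 cache

def rle_compress(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def wrap_unary(op, runs):
    # decompress the RLE input chunk by chunk, apply the original
    # operator, and recompress each output chunk
    out, buf = [], []
    for value, length in runs:
        buf.extend([value] * length)
        while len(buf) >= CHUNK:
            out.extend(rle_compress(op(buf[:CHUNK])))
            buf = buf[CHUNK:]
    if buf:
        out.extend(rle_compress(op(buf)))
    return out

inc = lambda chunk: [v + 1 for v in chunk]
print(wrap_unary(inc, [[5, 3]]))  # [[6, 3]]
```
      </preformat>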
      <p>
        The second approach is to adapt the operator such that
it can work directly on compressed data (Figure 2c).
Existing works such as [
        <xref ref-type="bibr" rid="ref1 ref13 ref18">1, 13, 18</xref>
        ] have already proposed certain
operators on certain compressed formats. We plan to
contribute to this line of research by covering the formats of
recent vectorized compression algorithms. We have already
investigated a SUM operator on compressed data and
Figure 1d illustrates how significantly its performance depends
on the data properties. We assume a common format for all
inputs and outputs of the operator; for arbitrary
combinations of formats, the operator is again wrapped. However,
in this case the wrapper utilizes the direct transformation
algorithms we developed in the structural aspect. Note that
transformations are required only for those inputs(outputs)
that are not represented in the operator's native format.
The idea of bringing compressed inputs into a common
format has already been proposed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], but only for joins on
dictionary-encoded data, and without direct
transformations. We expect this approach to yield considerable
speed-ups compared to the first approach, since (i) the compressed
data inside the wrapper is smaller, and (ii) the operator
works directly on the compressed representation, such that
it might, e.g., process more data elements in parallel using
SIMD instructions. This approach requires n variants of the
operator and n<sup>2</sup> - n transformations, whereby the latter can
be reused for all other operators. Nevertheless, the existence
of a wrapper still causes a certain overhead.
      </p>
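      <p>As an example of the benefit of working directly on compressed data, a SUM over an RLE-compressed column (a sketch of the idea only, not our vectorized implementation) needs no decompression at all:</p>
      <preformat>
```python
# SUM directly on RLE-compressed input: each run contributes
# value * length, so long runs are summed in constant time.

def sum_rle(runs):
    return sum(value * length for value, length in runs)

runs = [(5, 1000), (9, 2)]
print(sum_rle(runs))  # 5018, computed from 2 runs instead of 1002 values
```
      </preformat>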
      <p>The final approach tries to maximize the efficiency by
tailoring the operator to a specific combination of formats
(Figure 2d), making a wrapper unnecessary. Unfortunately,
this approach implies the highest integration effort,
requiring n<sup>i+o</sup> operator variants. Thus, we intend to evaluate the
potential of this approach first by considering a few
promising combinations. If the results show significant
improvements over the second approach, we could address the high
integration effort, e.g., using code generation techniques.</p>
      <p>The investigation of the above approaches is our current
work-in-progress. Our ultimate goal is to integrate them
into an existing column-store, most likely into MonetDB.</p>
    </sec>
    <sec id="sec-6">
      <title>4. OPTIMIZATION ASPECT</title>
      <p>
        There is no single-best compression algorithm, but the
decision depends on the data characteristics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus,
compression must be employed wisely in a query plan to make
its benefits outweigh its computational overhead. This
motivates the development of compression-aware query
optimization strategies, our future work in the optimization aspect.
      </p>
      <p>The query optimizer is one of the most complex
components of a DBMS. The crucial tasks it fulfills, such as
algebraic restructuring and mapping logical to physical
operators, are still fundamental for compressed query
execution. Due to the high complexity, deeply integrating our
compression-aware strategies into an existing optimizer is
beyond the scope of this thesis. Instead, we envision a
second optimization phase. This phase takes the optimal plan
output by an existing optimizer as input and enriches it with
compression by selecting an appropriate compressed format
for each intermediate and replacing the physical operators
by our derived operators for compressed data (Figure 3). In
the following, we briefly describe the research challenges we
will have to face to achieve this goal.</p>
      <p>Local vs. global optimization. A simple approach
could be to select the best format for each intermediate in
isolation. While this implies a small search space, it might
fail to find the optimal plan, e.g., by changing the format
too often. A global optimization, on the other hand, requires
effective pruning rules to cope with the huge search space.</p>
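      <p>The local variant can be sketched as a per-intermediate minimization (a hypothetical illustration: the format names and cost numbers below are made up, and producing such cost estimates is precisely the subject of the cost-model challenge):</p>
      <preformat>
```python
# Local compression-aware optimization: pick, for each intermediate
# in isolation, the format with the lowest estimated cost.
# Format names and costs below are purely illustrative.

def choose_format(est_costs):
    # est_costs maps a format name to an estimated cost
    return min(est_costs, key=est_costs.get)

plan_intermediates = {
    "selection_output": {"uncompressed": 10.0, "rle": 3.0, "4wise_ns": 4.5},
    "join_output": {"uncompressed": 8.0, "rle": 9.0, "4wise_ns": 6.0},
}
chosen = {name: choose_format(c) for name, c in plan_intermediates.items()}
print(chosen)  # selection_output -> rle, join_output -> 4wise_ns
```
      </preformat>
      <p>A global optimization would instead search over joint format assignments for the whole plan, e.g., penalizing format changes between consecutive operators.</p>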
      <p>Creation of a cost model. Due to the complex
behavior of lightweight compression algorithms and, therefore,
the operators based on them, the comparison of alternative
decisions should be based on a cost model. Given a set of
data properties, this model must provide estimates for, e.g.,
the compression rate and operator runtimes.</p>
      <p>Estimation of the data characteristics. To use the
cost model effectively, the characteristics of the data must
be known. However, estimating the properties of all
intermediates prior to query execution is non-trivial. Erroneous
estimates might result in sub-optimal decisions. Therefore,
adaptive optimization strategies might be a solution.</p>
    </sec>
    <sec id="sec-7">
      <title>5. CONCLUSIONS</title>
      <p>Modern in-memory column-stores address the
RAM-CPU bottleneck through lightweight data compression. However,
employing compression has not been investigated sufficiently
for intermediate results, although they offer great potential
for performance improvement. In this context, we
introduced our vision of a balanced query processing based on
compressed intermediates. We discussed all relevant aspects
of the problem in detail: (1) Our completed work in the
structural aspect, where we contributed (i) an extensive
experimental survey of lightweight compression algorithms and
(ii) direct transformation algorithms. (2) Our ongoing work
in the operational aspect, where we contribute different
variants of physical operators on compressed data. (3) Our
future work in the optimization aspect, where we will
contribute compression-aware query optimization strategies.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was partly funded by the German Research
Foundation (DFG) in the context of the project "Lightweight
Compression Techniques for the Optimization of Complex
Database Queries" (LE-1416/26-1).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          .
          <article-title>Integrating compression and execution in column-oriented database systems</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>671</volume>
          –
          <fpage>682</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          and
          <string-name>
            <surname>M. L. Kersten.</surname>
          </string-name>
          <article-title>MIL primitives for querying a fragmented world</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <volume>101</volume>
          –
          <fpage>119</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Korn</surname>
          </string-name>
          .
          <article-title>Query optimization in compressed database systems</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>271</volume>
          –
          <fpage>282</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Damme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hildebrandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Insights into the comparative evaluation of lightweight data compression algorithms</article-title>
          .
          <source>In EDBT</source>
          , pages
          <volume>562</volume>
          –
          <fpage>565</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Damme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hildebrandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Lightweight data compression algorithms: An experimental survey (experiments and analyses)</article-title>
          .
          <source>In EDBT</source>
          , pages
          <volume>72</volume>
          –
          <fpage>83</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Damme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>A benchmark framework for data compression techniques</article-title>
          .
          <source>In TPCTC</source>
          , pages
          <volume>77</volume>
          –
          <fpage>93</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Damme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Direct transformation techniques for compressed data: General approach and application scenarios</article-title>
          .
          <source>In ADBIS</source>
          , pages
          <volume>151</volume>
          –
          <fpage>165</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Shaft</surname>
          </string-name>
          .
          <article-title>Compressing relations and indexes</article-title>
          .
          <source>In ICDE</source>
          , pages
          <volume>370</volume>
          –
          <fpage>379</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hildebrandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Damme</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Compression-aware in-memory query processing: Vision, system design and beyond</article-title>
          .
          <source>In ADMS/IMDM@VLDB</source>
          , pages
          <volume>40</volume>
          –
          <fpage>56</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Huffman</surname>
          </string-name>
          .
          <article-title>A method for the construction of minimum-redundancy codes</article-title>
          .
          <source>Proceedings of the Institute of Radio Engineers</source>
          ,
          <volume>40</volume>
          (
          <issue>9</issue>
          ):
          <volume>1098</volume>
          –
          <fpage>1101</fpage>
          ,
          <year>1952</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Groffen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manegold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Mullender</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kersten</surname>
          </string-name>
          .
          <article-title>MonetDB: Two decades of research in column-oriented database architectures</article-title>
          .
          <source>Bulletin of the IEEE Computer Society Technical Committee on Data Engineering</source>
          ,
          <volume>35</volume>
          (
          <issue>1</issue>
          ):
          <fpage>40</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kissinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>QPPT: query processing on prefix trees</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Attaluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chainani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Draese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lightstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Lohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Morfonios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pandis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Samy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sidle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stolze</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Joins on encoded and partitioned data</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>7</volume>
          (
          <issue>13</issue>
          ):
          <fpage>1355</fpage>
          -
          <lpage>1366</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lemire</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Boytsov</surname>
          </string-name>
          .
          <article-title>Decoding billions of integers per second through vectorization</article-title>
          .
          <source>Software: Practice and Experience</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <article-title>Efficiently compiling efficient query plans for modern hardware</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>9</issue>
          ):
          <fpage>539</fpage>
          -
          <lpage>550</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Roth</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. J. Van</given-names>
            <surname>Horn</surname>
          </string-name>
          .
          <article-title>Database compression</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>22</volume>
          (
          <issue>3</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Fast integer compression using SIMD instructions</article-title>
          .
          <source>In DaMoN</source>
          , pages
          <fpage>34</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Willhalm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Popovici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Boshmaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Plattner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schaffner</surname>
          </string-name>
          .
          <article-title>SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>394</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lemire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <article-title>A general SIMD-based approach to accelerating compression algorithms</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>33</volume>
          (
          <issue>3</issue>
          ):
          <fpage>15:1</fpage>
          -
          <lpage>15:28</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ziv</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Lempel</surname>
          </string-name>
          .
          <article-title>A universal algorithm for sequential data compression</article-title>
          .
          <source>IEEE Transactions on Information Theory</source>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>337</fpage>
          -
          <lpage>343</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zukowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>Super-scalar RAM-CPU cache compression</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>