     Query Processing Based on Compressed Intermediates

                                                              Patrick Damme
                                                  Supervised by Wolfgang Lehner
                                                    Database Technology Group
                                              Technische Universität Dresden, Germany
                                                   patrick.damme@tu-dresden.de



Proceedings of the VLDB 2017 PhD Workshop, August 28, 2017, Munich, Germany. Copyright (c) 2017 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT

Modern in-memory column-stores employ lightweight data compression to tackle the growing gap between processor speed and main memory bandwidth. However, the compression of intermediate results has not been investigated sufficiently although accessing intermediates is as expensive as accessing the base data in these systems. Therefore, we introduce our vision of a balanced query processing based on compressed intermediates to improve query performance. In this paper, we provide an overview of the important research challenges on the way to this goal, present our contributions so far, and give an outlook on our remaining steps.

1. INTRODUCTION

With increasingly large amounts of data being collected in numerous areas ranging from science to industry, the importance of online analytical processing (OLAP) workloads increases constantly. OLAP queries typically address a small number of columns, but a high number of rows and are, thus, most efficiently processed by column-stores. The significant developments in the main memory domain in recent years have rendered it possible to keep even large datasets entirely in main memory. Consequently, modern column-stores follow a main memory-centric architecture. These systems have to face some new architectural challenges.

Firstly, they suffer from the new bottleneck between main memory and the CPU caused by the contrast between increasingly fast multi-core processors and the comparably low main memory bandwidth. To address this problem, column-stores make extensive use of data compression. The reduced data sizes achievable through compression result in lower transfer times, a better utilization of the cache hierarchy, and fewer TLB misses. However, classical heavyweight compression algorithms such as Huffman [10] or Lempel-Ziv [20] are too slow for in-memory systems. Therefore, numerous lightweight compression algorithms such as differential coding [14, 16] and null suppression [1, 16] have been proposed, which are much faster and, thus, suitable for in-memory column-stores. Furthermore, especially for lightweight compression algorithms, many operators can directly process the compressed data without prior decompression.

Secondly, in main memory-centric column-stores, accessing the intermediate results during query processing is as expensive as accessing the base data, since both reside in main memory. Thus, intermediates offer a great potential for performance improvement, which can be exploited in two orthogonal ways: (1) intermediates should be avoided whenever possible [12, 15], or (2) intermediates should be represented in a way that facilitates efficient query processing.

In this thesis, we focus on the second approach by investigating lightweight compression of intermediates in main memory-centric column-stores. This direction has not been investigated sufficiently in the literature so far. Existing systems usually keep the data compressed only until an operator cannot process the compressed data directly, whereupon the data is decompressed, but not recompressed – due to the resulting computational overhead. However, using modern hardware and state-of-the-art lightweight compression algorithms, this computational overhead can be outweighed by the benefits of compressed data. Thus, our vision is a balanced query processing based on compressed intermediates. That is, in a query execution plan of compression-aware physical operators, every intermediate result shall be represented using a suitable lightweight compression algorithm which is selected in a compression-aware query optimization such that the benefits of compression outweigh its costs. To achieve this goal, this thesis addresses three aspects of the problem: the structural aspect on the basics of lightweight compression (Section 2), the operational aspect on physical operators for compressed data (Section 3), and the optimization aspect on compression-aware optimization strategies (Section 4). These aspects are designed to build upon each other, as will become clear in the following sections.

2. STRUCTURAL ASPECT

The structural aspect lays the foundations of this thesis by focusing on the basics of lightweight compression algorithms and on efficient transformations between the compressed formats of different algorithms. Thereby, our primary focus is on integer sequences due to their outstanding importance in column-stores: Other fixed-width data types, such as decimals, can be stored as integers, and variable-width data types, such as strings, usually need to be represented by fixed-width integer codes from a dictionary to enable efficient processing. We have already completed our planned research in this aspect of the thesis.
2.1 Lightweight Data Compression

In the field of lossless lightweight data compression, we distinguish between techniques, i.e., the abstract ideas of how compression works conceptually, and algorithms, i.e., concrete instantiations of one or more techniques. So far, we consider five lightweight compression techniques for sequences of integers: frame-of-reference (FOR) [8, 21], differential coding (DELTA) [14, 16], dictionary coding (DICT) [1, 21], run-length encoding (RLE) [1, 16], and null suppression (NS) [1, 16]. FOR and DELTA represent each data element as the difference to either a certain given reference value (FOR) or to its predecessor (DELTA). DICT replaces each value by its unique key in a dictionary. The objective of these three well-known techniques is to represent the original data as a sequence of small integers, which is then suited for the actual compression using the NS technique. NS is the most studied lightweight compression technique. Its basic idea is the omission of leading zero bits in small integers. Finally, RLE tackles uninterrupted sequences of occurrences of the same value, so-called runs. Each run is represented by its value and length. Obviously, these techniques exploit different data characteristics, such as the value range, the number of distinct values, and repeated values.
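To make two of these techniques more concrete, the following minimal C++ sketch illustrates DELTA and RLE on a plain std::vector of 32-bit integers. It is purely illustrative: the function names and the simple vector-based representations are chosen for this sketch and do not correspond to the implementations evaluated in this thesis.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// DELTA: replace each element by the difference to its predecessor, e.g.,
// {100, 101, 103, 106} becomes {100, 1, 2, 3} -- small integers that NS can
// subsequently encode with few bits.
std::vector<uint32_t> delta_encode(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    uint32_t prev = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] - prev;   // unsigned wrap-around acts as a modular difference
        prev = in[i];
    }
    return out;
}

// RLE: replace each uninterrupted run of equal values by the pair
// (value, length), e.g., {7, 7, 7, 9, 9} becomes {(7, 3), (9, 2)}.
std::vector<std::pair<uint32_t, uint32_t>> rle_encode(const std::vector<uint32_t>& in) {
    std::vector<std::pair<uint32_t, uint32_t>> out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t j = i + 1;
        while (j < in.size() && in[j] == in[i]) ++j;
        out.emplace_back(in[i], static_cast<uint32_t>(j - i));
        i = j;
    }
    return out;
}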
In the literature, numerous algorithms have been proposed for these techniques, e.g., [1, 8, 14, 16, 17, 19, 21], to name just a few examples. For our purposes of applying decompression and recompression during query execution, we depend on highly efficient implementations of these existing algorithms. One way to achieve these is to use single instruction multiple data (SIMD) extensions of modern processors, such as Intel's SSE and AVX, which allow the application of one operation to multiple data elements at once. In fact, the employment of SIMD instructions has been the major driver of the research in this field in recent years [14, 17, 19]. We have contributed to the corpus of proposed efficient implementations, e.g., through our vectorized algorithm for RLE [5], which is based on vectorized comparisons.
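The following sketch is only meant to illustrate the kind of vectorized comparison such an algorithm builds on: it compares four consecutive 32-bit elements with their respective successors using SSE intrinsics, and the resulting 4-bit mask marks the positions at which a run continues. Only this core primitive is shown under the assumption of a 128-bit register holding four 32-bit integers; run detection, output generation, and tail handling of the actual algorithm in [5] are omitted, and the function name is ours.

#include <cstdint>
#include <immintrin.h>   // SSE intrinsics

// Returns a 4-bit mask in which bit k is set iff in[k] == in[k + 1], i.e.,
// the run continues at position k; a zero bit marks a run break.
static inline int run_continuation_mask(const uint32_t* in) {
    const __m128i cur  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i next = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 1));
    const __m128i eq   = _mm_cmpeq_epi32(cur, next);     // 0xFFFFFFFF where equal
    return _mm_movemask_ps(_mm_castsi128_ps(eq));        // one bit per 32-bit lane
}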
As lightweight compression algorithms are always tailored to certain data characteristics, their behavior in terms of performance and compression rate depends strongly on the data. Selecting the best algorithm for a given base column or intermediate requires a thorough understanding of the algorithms' behaviors subject to the data properties. Unfortunately, a sufficient comparative analysis had been missing in the literature. Thus, we conducted an experimental survey of several vectorized state-of-the-art compression algorithms from all five techniques as well as combinations thereof on numerous datasets, whereby we systematically varied the data characteristics [4, 5]. Figures 1a-c provide a sample of our results (the code was compiled using g++ -O3 and the evaluation system was equipped with an Intel Core i7-4710MQ and 16 GB RAM). Our comparative analysis revealed several new insights. For instance, we could show how different data distributions affect the algorithms. We found that especially outliers in the distributions lead to a significant degradation in the performance and/or compression rate of certain algorithms. Furthermore, for fixed data characteristics, the best algorithm regarding performance is not necessarily the best regarding compression rate. Finally, we could show that combinations of different techniques can heavily improve the compression rate and even the (de)compression speed depending on the data. Summing up our findings, we can state that there is no single-best compression algorithm, but the choice depends on the data properties and is non-trivial. Our extensive experimental survey was made feasible by our benchmark framework for compression algorithms [6], which facilitates an efficient and organized evaluation process.

[Figure 1: Behavior of three compression algorithms (a-c) and a SUM operator on the respective compressed formats (d) for three datasets with different characteristics, each having 100M data elements. Panels: (a) compression time [sec], (b) decompression time [sec], (c) compression rate [bits/int], (d) SUM operator time [sec]. Algorithms: 4-Wise NS, SIMD-FastPFOR, RLE. Datasets (value distribution; run length): DS 1: normal(µ = 2^10, σ = 2^0); constant(1). DS 2: 67% normal(µ = 2^6, σ = 2^0), 33% normal(µ = 2^27, σ = 2^0); constant(1). DS 3: uniform([0, 2^32 − 1]); normal(µ = 20, σ = 2).]

2.2 Direct Data Transformation

Assuming that the optimal compression algorithm was selected for a column, this algorithm might become sub-optimal if the data properties change after the decision. The properties of the base data might change over time through DML operations. While this case might be handled offline, the problem is more urgent for intermediates, whose properties can change dramatically through the application of an operator. For instance, a column containing outliers might be stored in a format that can tolerate these, perhaps at the price of a slow decompression. A selection operator might remove the outliers, making a faster non-outlier-tolerant algorithm a better choice than the original one. This motivates the need for a transformation of the compressed representation of the data in some source format to the compressed representation in some destination format.

A naïve approach would take two steps: (1) Apply the decompression algorithm of the source format to the data, thereby materializing the entire uncompressed data in main memory. (2) Apply the compression algorithm of the destination format to that uncompressed data. The advantage of this approach is that it builds only upon existing (de)compression algorithms. However, since it materializes the uncompressed data in main memory, it is prohibitively expensive from our point of view, since we need to transform intermediates during query execution.
To address this issue, we introduced direct transformation algorithms in [7]. This novel class of algorithms is capable of accomplishing the transformation in one step, i.e., without the materialization of the uncompressed data. To provide an example, we proposed a direct transformation from RLE to 4-Wise NS. In 4-Wise NS [17], the compressed data is a sequence of compressed blocks of four data elements each. The direct transformation algorithm RLE-2-4WiseNS roughly works as follows: For each pair of run value and run length in the RLE-compressed input, it creates one block of four copies of the run value, compresses it once, and stores it out multiple times until it reaches the run length. That way, it saves the intermediate store and load as well as the repeated block compression performed by the naïve approach. Our experiments showed that this and other direct transformations yield significant speed-ups over the naïve approach, if the data characteristics are suitable.
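A sketch of this idea is given below. It deliberately uses a simplified, byte-aligned null suppression on blocks of four values instead of the actual 4-Wise NS layout defined in [17], assumes run lengths that are multiples of four, and uses illustrative names throughout; it is only meant to show how each block is compressed once and then stored out repeatedly, without materializing the uncompressed column.

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified NS on a block of four values: one descriptor byte holding the
// common byte width, followed by the four values truncated to that width.
// (Illustrative only; the real 4-Wise NS block layout is defined in [17].)
static void compress_block(const uint32_t (&block)[4], std::vector<uint8_t>& out) {
    const uint32_t maxVal = std::max({block[0], block[1], block[2], block[3]});
    uint8_t width = 1;
    while (width < 4 && (maxVal >> (8 * width)) != 0) ++width;  // bytes per value
    out.push_back(width);
    for (uint32_t v : block)
        for (uint8_t b = 0; b < width; ++b)
            out.push_back(static_cast<uint8_t>(v >> (8 * b)));
}

// Direct transformation RLE -> blockwise NS: compress one block of four
// copies of the run value once, then append the compressed block repeatedly
// until the run length is reached.
std::vector<uint8_t> rle_to_blockwise_ns(const std::vector<std::pair<uint32_t, uint32_t>>& runs) {
    std::vector<uint8_t> out;
    for (const auto& run : runs) {
        const uint32_t value = run.first;
        const uint32_t length = run.second;            // assumed to be a multiple of 4
        const uint32_t block[4] = {value, value, value, value};
        std::vector<uint8_t> compressedBlock;
        compress_block(block, compressedBlock);        // compress the block only once per run
        for (uint32_t i = 0; i < length; i += 4)       // store it out repeatedly
            out.insert(out.end(), compressedBlock.begin(), compressedBlock.end());
    }
    return out;
}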
3. OPERATIONAL ASPECT

In our currently ongoing work in the operational aspect, we investigate how to integrate lightweight data compression into the query execution. Thereby, we assume that a multitude of compression algorithms is available to the system. We addressed the challenge of easily fulfilling this prerequisite in [9].

3.1 Processing Model for Compressed Data

Our vision of a query processing based on compressed intermediates can best be investigated using a processing model that actually materializes all intermediates. Furthermore, since we focus on column-stores and since lightweight compression algorithms are designed for sequences of values, all intermediates should use a columnar representation. Hence, we chose column-at-a-time as the processing model.

One example of a system that uses this processing model is MonetDB [11], which internally expresses queries in the Monet Algebraic Language (MAL) [2]. The central data structure of MAL is the binary association table (BAT), which is used to represent both base data and intermediates. Conceptually, a BAT consists of a head containing record ids and a tail containing the actual data. However, since the head always contains a dense sequence of integers, it can be omitted. Thus, a BAT is essentially just an array of data elements, making it a perfect target for lightweight compression. MAL formally defines a set of operators that consume and produce BATs, such as selection, join, and projection. We decided to use MAL as the foundation of our work, but intend to adapt MAL operators to multiple compressed formats, which we discuss in the next section.
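For illustration, a compressed intermediate in this setting can be pictured as a self-describing, BAT-like column: a contiguous payload plus a tag identifying its compressed format. The following sketch is hypothetical (it is not MonetDB's actual BAT structure); it merely fixes the kind of container that the operator variants discussed next would consume and produce.

#include <cstdint>
#include <vector>

// Hypothetical identifiers for the compressed formats available to the system.
enum class Format { Uncompressed, Rle, FourWiseNs, SimdFastPfor };

// Minimal BAT-like column: a dense payload of (possibly compressed) data plus
// the metadata needed to interpret it. Real BATs carry further metadata
// (types, properties, heaps) that is omitted here.
struct Column {
    Format format = Format::Uncompressed;
    uint64_t numValues = 0;           // logical number of data elements
    std::vector<uint8_t> payload;     // raw bytes in the given format
};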
3.2 Physical Operators for Compressed Data

When adapting MAL operators to compressed data, different degrees of integration are possible. Figure 2 presents the cases we plan to investigate. In general, an operator might consume i inputs and produce o outputs, each of which might be represented in its individual compressed format. Figure 2a shows the baseline case of processing only uncompressed data. In the following, we assume we want to support n compressed formats for one operator.

[Figure 2: Integration of compression and operators. A to C are compressed formats; U is uncompressed. (a) baseline operator on uncompressed data, (b) wrapper with (de)compression, (c) wrapper with transformation, (d) specialized operator.]

A first approach to support compressed intermediates is shown in Figure 2b. The original operator for uncompressed data is surrounded by a wrapper, which temporarily decompresses the inputs and recompresses the outputs. This approach is called transient decompression and was proposed in [3], but to the best of our knowledge, it has never been investigated in practice. For efficiency, the decompression and recompression should not work on the entire inputs and outputs, but on small chunks fitting into the L1 cache. Changing the compressed format of the intermediates is possible by configuring the wrapper's input and output formats accordingly. The advantage of this approach is its simplicity: It reuses the existing operator and relies only on the n already existing (de)compression algorithms. However, it does not exploit the benefits of working directly on compressed data.
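The following sketch outlines such a wrapper for a unary operator, reusing the hypothetical Column container from the previous sketch. The chunk-wise (de)compression routines are only declared as placeholders for the respective format's algorithms; the point is merely that at most one cache-sized chunk of uncompressed data exists at any time.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Placeholder declarations for format-specific routines (not a real API):
// decompress_chunk yields the next chunk of at most maxValues elements and
// advances pos; compress_chunk appends uncompressed values to a compressed
// output column in that column's format.
std::vector<uint32_t> decompress_chunk(const Column& in, uint64_t& pos, std::size_t maxValues);
void compress_chunk(Column& out, const std::vector<uint32_t>& values);

// Transient decompression: the existing operator for uncompressed data is
// applied chunk by chunk, so that only one L1-cache-sized chunk of
// uncompressed data exists at any point in time.
Column transient_wrapper(const Column& input,
                         const std::function<std::vector<uint32_t>(const std::vector<uint32_t>&)>& op,
                         Format outputFormat,
                         std::size_t chunkSize) {
    Column output;
    output.format = outputFormat;
    uint64_t pos = 0;
    while (pos < input.numValues) {
        const std::vector<uint32_t> chunk = decompress_chunk(input, pos, chunkSize);
        const std::vector<uint32_t> result = op(chunk);   // unchanged existing operator
        compress_chunk(output, result);
        output.numValues += result.size();
    }
    return output;
}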
The second approach is to adapt the operator such that it can work directly on compressed data (Figure 2c). Existing works such as [1, 13, 18] have already proposed certain operators on certain compressed formats. We plan to contribute to this line of research by covering the formats of recent vectorized compression algorithms. We have already investigated a SUM operator on compressed data, and Figure 1d illustrates how significantly its performance depends on the data properties. We assume a common format for all inputs and outputs of the operator; for arbitrary combinations of formats, the operator is again wrapped. However, in this case the wrapper utilizes the direct transformation algorithms we developed in the structural aspect. Note that transformations are required only for those inputs (or outputs) that are not represented in the operator's native format. The idea of bringing compressed inputs into a common format has already been proposed in [13], but only for joins on dictionary-encoded data – and without direct transformations. We expect this approach to yield considerable speed-ups compared to the first approach, since (i) the compressed data inside the wrapper is smaller, and (ii) the operator works directly on the compressed representation, such that it might, e.g., process more data elements in parallel using SIMD instructions. This approach requires n variants of the operator and n^2 − n transformations, whereby the latter can be reused for all other operators. Nevertheless, the existence of a wrapper still causes a certain overhead.
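As a simple illustration of an operator working directly on compressed data, a SUM over an RLE-compressed column only needs to accumulate value times length per run, without ever materializing the individual data elements. The pair-based representation below is again purely illustrative, not the compressed format used in our evaluation.

#include <cstdint>
#include <utility>
#include <vector>

// SUM over an RLE-compressed column: every run (value, length) contributes
// value * length, so the individual data elements are never decompressed.
uint64_t sum_rle(const std::vector<std::pair<uint32_t, uint32_t>>& runs) {
    uint64_t sum = 0;
    for (const auto& run : runs) {
        sum += static_cast<uint64_t>(run.first) * run.second;
    }
    return sum;
}

For example, with n = 5 supported formats, the second approach requires 5 operator variants plus 5^2 − 5 = 20 direct transformations, and these transformations can be shared among all operators.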
The final approach tries to maximize efficiency by tailoring the operator to a specific combination of formats (Figure 2d), making a wrapper unnecessary. Unfortunately, this approach implies the highest integration effort, requiring n^(i+o) operator variants. Thus, we intend to evaluate the potential of this approach first by considering a few promising combinations. If the results show significant improvements over the second approach, we could address the high integration effort, e.g., using code generation techniques.

The investigation of the above approaches is our current work-in-progress. Our ultimate goal is to integrate them into an existing column-store, most likely into MonetDB.
4. OPTIMIZATION ASPECT

There is no single-best compression algorithm, but the decision depends on the data characteristics [5]. Thus, compression must be employed wisely in a query plan to make its benefits outweigh its computational overhead. This motivates the development of compression-aware query optimization strategies, our future work in the optimization aspect.

The query optimizer is one of the most complex components of a DBMS. The crucial tasks it fulfills – such as algebraic restructuring and mapping logical to physical operators – are still fundamental for compressed query execution. Due to the high complexity, deeply integrating our compression-aware strategies into an existing optimizer is beyond the scope of this thesis. Instead, we envision a second optimization phase. This phase takes the optimal plan output by an existing optimizer as input and enriches it with compression by selecting an appropriate compressed format for each intermediate and replacing the physical operators by our derived operators for compressed data (Figure 3). In the following, we briefly describe the research challenges we will have to face to achieve this goal.

[Figure 3: Compression-aware query optimization. Colors in the query plans stand for different compressed formats; grey stands for uncompressed data. An existing query optimizer turns the SQL query into the best plan without compression; our compression-aware optimization strategies (our contribution) then derive the best plan with compression.]

Local vs. global optimization. A simple approach could be to select the best format for each intermediate in isolation (a minimal sketch of this local strategy is given at the end of this section). While this implies a small search space, it might fail to find the optimal plan, e.g., by changing the format too often. A global optimization, on the other hand, requires effective pruning rules to cope with the huge search space.

Creation of a cost model. Due to the complex behavior of lightweight compression algorithms and, therefore, the operators based on them, the comparison of alternative decisions should be based on a cost model. Given a set of data properties, this model must provide estimates for, e.g., the compression rate and operator runtimes.

Estimation of the data characteristics. To use the cost model effectively, the characteristics of the data must be known. However, estimating the properties of all intermediates prior to query execution is non-trivial. Erroneous estimates might result in sub-optimal decisions. Therefore, adaptive optimization strategies might be a solution.
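As a purely illustrative sketch of the local strategy and of the role such a cost model would play, the following code selects a format for a single intermediate by minimizing an estimated cost. The DataProperties structure and the CostModel interface are hypothetical placeholders for the estimates described above, and Format refers to the illustrative enumeration from Section 3.

#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical data properties of one intermediate, as the optimizer would
// have to estimate them before execution.
struct DataProperties {
    uint64_t numValues;
    uint64_t maxValue;
    uint64_t numDistinctValues;
    double avgRunLength;
};

// Hypothetical cost model: estimates the cost (e.g., operator runtime plus
// (re)compression effort) of representing an intermediate with the given
// properties in the given format.
struct CostModel {
    double estimate(Format format, const DataProperties& props) const;
};

// Local strategy: choose the format of one intermediate in isolation by
// minimizing the estimated cost over all candidate formats.
Format choose_format_locally(const std::vector<Format>& candidates,
                             const DataProperties& props,
                             const CostModel& model) {
    Format best = Format::Uncompressed;
    double bestCost = std::numeric_limits<double>::max();
    for (const Format f : candidates) {
        const double cost = model.estimate(f, props);
        if (cost < bestCost) {
            bestCost = cost;
            best = f;
        }
    }
    return best;
}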
5. CONCLUSIONS

Modern in-memory column-stores address the RAM-CPU bottleneck through lightweight data compression. However, employing compression has not been investigated sufficiently for intermediate results, although they offer great potential for performance improvement. In this context, we introduced our vision of a balanced query processing based on compressed intermediates. We discussed all relevant aspects of the problem in detail: (1) Our completed work in the structural aspect, where we contributed (i) an extensive experimental survey of lightweight compression algorithms and (ii) direct transformation algorithms. (2) Our ongoing work in the operational aspect, where we contribute different variants of physical operators on compressed data. (3) Our future work in the optimization aspect, where we will contribute compression-aware query optimization strategies.

Acknowledgments

This work was partly funded by the German Research Foundation (DFG) in the context of the project "Lightweight Compression Techniques for the Optimization of Complex Database Queries" (LE-1416/26-1).

6. REFERENCES

[1] D. J. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, pages 671–682, 2006.
[2] P. A. Boncz and M. L. Kersten. MIL primitives for querying a fragmented world. The VLDB Journal, 8(2):101–119, 1999.
[3] Z. Chen, J. Gehrke, and F. Korn. Query optimization in compressed database systems. In SIGMOD, pages 271–282, 2001.
[4] P. Damme, D. Habich, J. Hildebrandt, and W. Lehner. Insights into the comparative evaluation of lightweight data compression algorithms. In EDBT, pages 562–565, 2017.
[5] P. Damme, D. Habich, J. Hildebrandt, and W. Lehner. Lightweight data compression algorithms: An experimental survey (experiments and analyses). In EDBT, pages 72–83, 2017.
[6] P. Damme, D. Habich, and W. Lehner. A benchmark framework for data compression techniques. In TPCTC, pages 77–93, 2015.
[7] P. Damme, D. Habich, and W. Lehner. Direct transformation techniques for compressed data: General approach and application scenarios. In ADBIS, pages 151–165, 2015.
[8] J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In ICDE, pages 370–379, 1998.
[9] J. Hildebrandt, D. Habich, P. Damme, and W. Lehner. Compression-aware in-memory query processing: Vision, system design and beyond. In ADMS/IMDM@VLDB, pages 40–56, 2016.
[10] D. A. Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the Institute of Radio Engineers, 40(9):1098–1101, 1952.
[11] S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 35(1):40–45, 2012.
[12] T. Kissinger, B. Schlegel, D. Habich, and W. Lehner. QPPT: Query processing on prefix trees. In CIDR, 2013.
[13] J. Lee, G. K. Attaluri, R. Barber, N. Chainani, O. Draese, F. Ho, S. Idreos, M. Kim, S. Lightstone, G. M. Lohman, K. Morfonios, K. Murthy, I. Pandis, L. Qiao, V. Raman, V. K. Samy, R. Sidle, K. Stolze, and L. Zhang. Joins on encoded and partitioned data. PVLDB, 7(13):1355–1366, 2014.
[14] D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Software – Practice and Experience, 45(1):1–29, 2015.
[15] T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539–550, 2011.
[16] M. A. Roth and S. J. Van Horn. Database compression. SIGMOD Record, 22(3):31–39, 1993.
[17] B. Schlegel, R. Gemulla, and W. Lehner. Fast integer compression using SIMD instructions. In DaMoN, pages 34–40, 2010.
[18] T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-Scan: Ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385–394, 2009.
[19] W. X. Zhao, X. Zhang, D. Lemire, D. Shan, J. Nie, H. Yan, and J. Wen. A general SIMD-based approach to accelerating compression algorithms. ACM Transactions on Information Systems, 33(3):15:1–15:28, 2015.
[20] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.
[21] M. Zukowski, S. Héman, N. Nes, and P. A. Boncz. Super-scalar RAM-CPU cache compression. In ICDE, 2006.