<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Position Caching in a Column-Store with Late Materialization: An Initial Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viacheslav Galaktionov</string-name>
          <email>viacheslav.galaktionov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeniy Klyuchikov</string-name>
          <email>evgeniy.klyuchikov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Chernishev</string-name>
          <email>chernishev@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Information Systems Engineering Lab</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>JetBrains Research</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Select R</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>.F From R</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Where R</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>.A = R</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>.G = R</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint Petersburg State University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A common technique to speed up DBMS query processing is to cache parts of query results and reuse them later. In this paper we propose a novel approach which is aimed specifically at caching intermediates in a late-materialization-oriented column-store. The idea of our approach is to cache positions (row numbers) instead of data values. The small size of positional representation is a valuable advantage: cache can accommodate more entries and consider intermediates that involve “heavy” operators, e.g. joins of large tables. Position caching thrives in late materialization environments since position exchange is prevalent in them. In particular, expensive predicates and heavy joins are usually processed based on positions. Our approach is able to cache them eficiently, thus significantly reducing system load. To assess the importance of intermediates our position caching technique features a cost model that is based on usage statistics and complexity estimations. Furthermore, to allow intermediate reuse for the queries that are not fully identical, we proposed an eficient query containment checking algorithm. Several policies for cache population and eviction were proposed. Finally, our approach is enhanced by lightweight compression schemes. Experimental evaluation was performed using a stream of randomly generated Star-Schema-Benchmark-like queries. It showed up to 3 times improvement in query run times. Additionally, compressing the intermediates reduces the space requirements by up to 2 times without a noticeable performance overhead.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Late materialization [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a query processing technique that aims
to operate on positions as long as possible. Essentially, it defers
tuple reconstruction to later stages of a query plan. The goal is to
conserve disk bandwidth by: 1) reading individual columns only
when they are necessary for processing at a given point, and 2)
organizing column access in such an order that subsequent reads
require fewer disk accesses due to data filtering.
      </p>
      <p>
        This approach leads to unconventional query plans with novel
opportunities [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for eficient query processing. In such query
plans operators: 1) usually ingest only necessary columns, 2)
exchange not only data, but also positions. Positions are essentially
row ids which will be later used to request data. An example of
late materialization processing is presented in Figure 1.
      </p>
      <p>Here, for column G, query processor reads only the values
that have passed join predicate. For columns E and H it is even
better: query processor will have to read values that have passed
both join predicates. Of course, the benefit of such processing</p>
      <sec id="sec-1-1">
        <title>Project</title>
      </sec>
      <sec id="sec-1-2">
        <title>Join A=D</title>
      </sec>
      <sec id="sec-1-3">
        <title>Project</title>
      </sec>
      <sec id="sec-1-4">
        <title>Join G=K</title>
        <p>K</p>
        <p>F
C</p>
        <p>B</p>
        <p>A</p>
        <p>D</p>
        <p>G</p>
        <p>E</p>
        <p>H
scheme depends on many parameters: join filtering power, data
distribution, disk parameters such as seek times and bandwidth.</p>
        <p>
          Late materialization has formed the basis of several
columnoriented DBMSes [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Its many aspects, such as the design of
LMenabled operators [
          <xref ref-type="bibr" rid="ref17 ref3">3, 17</xref>
          ] and the query processing schemes [
          <xref ref-type="bibr" rid="ref11 ref6">6,
11</xref>
          ], have been extensively studied. However, some aspects
received less attention.
        </p>
        <p>In this paper we further investigate one of them — intermediate
result caching. Unlike many existing caching techniques, our
approach does not cache the data itself. Instead, it aims to cache
useful data in positional form which allows us to:
(1) reduce the memory footprint, since all table row data,
which can be arbitrarily large, is replaced with a single
position (a 32-bit number)
(2) use thusly saved memory to store intermediates that are
closer to the bottom part of the plan. It comes more costly,
but on the other hand, they have higher reuse chance.</p>
        <p>Our caching approach also employs query containment checks
which allows reusing intermediates from queries that are not
fully identical, but one of them subsumes another.</p>
        <p>To demonstrate the viability of the proposed position caching,
we have implemented it inside PosDB — a distributed disk-based
column-store with late materialization. Since PosDB currently
lacks a full-fledged cost-based optimizer, we had to devise a
number of simpler heuristic algorithms. While our approach is simpler
than existing methods, it still allows us to obtain considerable
performance gains.</p>
        <p>Apart from that we have also implemented compression for
cached positions. Keeping only sequences of integers favours
compression which allows us to reduce memory footprint even
further. For now, we have implemented only delta encoding and
null suppression since our experiments indicated that these
methods result in good space savings with a small overhead.
Overall, the contribution of this paper is the following:
(1) A novel approach to caching intermediates in a
columnstore that caches positions instead of data.
(2) A study of applicability of compression to caching inside
a column-store.
(3) An experimental evaluation of several policies for cache
population and eviction.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Caching intermediates. The idea of reusing intermediate
results for query evaluation in databases has existed for a long time.
It comes in two formulations. In the first one there is a number
of simultaneously running queries and the goal is to generate
query plans that would maximize result sharing. This is called
multi-query optimization [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. The second option is caching:
parts of query plans that were evaluated in the past are stored
and reused when appropriate query arrives. In this paper we
consider the second option.
      </p>
      <p>
        The next two papers study caching of intermediates inside
the MonetDB [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] system family, an in-memory column-store
with late materialization. In reference [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] authors propose to
exploit MonetDB operator-at-a-time processing model. The idea
is to cache and reuse BAT by-products of query execution. The
proposed approach featured plan subsumption and both
interand intra- query reuse. The study [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] addresses the case of
pipelined operator evaluation inside Vectorwise database engine.
      </p>
      <p>
        In reference [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] another mechanism for in-memory result
caching was proposed. The idea is to cache internal data
structures of a database engine in order to lower memory pressure
and preserve cache and register locality. In particular, hash tables
that are produced by hash-joins and hash-aggregations.
      </p>
      <p>
        Disk-based caching was considered in reference [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Apart
from being able to cache data, this system can also cache positions.
Also, authors proposed a cost-based optimizer to find the best
plan. Experiments showed that position caching performs better
than data caching when cache entries are stored on disk.
      </p>
      <p>
        Caching and compression. Compression has been used in
databases for a long time [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], however it is column-stores where
it brought significant benefits. Compression in column-stores
is diferent from row-stores: 1) columnar storage results in
homogeneity of data which in turn leads to better compression
rates and, 2) compression should be lightweight, i.e. it should not
excessively load CPU.
      </p>
      <p>
        We reference two papers which we believe to be most
important, while many others can be found. Authors of reference [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
studied several lightweight compression algorithms in a
diskbased column-store. They compared their behavior in case of
eager decompression and in case when query engine operated
on compressed data directly. Also, they have presented a
decision tree for selecting compression method depending on data
characteristics. Compression for in-memory column-stores was
studied in reference [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Authors have proposed and evaluated
three diferent compression schemes which aimed to eficiently
use CPU resources.
      </p>
      <p>To the best of our knowledge, no paper considered
compression in the context of caching intermediates in column-stores.
3</p>
    </sec>
    <sec id="sec-3">
      <title>PROPOSED APPROACH</title>
      <sec id="sec-3-1">
        <title>Preliminaries and prerequisites. A common workflow of a</title>
        <p>DBMS caching subsystem is as follows. When a query arrives,
its plan is generated and inspected for subtrees that a) can be
reused later, b) can reuse current cache. If any are found, they
interact with cache during query evaluation by either a) putting
the resulting data into cache or b) reusing cached data instead of
subtree evaluation.</p>
        <p>In this study we assume that the target DBMS represents
query plans as a tree of operators, each of which returns values
or positions exclusively. As we target late materialization, most of
the filtering and join operators are positional and (due to OLAP
specifics) appear to be the most costly parts of a query plan.</p>
        <p>Our caching approach intentionally targets only the positional
data. As we stated in the introduction, “positional” cache entries
are lightweight and are more susceptible to compression.
Therefore, “heavy” operators from the bottom of the tree (usually costly
iflters and joins) can be cached with more ease. Thus, a significant
decrease in system load can be achieved.</p>
        <p>An essential drawback of position only caching is repeated
data value reading. However, disk subsystem decreases the
overhead. Caching is usually applied when the queries are somewhat
similar in the terms of referenced disk pages. Therefore, the disk
pages necessary for materialization are often already stored
inside the bufer manager.</p>
        <p>Caching: general overview. Our caching algorithm consists
of pre- and post- processing of query plans. The former adds
special operators to gather statistical information and substitutes
subtrees with existing cache entries. The latter collects the final
statistical information and updates the cache if needed.</p>
        <p>The cache consists of query history and a set of cache
entries (each being a list of position blocks). Query history is a list
of subtrees of N most recently executed queries together with
the subtrees that are still cached. Each history entry contains a
pointer to the corresponding cache entry (if any), a high-level
description of the subtree, its result’s size and usage statistic. A
usage statistic is a set of queries the entry can be potentially
useful for. All this information is used in the cost models of cache
population and eviction policies.</p>
        <p>There might not be a perfectly matching cache entry for a
subtree, but at the same time there may be an entry that provides
a larger, subsuming, result. This is determined with a containment
check. If the check is passed, the original subtree is replaced by
this cache entry with a special filtering operator on top.</p>
        <p>Caching: preprocessing. Preprocessing is the most
sophisticated part, see Figure 2. The algorithm iterates through the query
plan and for each subtree : a) checks if it fits into cache b) adds a
special operator PBCounter on top of it c) updates cache history
d) tries to replace  with a (possibly filtered) cache entry.</p>
        <p>The first two steps are related to size estimation of
intermediates. An auxiliary PBCounter is used to count the number of
blocks passed during query evaluation. Then, this number is used
to: a) update the benefit measure of  (described later) b) filter
future subtrees that are guaranteed not to fit into cache.</p>
        <p>After size estimation has been taken care of, we add a history
entry for  and update the usage statistics of other entries that can
(potentially) be used by . Usage statistics in their turn participate
in the cost model of population and eviction policies. Therefore,
we try to add the best yet uncached entry into cache.</p>
        <p>Finally, after the cache was adjusted, we search for a cache
entry that can replace  with minimum additional filtering. If one
is found, we use a special operator FetchCache instead of direct 
evaluation. This operator passes position blocks, decompressing
them if needed.</p>
        <p>Caching: postrocessing. This stage has two goals: gather the
results of every PBCounter that is a part of the current query’s</p>
        <sec id="sec-3-1-1">
          <title>Initial query plan</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Patched query plan</title>
        </sec>
        <sec id="sec-3-1-3">
          <title>Select next</title>
          <p>subtree, S</p>
          <p>no
operator tree, and calculate the benefit estimates for each
intermediate.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Policies for cache population and eviction. We have con</title>
        <p>sidered several diferent policies for adding and removing
intermediates. We assume that for each history entry a numeric
benefit can be calculated. We also define total benefit as a
product of benefit and the usage frequency (number of queries
in the the entry usage statistics).</p>
        <p>For cache population we have considered:
• MIN: choose an uncached intermediate whose total benefit
is larger than minimal among cached intermediates.
• AVG: choose an uncached intermediate whose total benefit
is larger than average among cached intermediates.
Cache eviction has a larger number of options:
• LRU (Least Recently Used): evict intermediates that were
not used for the longest time.
• LFU (Least Frequently Used): evict intermediates with the
shortest history of usage.
• LCS (Largest Cache Space): evict intermediates that occupy
the largest amount of space.
• BENEFIT: evict entries that have the lowest benefit.
• BENEFITHIST: evict entries that have the lowest total
benefit.</p>
        <p>• RANDOM: evict random intermediates.</p>
        <p>Cost model. The benefit of a query history entry is defined
as a function of:
• Subtree complexity: a weighted sum of the subtree
operators. Each operator type corresponds to a constant weight
which is a model parameter;
• Working set complexity: an estimated amount of data
the query executor requests from the storage system to
process the subtree;
• Actual result size: computed by the corresponding PBCounter
operator;
• Usage statistics: a list of unique query numbers that could
(or did) use it.</p>
        <p>Note that we rely on two kinds of complexity: subtree and
working set complexity. While the former favours the“structurally
complex” subtrees, the latter favours operation on larger amounts
of data. They allow to distinguish simple queries on a large data
and complex queries an a small data.</p>
        <p>Overall, the following formula is used to calculate benefit:
   ( ) =</p>
        <p>Complexity( ) × EstimatedSize( ) .</p>
        <p>ActualSize( )</p>
        <p>Query Containment. Implementing a caching subsystem
implies solving the query containment problem. It arises when
it is necessary to determine which cache entry corresponds to a
newly arrived query. Given the sheer expressive power of SQL,
it is hard to check even the simple equivalence of two queries.
However, in order to build an eficient caching subsystem it is
essential to find cached entries that can be used as base table to
evaluate the given query. For example, consider two queries:
• SELECT * FROM T1 WHERE T1.A &gt; 100
• SELECT * FROM T1 WHERE T1.A &gt; 110
If the first query is cached, then the second one can use the stored
answer with additional filtering. Depending on the predicate
selectivity and hardware parameters this approach may result in
savings.</p>
        <p>
          It is well-known that in the general case checking query
containment is very demanding computationally. For example, it was
shown [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] that the problem of conjunctive query containment
is NP complete. Therefore, we had to restrict admissible query
class to “mostly” conjunctive SPJ (Select-Project-Join) queries.
“Mostly” means that OR can appear in WHERE clause only to list
possible individual values of an attribute, e.g. T1.A = 5 OR T1.A
= 6 OR T1.A = 7 is allowed, while T1.A &lt; 5 OR T1.A = 10 is
not. Therefore, the resulting class is limited, but big enough to
cover all queries from the Star Schema Benchmark (SSB). Note,
that since only positional data is involved, there is no need in
checking containment of aggregation or sort.
        </p>
        <p>In order to perform containment checks we have designed a
special data structure to represent query metadata. Essentially, it
records: 1) the set of attribute fragments covered, 2) all conducted
joins, 3) all predicates that were used for filtering.</p>
        <p>Obtaining such a structure is just a matter of traversing the
corresponding operator tree.</p>
        <p>Now, in order to perform a containment check given two such
structures, all that is required is to make sure they share their sets
of attribute fragments and joins, and every predicate from the first
structure is implied by some predicate from the second structure.
If that condition holds, then the second structure describes a
containing query.</p>
        <p>Compression. Cached position blocks appear to be a good
target for compression. First, columnar representation guarantees
data homogeneity and positions are just integers of fixed size.
Secondly, the structure of a query plan explicitly restricts the
possible position block structure. For example, large tables like
SSB LINEORDER are usually treated in the way that preserves
their initial order. Therefore, the corresponding position stream
T1
row1
row2
row3</p>
        <p>T3
3
2
3</p>
        <sec id="sec-3-2-1">
          <title>Join Index</title>
          <p>Sort
Operator
Reader
Operator
internals
local access
remote access
...</p>
          <p>Aggregate
Read(d_datekey)
Join
Read(lo_orderdate)
Read(s_suppkey)
Join
Read(lo_suppkey)
DS(date)
Read(s_region)
Filter
Read(p_partkey)
Join
Read(lo_partkey)
DS(supplier)
DS(lineorder)
Read(p_category)
Filter
DS(part)</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Tuples</title>
        </sec>
        <sec id="sec-3-2-3">
          <title>Columns</title>
          <p>(a) Example of join index
(b) Query plan example
is ordered and relatively dense. On the contrary, small tables
which participate in joins often provide sparse and unordered
sequences of positions.</p>
          <p>Turning to concrete compression strategies, we have tried
diferent encoding methods and found the delta and null
suppression to be the most appropriate. The former allows to narrow
down the range of absolute values inside each position block,
especially when they have ascending or descending order. Then,
the latter transforms 32-bit (signed or unsigned) numbers into
16-bit and 8-bit ones. Thus, up to four times space compression
can be achieved with an almost negligible overhead of two for
loops.</p>
          <p>Two other apparently eficient compression strategies are RLE
and Range. However, experiments demonstrated their limited use
in practice. After the first filtering or join operator the consecutive
positions are usually diferent or repeat just a little. In the latter
case delta + null suppression give a comparable or better result.</p>
          <p>We have also assessed and turned down some of the more
complex compression methods, such as Dictionary compression
and LZW. They either require significant time overhead or fail to
provide noticeable space gains, encoding numbers with almost
the same numbers.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>POSDB</title>
      <p>
        To evaluate our approach we have implemented it inside PosDB —
a distributed column-store with a processing model that adheres
to late materialization. Each PosDB query plan consists of two
parts: position- and tuple-based (see Figure 3b). In the former
(the bottom one) operators exchange only positions that are
represented as a generalized join index [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] (Figure 3a). In our
system this data structure is used to represent results of filter
and join operators. For a filter, it is just a list of positions (row
numbers) that satisfy a predicate. This structure can describe
results of an arbitrary number of joins, e.g. consider the join
index presented in Figure 3a. Here, the second row of 1 was
joined with the first row of 2 and then the resulting tuple was
joined with the second row of 3.
      </p>
      <p>The upper part of any plan consists of operators that work
with tuples, similarly to classic row-stores. The lowest of these
operators is responsible for materialization, i.e. it transforms join
indexes into tuples of actual values. Such materializing operators
perform aggregation and window functions.</p>
      <p>Often, positional operators (e.g. Join, Filter) require data
values in addition to positions. To provide such operators with data, a
set of auxiliary entities called readers were introduced. For
example, ColumnReader retrieves values of a specific attribute, while
SyncReader provides values of several attributes synchronously.</p>
      <p>PosDB is a both distributed and parallel column-store. It is
distributed in terms of both data and query execution: a table may
be fragmented and replicated across several nodes. A number
of table-level fragmentation methods is supported: round-robin,
hash and range partitioning strategies. Query distribution allows
it to run a query on diferent nodes, possible with individual query
plan parts residing on distinct nodes. Query distribution is
implemented by two pairs of special operators: {SendPos, ReceivePos}
and {SendTuple, ReceiveTuple}. Therefore, both positional and
tuple operators can be executed on arbitrary nodes, regardless
where their children reside.</p>
      <p>
        Both inter- and intra-query parallelism [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] are supported. To
implement intra-query parallelism two special operators were
created. Asynchronizer allows to execute an operator tree in a
separate thread and UnionAll is used to collect data from several
subtrees that are executed in their own threads.
      </p>
      <p>
        Recently [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], fully distributed operators like distributed join
and aggregation were added. A detailed description of PosDB
architecture can be found in papers [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ].
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>We have evaluated the proposed approach with two experiments,
designed to measure: a) the impact of diferent eviction policies on
query execution performance and b) the impact of compression
on the amount of space used for storing intermediates.</p>
      <p>We fixed the size of cache history to 50 last queries and used
only the MIN population policy. Cache size was varied from 2 to
32 thousands of position blocks, each of them being 32 KB large
when uncompressed.</p>
      <p>Finally, we relied on SSB with SF 10 and conducted a series
of runs, a thousand of random “SSB-like” queries each. These
queries difer from the usual SSB ones only in the way the filtering
predicates are generated: in addition to randomly choosing the
boundaries, we also randomly select the relation. For example,
DATE.D_YEAR = 1993 can become DATE.D_YEAR &lt; 1995.</p>
      <p>Experimental environment. We used a laptop with an
Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz; 8 GB RAM; HDD
4,000
2,000</p>
      <p>LRU</p>
      <p>RANDOM
BENEFITHIST</p>
      <p>LFU
LCS
0 )30,000
1
2
×
2
3(
s/
te20,000
y
b
,
e
c
a
p
s
e10,000
h
c
a
c
d
e
s
U
0</p>
      <p>LRU</p>
      <p>RANDOM
NO COMPRESSION
0
5000
156
Cache size (upper in blocks, lower in MB)
Cache size (upper in blocks, lower in MB)
Samsung ST1000LM024, 5400 rpm; Funtoo Linux, kernel version
5.2.7; gcc 9.2.0.</p>
      <sec id="sec-5-1">
        <title>Query execution performance. First, we measured the per</title>
        <p>formance of query execution when diferent eviction policies are
used. The results are given in Figure 4 and show that with smaller
cache sizes, the choice of a strategy had no significant impact on
query performance. With larger cache sizes, however, BENEFIT
turned out to be the best, RANDOM — the worst.</p>
        <p>Another interesting outcome is that BENEFITHIST, which took
history into account, resulted in worse performance than BENEFIT,
which did not. We suppose that just relying on the usage
frequency results in a longer eviction time for already-cold cache
entries (which were hot some time ago). We believe the situation
could be improved by also considering the times of several last
uses.</p>
        <p>Space consumption due to compression. Next, we enabled
compression for intermediates and ran the same experiment
using only the BENEFIT, LRU, and RANDOM eviction policies. Figure 5
shows the maximum amount of main memory used to store the
intermediates during a particular run. The values were initially
measured in bytes, and then divided by the standard size of a
position block (32 KB).</p>
        <p>It is worth noting that enabling compression has not had any
noticeable efect on query execution times. Therefore, for the
sake of brevity, the corresponding graph has been omitted.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>In this paper we have proposed a novel approach to in-memory
positional caching inside a column-store with late
materialization. Compared to data caching, positional caching allows the
system to save space by storing only integers which can be easily
compressed. Experiments performed in PosDB demonstrated the
viability of this approach. Also we have evaluated a number of
policies for cache population and eviction, including our own.
Future work will include evaluation of caching in a distributed
environment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Boncz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stavros</given-names>
            <surname>Harizopoulos</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The Design and Implementation of Modern Column-Oriented Database Systems</article-title>
          . Now Publishers Inc., Hanover, MA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Samuel Madden, and
          <string-name>
            <given-names>Miguel</given-names>
            <surname>Ferreira</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Integrating Compression and Execution in Column-oriented Database Systems (SIGMOD '06).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. J. DeWitt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Madden</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Materialization Strategies in a Column-Oriented DBMS</article-title>
          . In ICDE'
          <volume>07</volume>
          .
          <fpage>466</fpage>
          -
          <lpage>475</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Chungmin</given-names>
            <surname>Melvin</surname>
          </string-name>
          Chen and
          <string-name>
            <given-names>Nicholas</given-names>
            <surname>Roussopoulos</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>The Implementation and Performance Evaluation of the ADMS Query Optimizer: Integrating Query Result Caching and Matching (</article-title>
          <source>EDBT '94)</source>
          .
          <fpage>323</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>George</given-names>
            <surname>Chernishev</surname>
          </string-name>
          , Viacheslav Galaktionov, Valentin Grigorev, Evgeniy Klyuchikov, Evgeniy Slobodkin, and Kirill Smirnov. [n.d.].
          <source>PosDB</source>
          <year>2019</year>
          :
          <article-title>A Distributed Disk-Based Column-Store with Late Materialization (submitted)</article-title>
          .
          <source>Proc. VLDB Endow</source>
          . (Oct. [n. d.]).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Chernishev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Galaktionov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Grigorev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Klyuchikov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Smirnov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>PosDB: An Architecture Overview</article-title>
          .
          <source>Programming and Computer Software</source>
          <volume>44</volume>
          ,
          <issue>1</issue>
          (
          <issue>01</issue>
          <year>Jan 2018</year>
          ),
          <fpage>62</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Chernishev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Galaktionov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Grigorev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Klyuchikov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Smirnov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>PosDB: An Architecture Overview</article-title>
          .
          <source>Programming and Computer Software</source>
          <volume>44</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2018</year>
          ),
          <fpage>62</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Rada</given-names>
            <surname>Chirkova</surname>
          </string-name>
          .
          <year>2009</year>
          . Query Containment. Springer US,
          <volume>2249</volume>
          -
          <fpage>2253</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kayhan</given-names>
            <surname>Dursun</surname>
          </string-name>
          , Carsten Binnig, Ugur Cetintemel, and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Kraska</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Revisiting Reuse in Main Memory Database Systems (</article-title>
          <source>SIGMOD '17)</source>
          .
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Stavros</surname>
            <given-names>Harizopoulos</given-names>
          </string-name>
          , Daniel Abadi, and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Column-Oriented Database Systems</article-title>
          , VLDB 2009 Tutorial. nms.csail.mit.edu/~stavros/pubs/ tutorial2009-column_stores.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Stratos</surname>
            <given-names>Idreos</given-names>
          </string-name>
          , Fabian Grofen, Niels Nes, Stefan Manegold,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sjoerd Mullender</surname>
          </string-name>
          , and
          <string-name>
            <surname>Martin</surname>
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Kersten</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng</article-title>
          .
          <source>Bull. 35</source>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Milena</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Ivanova</surname>
            , Martin L. Kersten,
            <given-names>Niels J.</given-names>
          </string-name>
          <string-name>
            <surname>Nes</surname>
          </string-name>
          , and
          <string-name>
            <surname>Romulo</surname>
            <given-names>A.P.</given-names>
          </string-name>
          <string-name>
            <surname>Gonçalves</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>An Architecture for Recycling Intermediates in a Column-store</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>35</volume>
          ,
          <issue>4</issue>
          , Article 24 (Oct.
          <year>2010</year>
          ),
          <volume>24</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          :43 pages.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Fabian</surname>
            <given-names>Nagel</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Boncz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stratis D.</given-names>
            <surname>Viglas</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Recycling in Pipelined Query Evaluation (ICDE '13)</article-title>
          .
          <source>IEEE Computer Society</source>
          ,
          <fpage>338</fpage>
          -
          <lpage>349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. Tamer</given-names>
            <surname>Ozsu</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Principles of Distributed Database Systems</article-title>
          (3rd ed.). Prentice Hall Press, Upper Saddle River, NJ, USA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Prasan</given-names>
            <surname>Roy</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <string-name>
            <surname>Multi-Query Optimization</surname>
          </string-name>
          . Springer US, Boston, MA,
          <fpage>1849</fpage>
          -
          <lpage>1852</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Timos</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Sellis</surname>
          </string-name>
          .
          <year>1988</year>
          .
          <article-title>Multiple-query Optimization</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>13</volume>
          ,
          <issue>1</issue>
          (March
          <year>1988</year>
          ),
          <fpage>23</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Dimitris</surname>
            <given-names>Tsirogiannis</given-names>
          </string-name>
          , Stavros Harizopoulos, Mehul A.
          <string-name>
            <surname>Shah</surname>
          </string-name>
          , Janet L.
          <string-name>
            <surname>Wiener</surname>
            , and
            <given-names>Goetz</given-names>
          </string-name>
          <string-name>
            <surname>Graefe</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Query Processing Techniques for Solid State Drives (SIGMOD '09)</article-title>
          .
          <source>Association for Computing Machinery</source>
          ,
          <fpage>59</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Valduriez</surname>
          </string-name>
          .
          <year>1987</year>
          .
          <article-title>Join Indices</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>12</volume>
          ,
          <issue>2</issue>
          (
          <year>June 1987</year>
          ),
          <fpage>218</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Till</surname>
            <given-names>Westmann</given-names>
          </string-name>
          , Donald Kossmann, Sven Helmer, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Moerkotte</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>The Implementation and Performance of Compressed Databases</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>29</volume>
          ,
          <issue>3</issue>
          (Sept.
          <year>2000</year>
          ),
          <fpage>55</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zukowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Super-Scalar RAM-CPU Cache Compression</article-title>
          . In ICDE'
          <volume>06</volume>
          .
          <fpage>59</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>