Position Caching in a Column-Store with Late Materialization: An Initial Study

Viacheslav Galaktionov 1, 2, Evgeniy Klyuchikov 1, 2, George Chernishev 1, 2, 3
1 Information Systems Engineering Lab, JetBrains Research
2 Saint Petersburg State University, Russia
3 National Research University Higher School of Economics

{viacheslav.galaktionov,evgeniy.klyuchikov,chernishev}@gmail.com

ABSTRACT

A common technique to speed up DBMS query processing is to cache parts of query results and reuse them later. In this paper we propose a novel approach aimed specifically at caching intermediates in a late-materialization-oriented column-store.

The idea of our approach is to cache positions (row numbers) instead of data values. The small size of the positional representation is a valuable advantage: the cache can accommodate more entries and consider intermediates that involve "heavy" operators, e.g. joins of large tables.

Position caching thrives in late materialization environments, since position exchange is prevalent in them. In particular, expensive predicates and heavy joins are usually processed based on positions. Our approach is able to cache them efficiently, thus significantly reducing system load.

To assess the importance of intermediates, our position caching technique features a cost model based on usage statistics and complexity estimations. Furthermore, to allow intermediate reuse for queries that are not fully identical, we propose an efficient query containment checking algorithm. Several policies for cache population and eviction are proposed. Finally, our approach is enhanced with lightweight compression schemes.

Experimental evaluation was performed using a stream of randomly generated Star-Schema-Benchmark-like queries. It showed up to 3 times improvement in query run times. Additionally, compressing the intermediates reduces the space requirements by up to 2 times without a noticeable performance overhead.

© Copyright 2020 for this paper held by its author(s). Published in the proceedings of DOLAP 2020 (March 30, 2020, Copenhagen, Denmark, co-located with EDBT/ICDT 2020) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

Late materialization [1] is a query processing technique that aims to operate on positions as long as possible. Essentially, it defers tuple reconstruction to later stages of a query plan. The goal is to conserve disk bandwidth by: 1) reading individual columns only when they are necessary for processing at a given point, and 2) organizing column access in such an order that subsequent reads require fewer disk accesses due to data filtering.

This approach leads to unconventional query plans with novel opportunities [10] for efficient query processing. In such query plans, operators: 1) usually ingest only the necessary columns, and 2) exchange not only data, but also positions. Positions are essentially row ids which will later be used to request data. An example of late materialization processing is presented in Figure 1.

[Figure 1: Late materialization example, adapted from [17]. The plan evaluates SELECT R1.B, R1.C, R2.E, R2.H, R3.F FROM R1, R2, R3 WHERE R1.A = R2.D AND R2.G = R3.K: the joins A=D and G=K operate on positions, and the projected columns are read late.]

Here, for column G, the query processor reads only the values that have passed the first join predicate. For columns E and H it is even better: the query processor has to read only the values that have passed both join predicates. Of course, the benefit of such a processing scheme depends on many parameters: join filtering power, data distribution, and disk parameters such as seek times and bandwidth.

Late materialization has formed the basis of several column-oriented DBMSes [1]. Many of its aspects, such as the design of LM-enabled operators [3, 17] and query processing schemes [6, 11], have been extensively studied. However, some aspects have received less attention.

In this paper we further investigate one of them: intermediate result caching. Unlike many existing caching techniques, our approach does not cache the data itself. Instead, it aims to cache useful data in positional form, which allows us to:
   (1) reduce the memory footprint, since table row data, which can be arbitrarily large, is replaced with a single position (a 32-bit number);
   (2) use the memory thus saved to store intermediates that are closer to the bottom of the plan. These are more costly to compute, but, on the other hand, have a higher chance of reuse.

Our caching approach also employs query containment checks, which allow reusing intermediates from queries that are not fully identical, where one of them subsumes the other.

To demonstrate the viability of the proposed position caching, we have implemented it inside PosDB, a distributed disk-based column-store with late materialization. Since PosDB currently lacks a full-fledged cost-based optimizer, we had to devise a number of simpler heuristic algorithms. While our approach is simpler than existing methods, it still allows us to obtain considerable performance gains.

Apart from that, we have also implemented compression for cached positions. Keeping only sequences of integers favours compression, which allows us to reduce the memory footprint even further. For now, we have implemented only delta encoding and null suppression, since our experiments indicated that these methods result in good space savings with a small overhead.
Overall, the contributions of this paper are the following:
   (1) A novel approach to caching intermediates in a column-store that caches positions instead of data.
   (2) A study of the applicability of compression to caching inside a column-store.
   (3) An experimental evaluation of several policies for cache population and eviction.

2 RELATED WORK

Caching intermediates. The idea of reusing intermediate results for query evaluation in databases has existed for a long time. It comes in two formulations. In the first one, there is a number of simultaneously running queries, and the goal is to generate query plans that maximize result sharing. This is called multi-query optimization [15, 16]. The second option is caching: parts of query plans that were evaluated in the past are stored and reused when an appropriate query arrives. In this paper we consider the second option.

The next two papers study caching of intermediates inside the MonetDB [11] system family, an in-memory column-store with late materialization. In reference [12] the authors propose to exploit MonetDB's operator-at-a-time processing model. The idea is to cache and reuse the BAT by-products of query execution. The proposed approach featured plan subsumption and both inter- and intra-query reuse. The study [13] addresses the case of pipelined operator evaluation inside the Vectorwise database engine.

In reference [9] another mechanism for in-memory result caching was proposed. The idea is to cache internal data structures of a database engine, in particular hash tables produced by hash-joins and hash-aggregations, in order to lower memory pressure and preserve cache and register locality.

Disk-based caching was considered in reference [4]. Apart from being able to cache data, this system can also cache positions. The authors also proposed a cost-based optimizer to find the best plan. Experiments showed that position caching performs better than data caching when cache entries are stored on disk.

Caching and compression. Compression has been used in databases for a long time [19]; however, it is in column-stores that it brought significant benefits. Compression in column-stores differs from row-stores in two ways: 1) columnar storage results in data homogeneity, which in turn leads to better compression rates, and 2) compression should be lightweight, i.e. it should not excessively load the CPU.

We reference the two papers which we believe to be the most important; many others can be found. The authors of reference [2] studied several lightweight compression algorithms in a disk-based column-store. They compared their behavior in the case of eager decompression and in the case when the query engine operates on compressed data directly. They also presented a decision tree for selecting a compression method depending on data characteristics. Compression for in-memory column-stores was studied in reference [20]. The authors proposed and evaluated three different compression schemes which aimed to use CPU resources efficiently.

To the best of our knowledge, no paper has considered compression in the context of caching intermediates in column-stores.

3 PROPOSED APPROACH

Preliminaries and prerequisites. A common workflow of a DBMS caching subsystem is as follows. When a query arrives, its plan is generated and inspected for subtrees that a) can be reused later, or b) can reuse the current cache. If any are found, they interact with the cache during query evaluation by either a) putting the resulting data into the cache or b) reusing cached data instead of evaluating the subtree.

In this study we assume that the target DBMS represents query plans as trees of operators, each of which returns either values or positions, exclusively. As we target late materialization, most of the filtering and join operators are positional and (due to OLAP specifics) tend to be the most costly parts of a query plan.

Our caching approach intentionally targets only positional data. As stated in the introduction, "positional" cache entries are lightweight and more susceptible to compression. Therefore, "heavy" operators from the bottom of the tree (usually costly filters and joins) can be cached with more ease, and a significant decrease in system load can be achieved.

An essential drawback of position-only caching is the repeated reading of data values. However, the disk subsystem decreases this overhead: caching is usually applied when the queries are somewhat similar in terms of referenced disk pages, so the pages necessary for materialization are often already present in the buffer manager.

Caching: general overview. Our caching algorithm consists of pre- and post-processing of query plans. The former adds special operators to gather statistical information and substitutes subtrees with existing cache entries. The latter collects the final statistical information and updates the cache if needed.

The cache consists of a query history and a set of cache entries (each being a list of position blocks). The query history is a list of subtrees of the N most recently executed queries, together with the subtrees that are still cached. Each history entry contains a pointer to the corresponding cache entry (if any), a high-level description of the subtree, the size of its result, and a usage statistic. A usage statistic is the set of queries the entry can potentially be useful for. All this information is used in the cost models of the cache population and eviction policies.

There might not be a perfectly matching cache entry for a subtree, but at the same time there may be an entry that provides a larger, subsuming result. This is determined with a containment check. If the check passes, the original subtree is replaced by this cache entry with a special filtering operator on top.

Caching: preprocessing. Preprocessing is the most sophisticated part, see Figure 2. The algorithm iterates through the query plan and for each subtree S: a) checks whether its result fits into the cache, b) adds a special operator PBCounter on top of it, c) updates the cache history, and d) tries to replace S with a (possibly filtered) cache entry.

The first two steps are related to the size estimation of intermediates. An auxiliary PBCounter is used to count the number of blocks passed during query evaluation. This number is then used to: a) update the benefit measure of S (described later), and b) filter out future subtrees that are guaranteed not to fit into the cache.

After size estimation has been taken care of, we add a history entry for S and update the usage statistics of other entries that can (potentially) be used by S. Usage statistics, in their turn, participate in the cost model of the population and eviction policies. Based on them, we try to admit the best yet-uncached entry into the cache.

Finally, after the cache has been adjusted, we search for a cache entry that can replace S with minimal additional filtering. If one is found, we use a special operator FetchCache instead of evaluating S directly. This operator passes position blocks along, decompressing them if needed.
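The following minimal C++ sketch summarizes this preprocessing loop. All types and names here (PlanNode, Cache, wrapWithPBCounter, fetchCache, filterOnTop) are hypothetical stand-ins for illustration, not PosDB's actual interfaces, and the cache methods are stubbed so that the control flow is the focus.

#include <memory>
#include <optional>

// Hypothetical stand-ins for the operator tree and cache interfaces.
struct PlanNode { /* operator type, children, output description, ... */ };
using NodePtr = std::shared_ptr<PlanNode>;

struct Cache {
    bool mayFit(const NodePtr&) { return true; }   // size pre-check
    void recordInHistory(const NodePtr&) {}        // history + usage statistics
    void admitBestUncachedEntry() {}               // population policy decision
    std::optional<NodePtr> bestContainingEntry(const NodePtr&) { return {}; }
};

NodePtr wrapWithPBCounter(NodePtr s) { return s; } // counts passed blocks
NodePtr fetchCache(NodePtr t)        { return t; } // replays cached blocks
NodePtr filterOnTop(NodePtr c)       { return c; } // extra filter if subsuming

// Preprocessing of a single subtree S; the planner applies it to every
// subtree of the query plan.
NodePtr preprocess(NodePtr s, Cache& cache) {
    if (cache.mayFit(s))              // (a) result size allows caching?
        s = wrapWithPBCounter(s);     // (b) count blocks during execution
    cache.recordInHistory(s);         // (c) update history, containment and
    cache.admitBestUncachedEntry();   //     usage statistics; maybe admit
    if (auto t = cache.bestContainingEntry(s))  // (d) reuse a cached entry,
        return filterOnTop(fetchCache(*t));     //     with filtering if needed
    return s;
}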
[Figure 2: PosDB caching workflow: preprocessing. For each subtree S, the loop checks whether the result size allows caching, wraps S with PBCounter, saves it in the cache history, updates containment and usability statistics, admits the best yet-uncached entry UT if the cache policies require it, and replaces S with FetchCache(T) or Filter(FetchCache(T)) for the best fitting cached intermediate T.]
Caching: postprocessing. This stage has two goals: to gather the results of every PBCounter that is part of the current query's operator tree, and to calculate the benefit estimates for each intermediate.

Policies for cache population and eviction. We have considered several different policies for adding and removing intermediates. We assume that for each history entry a numeric benefit can be calculated. We also define the total benefit as the product of the benefit and the usage frequency (the number of queries in the entry's usage statistics).
For cache population we have considered:
   • MIN: choose an uncached intermediate whose total benefit is larger than the minimum among the cached intermediates.
   • AVG: choose an uncached intermediate whose total benefit is larger than the average among the cached intermediates.

Cache eviction has a larger number of options:
   • LRU (Least Recently Used): evict the intermediates that were not used for the longest time.
   • LFU (Least Frequently Used): evict the intermediates with the shortest history of usage.
   • LCS (Largest Cache Space): evict the intermediates that occupy the largest amount of space.
   • BENEFIT: evict the entries that have the lowest benefit.
   • BENEFITHIST: evict the entries that have the lowest total benefit.
   • RANDOM: evict random intermediates.
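Victim selection under these policies (except RANDOM) reduces to taking the minimum under a per-policy ordering. A self-contained C++ sketch with illustrative field names (ours, not PosDB's):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative per-entry statistics that the policies consult.
struct CacheEntry {
    int64_t lastUsed;     // logical time of the last use       (LRU)
    int64_t useCount;     // how many times the entry was used  (LFU)
    int64_t sizeBytes;    // occupied cache space               (LCS)
    double  benefit;      // cost-model benefit                 (BENEFIT)
    double  totalBenefit; // benefit * usage frequency          (BENEFITHIST)
};

enum class Eviction { LRU, LFU, LCS, BENEFIT, BENEFITHIST };

// Index of the victim among a non-empty set of entries: the "minimum"
// under the per-policy ordering (RANDOM would just pick a random index).
std::size_t pickVictim(const std::vector<CacheEntry>& es, Eviction p) {
    auto moreEvictable = [p](const CacheEntry& a, const CacheEntry& b) {
        switch (p) {
            case Eviction::LRU:         return a.lastUsed     < b.lastUsed;
            case Eviction::LFU:         return a.useCount     < b.useCount;
            case Eviction::LCS:         return a.sizeBytes    > b.sizeBytes;
            case Eviction::BENEFIT:     return a.benefit      < b.benefit;
            case Eviction::BENEFITHIST: return a.totalBenefit < b.totalBenefit;
        }
        return false;
    };
    return static_cast<std::size_t>(
        std::min_element(es.begin(), es.end(), moreEvictable) - es.begin());
}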
Cost model. The benefit of a query history entry is defined as a function of:
   • Subtree complexity: a weighted sum over the subtree's operators. Each operator type corresponds to a constant weight, which is a model parameter;
   • Working set complexity: an estimate of the amount of data the query executor requests from the storage system to process the subtree;
   • Actual result size: computed by the corresponding PBCounter operator;
   • Usage statistics: a list of unique query numbers that could (or did) use it.

Note that we rely on two kinds of complexity: subtree and working set complexity. While the former favours "structurally complex" subtrees, the latter favours operations on larger amounts of data. Together they allow us to distinguish simple queries on large data from complex queries on small data.
Overall, the following formula is used to calculate the benefit:

    Benefit(I) = Complexity(I) × EstimatedSize(I) / ActualSize(I).
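As a worked example: an intermediate with Complexity = 12, EstimatedSize = 500 blocks, and ActualSize = 250 blocks has a benefit of 12 × 500 / 250 = 24; if three queries could use it, its total benefit is 72. The C++ sketch below makes this concrete; field names are illustrative, and how the subtree and working set complexities are folded into a single Complexity(I) value is a model parameter choice the paper leaves abstract.

#include <vector>

// Illustrative history-entry fields backing the cost model.
struct HistoryEntry {
    double complexity;           // Complexity(I), from operator weights
    double estimatedSize;        // EstimatedSize(I), in position blocks
    double actualSize;           // ActualSize(I), measured by PBCounter
    std::vector<int> usageStats; // ids of queries that could (or did) use it
};

double benefit(const HistoryEntry& e) {
    return e.complexity * e.estimatedSize / e.actualSize;
}

// Total benefit, used by BENEFITHIST and by the population policies.
double totalBenefit(const HistoryEntry& e) {
    return benefit(e) * static_cast<double>(e.usageStats.size());
}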
Query containment. Implementing a caching subsystem implies solving the query containment problem. It arises when it is necessary to determine which cache entry corresponds to a newly arrived query. Given the sheer expressive power of SQL, it is hard to check even the simple equivalence of two queries. However, in order to build an efficient caching subsystem, it is essential to find cached entries that can be used as base tables to evaluate the given query. For example, consider two queries:
   • SELECT * FROM T1 WHERE T1.A > 100
   • SELECT * FROM T1 WHERE T1.A > 110
If the first query is cached, then the second one can use the stored answer with additional filtering. Depending on the predicate selectivity and hardware parameters, this approach may result in savings.

It is well known that, in the general case, checking query containment is computationally demanding. For example, it was shown [8] that the problem of conjunctive query containment is NP-complete. Therefore, we had to restrict the admissible query class to "mostly" conjunctive SPJ (Select-Project-Join) queries. "Mostly" means that OR can appear in the WHERE clause only to list possible individual values of an attribute: e.g. T1.A = 5 OR T1.A = 6 OR T1.A = 7 is allowed, while T1.A < 5 OR T1.A = 10 is not. The resulting class is limited, but large enough to cover all queries from the Star Schema Benchmark (SSB). Note that, since only positional data is involved, there is no need to check containment of aggregation or sorting.

In order to perform containment checks we have designed a special data structure to represent query metadata. Essentially, it records: 1) the set of attribute fragments covered, 2) all conducted joins, and 3) all predicates that were used for filtering. Obtaining such a structure is just a matter of traversing the corresponding operator tree.

Now, in order to perform a containment check given two such structures, all that is required is to make sure that they share their sets of attribute fragments and joins, and that every predicate from the first structure is implied by some predicate from the second. If that condition holds, then the first structure describes a containing query, whose cached result can be filtered down to answer the second.
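A self-contained C++ sketch of this check, under simplifying assumptions (single-attribute comparison predicates, joins as attribute pairs; the names and representations are ours, not PosDB's):

#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified query metadata, mirroring the structure described above.
struct Predicate { std::string attr; std::string op; double value; };

struct QueryMeta {
    std::set<std::string> fragments;                     // attribute fragments
    std::set<std::pair<std::string, std::string>> joins; // conducted joins
    std::vector<Predicate> predicates;                   // filtering predicates
};

// Does p filter at least as strongly as q (p implies q)? Only a few
// operator pairs are handled in this sketch.
bool implies(const Predicate& p, const Predicate& q) {
    if (p.attr != q.attr) return false;
    if (p.op == ">" && q.op == ">") return p.value >= q.value;
    if (p.op == "<" && q.op == "<") return p.value <= q.value;
    if (p.op == "=" && q.op == "=") return p.value == q.value;
    return false;
}

// True if the cached entry subsumes the new query: same fragments and joins,
// and every cached predicate is implied by some predicate of the new query.
bool contains(const QueryMeta& cached, const QueryMeta& query) {
    if (cached.fragments != query.fragments || cached.joins != query.joins)
        return false;
    return std::all_of(cached.predicates.begin(), cached.predicates.end(),
        [&](const Predicate& cp) {
            return std::any_of(query.predicates.begin(), query.predicates.end(),
                [&](const Predicate& qp) { return implies(qp, cp); });
        });
}

For the pair of queries above, contains succeeds: T1.A > 110 implies T1.A > 100, so the cached result only needs an extra filter on top.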
Compression. Cached position blocks appear to be a good target for compression. First, the columnar representation guarantees data homogeneity, and positions are just fixed-size integers. Secondly, the structure of a query plan explicitly restricts the possible structure of a position block. For example, large tables like SSB LINEORDER are usually processed in a way that preserves their initial order; therefore, the corresponding position stream
is ordered and relatively dense. On the contrary, small tables which participate in joins often produce sparse and unordered sequences of positions.

Turning to concrete compression strategies, we have tried different encoding methods and found delta encoding and null suppression to be the most appropriate. The former narrows down the range of absolute values inside each position block, especially when they are in ascending or descending order. The latter then transforms 32-bit (signed or unsigned) numbers into 16-bit and 8-bit ones. Thus, up to four times space compression can be achieved with the almost negligible overhead of two for loops.

Two other apparently promising compression strategies are RLE and Range. However, experiments demonstrated their limited use in practice: after the first filtering or join operator, consecutive positions usually differ, or repeat just a little; in the latter case delta + null suppression give a comparable or better result.
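A self-contained C++ sketch of such a codec is shown below. For clarity it stores a one-byte width tag per value; a real implementation would amortize the tags (e.g. choose one width per block), which is what makes the up-to-4x ratio attainable.

#include <cstdint>
#include <vector>

// Encode: replace each position with its delta from the previous one, then
// store each delta in the narrowest of 1/2/4 bytes behind a 1-byte width tag.
std::vector<uint8_t> compress(const std::vector<uint32_t>& positions) {
    std::vector<uint8_t> out;
    uint32_t prev = 0;
    for (uint32_t pos : positions) {
        uint32_t delta = pos - prev;   // delta encoding (the first loop)
        prev = pos;
        uint8_t width = delta <= 0xFF ? 1 : delta <= 0xFFFF ? 2 : 4;
        out.push_back(width);          // null suppression: drop zero bytes
        for (uint8_t i = 0; i < width; ++i)
            out.push_back(static_cast<uint8_t>(delta >> (8 * i)));
    }
    return out;
}

std::vector<uint32_t> decompress(const std::vector<uint8_t>& in) {
    std::vector<uint32_t> positions;
    uint32_t prev = 0;
    for (std::size_t i = 0; i < in.size();) {
        uint8_t width = in[i++];
        uint32_t delta = 0;
        for (uint8_t b = 0; b < width; ++b)  // the second loop: re-widen
            delta |= static_cast<uint32_t>(in[i++]) << (8 * b);
        prev += delta;                       // wraps correctly for any stream
        positions.push_back(prev);
    }
    return positions;
}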
We have also assessed and turned down some of the more complex compression methods, such as dictionary compression and LZW. They either require a significant time overhead or fail to provide noticeable space gains, encoding the numbers into almost as many numbers.

4 POSDB
To evaluate our approach we have implemented it inside PosDB, a distributed column-store with a processing model that adheres to late materialization. Each PosDB query plan consists of two parts: position-based and tuple-based (see Figure 3b). In the former (the bottom one), operators exchange only positions, represented as a generalized join index [18] (Figure 3a). In our system this data structure is used to represent the results of filter and join operators. For a filter, it is just a list of positions (row numbers) that satisfy the predicate. The same structure can describe the results of an arbitrary number of joins: consider the join index presented in Figure 3a. Here, the second row of T1 was joined with the first row of T2, and then the resulting tuple was joined with the second row of T3.

[Figure 3: PosDB internals. (a) Example of a join index over tables T1, T2, T3 with entries (1, 2, 3), (2, 1, 2), (3, 3, 3). (b) Example query plan for SELECT sum(lo_revenue), d_year, p_brand1 FROM lineorder, date, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY d_year, p_brand1 ORDER BY d_year, p_brand1: positional joins, filters, and data sources (DS) at the bottom, tuple-based Aggregate and Sort at the top, with readers providing column values via local or remote access.]
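A minimal sketch of such a generalized join index, populated with the example of Figure 3a (the layout is our guess for illustration; PosDB's actual representation may differ):

#include <cstdint>
#include <vector>

// A generalized join index [18]: one column of row numbers per participating
// table; entry i ties together the positions forming the i-th result tuple.
struct JoinIndex {
    std::vector<std::vector<uint32_t>> columns; // columns[t][i]: row of table t
};

int main() {
    // The join index of Figure 3a. Entry 1 (the middle one) records that the
    // second row of T1 joins the first row of T2 and the second row of T3.
    JoinIndex ji;
    ji.columns = {
        {1, 2, 3}, // T1
        {2, 1, 3}, // T2
        {3, 2, 3}, // T3
    };
    // For a filter, the index degenerates to a single column: a list of the
    // row numbers that satisfy the predicate.
}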
The upper part of any plan consists of operators that work with tuples, similarly to classic row-stores. The lowest of these operators is responsible for materialization, i.e. it transforms join indexes into tuples of actual values. Such materializing operators perform, for example, aggregation and window functions.

Often, positional operators (e.g. Join, Filter) require data values in addition to positions. To provide such operators with data, a set of auxiliary entities called readers was introduced. For example, ColumnReader retrieves the values of a specific attribute, while SyncReader provides the values of several attributes synchronously.

PosDB is both a distributed and a parallel column-store. It is distributed in terms of both data and query execution: a table may be fragmented and replicated across several nodes. A number of table-level fragmentation methods is supported: round-robin, hash, and range partitioning. Query distribution allows running a query on different nodes, possibly with individual parts of the query plan residing on distinct nodes. It is implemented by two pairs of special operators: {SendPos, ReceivePos} and {SendTuple, ReceiveTuple}. Therefore, both positional and tuple operators can be executed on arbitrary nodes, regardless of where their children reside.

Both inter- and intra-query parallelism [14] are supported. To implement intra-query parallelism, two special operators were created: Asynchronizer executes an operator tree in a separate thread, and UnionAll collects data from several subtrees that are executed in their own threads.

Recently [5], fully distributed operators such as distributed join and aggregation were added. A detailed description of the PosDB architecture can be found in papers [5, 7].
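The pattern can be illustrated as follows (purely illustrative, not PosDB code): Asynchronizer is modeled by launching each subtree on its own thread, and UnionAll drains the results. A real implementation would stream blocks as they are produced rather than collecting them wholesale.

#include <cstdint>
#include <future>
#include <vector>

using PositionBlock = std::vector<uint32_t>;
using Blocks = std::vector<PositionBlock>;

// Stand-in for evaluating one operator subtree to position blocks.
Blocks evaluateSubtree(int subtreeId) { return {{uint32_t(subtreeId)}}; }

// UnionAll collects the outputs of subtrees that Asynchronizer-style
// wrappers run in their own threads (modeled here with std::async).
Blocks unionAll(int numSubtrees) {
    std::vector<std::future<Blocks>> parts;
    for (int i = 0; i < numSubtrees; ++i)
        parts.push_back(std::async(std::launch::async, evaluateSubtree, i));
    Blocks all;
    for (auto& f : parts) {
        Blocks b = f.get();                        // blocks arrive per subtree
        all.insert(all.end(), b.begin(), b.end()); // merge in arrival order
    }
    return all;
}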
5 EXPERIMENTAL EVALUATION

We have evaluated the proposed approach with two experiments, designed to measure: a) the impact of different eviction policies on query execution performance, and b) the impact of compression on the amount of space used for storing intermediates.

We fixed the size of the cache history to the 50 last queries and used only the MIN population policy. The cache size was varied from 2 to 32 thousand position blocks, each of them being 32 KB large when uncompressed.

Finally, we relied on SSB with scale factor 10 and conducted a series of runs, each consisting of a thousand random "SSB-like" queries. These queries differ from the usual SSB ones only in the way the filtering predicates are generated: in addition to randomly choosing the boundary constants, we also randomly select the comparison operator. For example, DATE.D_YEAR = 1993 can become DATE.D_YEAR < 1995.
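For illustration, such a predicate could be produced as follows (a hypothetical generator; the paper does not publish its actual query generator):

#include <iostream>
#include <random>
#include <string>

// Generate one randomized SSB-like predicate on DATE.D_YEAR: both the
// comparison operator and the boundary constant are drawn at random.
std::string randomYearPredicate(std::mt19937& gen) {
    static const char* ops[] = {"=", "<", ">", "<=", ">="};
    std::uniform_int_distribution<int> op(0, 4);
    std::uniform_int_distribution<int> year(1992, 1998); // SSB DATE year range
    return "DATE.D_YEAR " + std::string(ops[op(gen)]) + " "
         + std::to_string(year(gen));
}

int main() {
    std::mt19937 gen(42);
    std::cout << randomYearPredicate(gen) << "\n"; // e.g. "DATE.D_YEAR < 1995"
}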
[Figure 4: Query execution performance. Execution time (s) vs. cache size (upper axis in blocks, 0 to 35000; lower axis in MB, 156 to 1094) for the LRU, LFU, LCS, RANDOM, BENEFIT, and BENEFITHIST eviction policies.]

[Figure 5: Memory usage before and after compression. Used cache space (in units of 32 KB blocks) vs. cache size (same axes as Figure 4) for the BENEFIT, LRU, and RANDOM policies with compression, against the uncompressed baseline.]
Experimental environment. We used a laptop with an Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz; 8 GB RAM; an HDD Samsung ST1000LM024, 5400 rpm; Funtoo Linux, kernel version 5.2.7; gcc 9.2.0.

Query execution performance. First, we measured the performance of query execution under different eviction policies. The results are given in Figure 4 and show that with smaller cache sizes, the choice of strategy had no significant impact on query performance. With larger cache sizes, however, BENEFIT turned out to be the best and RANDOM the worst.

Another interesting outcome is that BENEFITHIST, which took history into account, resulted in worse performance than BENEFIT, which did not. We suppose that relying on the usage frequency alone delays the eviction of already-cold cache entries (which were hot some time ago). We believe the situation could be improved by also considering the times of the several last uses.

Space consumption due to compression. Next, we enabled compression for intermediates and ran the same experiment using only the BENEFIT, LRU, and RANDOM eviction policies. Figure 5 shows the maximum amount of main memory used to store the intermediates during a particular run. The values were initially measured in bytes and then divided by the standard size of a position block (32 KB).

It is worth noting that enabling compression had no noticeable effect on query execution times. Therefore, for the sake of brevity, the corresponding graph has been omitted.
6 CONCLUSION AND FUTURE WORK

In this paper we have proposed a novel approach to in-memory positional caching inside a column-store with late materialization. Compared to data caching, positional caching allows the system to save space by storing only integers, which can be easily compressed. Experiments performed in PosDB demonstrated the viability of this approach. We have also evaluated a number of policies for cache population and eviction, including our own. Future work will include the evaluation of caching in a distributed environment.

REFERENCES
[1] Daniel Abadi, Peter Boncz, and Stavros Harizopoulos. 2013. The Design and Implementation of Modern Column-Oriented Database Systems. Now Publishers Inc., Hanover, MA, USA.
[2] Daniel Abadi, Samuel Madden, and Miguel Ferreira. 2006. Integrating Compression and Execution in Column-oriented Database Systems. In SIGMOD '06.
[3] D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. 2007. Materialization Strategies in a Column-Oriented DBMS. In ICDE '07. 466–475.
[4] Chungmin Melvin Chen and Nicholas Roussopoulos. 1994. The Implementation and Performance Evaluation of the ADMS Query Optimizer: Integrating Query Result Caching and Matching. In EDBT '94. 323–336.
[5] George Chernishev, Viacheslav Galaktionov, Valentin Grigorev, Evgeniy Klyuchikov, Evgeniy Slobodkin, and Kirill Smirnov. [n.d.]. PosDB 2019: A Distributed Disk-Based Column-Store with Late Materialization (submitted). Proc. VLDB Endow. (Oct. [n.d.]).
[6] G. A. Chernishev, V. A. Galaktionov, V. D. Grigorev, E. S. Klyuchikov, and K. K. Smirnov. 2018. PosDB: An Architecture Overview. Programming and Computer Software 44, 1 (Jan. 2018), 62–74.
[7] G. A. Chernishev, V. A. Galaktionov, V. D. Grigorev, E. S. Klyuchikov, and K. K. Smirnov. 2018. PosDB: An Architecture Overview. Programming and Computer Software 44, 1 (Jan. 2018), 62–74.
[8] Rada Chirkova. 2009. Query Containment. Springer US, 2249–2253.
[9] Kayhan Dursun, Carsten Binnig, Ugur Cetintemel, and Tim Kraska. 2017. Revisiting Reuse in Main Memory Database Systems. In SIGMOD '17.
[10] Stavros Harizopoulos, Daniel Abadi, and Peter Boncz. 2009. Column-Oriented Database Systems. VLDB 2009 Tutorial. nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
[11] Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, and Martin L. Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012).
[12] Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo A. P. Gonçalves. 2010. An Architecture for Recycling Intermediates in a Column-store. ACM Trans. Database Syst. 35, 4, Article 24 (Oct. 2010).
[13] Fabian Nagel, Peter Boncz, and Stratis D. Viglas. 2013. Recycling in Pipelined Query Evaluation. In ICDE '13. IEEE Computer Society, 338–349.
[14] M. Tamer Ozsu. 2007. Principles of Distributed Database Systems (3rd ed.). Prentice Hall Press, Upper Saddle River, NJ, USA.
[15] Prasan Roy and S. Sudarshan. 2009. Multi-Query Optimization. Springer US, Boston, MA, 1849–1852.
[16] Timos K. Sellis. 1988. Multiple-query Optimization. ACM Trans. Database Syst. 13, 1 (March 1988), 23–52.
[17] Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah, Janet L. Wiener, and Goetz Graefe. 2009. Query Processing Techniques for Solid State Drives. In SIGMOD '09. Association for Computing Machinery, 59–72.
[18] Patrick Valduriez. 1987. Join Indices. ACM Trans. Database Syst. 12, 2 (June 1987), 218–246.
[19] Till Westmann, Donald Kossmann, Sven Helmer, and Guido Moerkotte. 2000. The Implementation and Performance of Compressed Databases. SIGMOD Rec. 29, 3 (Sept. 2000), 55–67.
[20] M. Zukowski, S. Heman, N. Nes, and P. Boncz. 2006. Super-Scalar RAM-CPU Cache Compression. In ICDE '06.