<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Big Data</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.14778/2536222.2536233</article-id>
      <title-group>
        <article-title>A study of PosDB Performance in a Distributed Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Chernishev</string-name>
          <email>chernishev@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vyacheslav Galaktionov</string-name>
          <email>viacheslav.galaktionov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Grigorev</string-name>
          <email>valentin.d.grigorev@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeniy Klyuchikov</string-name>
          <email>evgeniy.klyuchikov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kirill Smirnov</string-name>
          <email>kirill.k.smirnov@math.spbu.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Saint-Petersburg State University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint-Petersburg State University, JetBrains Research</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>6</volume>
      <issue>11</issue>
      <fpage>1080</fpage>
      <lpage>1091</lpage>
      <abstract>
        <p>PosDB is a new disk-based distributed column-store relational engine intended for research purposes. It uses the Volcano pull-based model and late materialization for query processing, and join indexes for internal data representation. In its current state, PosDB is capable of both local and distributed processing of all SSB (Star Schema Benchmark) queries. Data, as well as query plans, can be distributed among network nodes in our system. Data distribution is performed by horizontal partitioning. In this paper we experimentally evaluate the performance of our system in a distributed environment. We analyze system performance and report a number of metrics, such as speedup and scaleup. For our evaluation we use a standard benchmark, the SSB.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Column-stores have been actively investigated for the last
ten years. Many open-source [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
commercial [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] products with different features and aims have
been developed. The core design issues such as compression
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [10], materialization strategy [11], [12] and result reuse
[13] have received significant attention. Nevertheless, the distribution
of data and control in disk-based column-store systems has not been
studied at all.
      </p>
      <p>
        The reason for this is that none of the open-source systems is
truly distributed, although some of them [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] support
mediator-based [14] distribution. Several commercial systems, such as
Vertica [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], are distributed but closed-source. To the best of our
knowledge, no investigation of distribution aspects in
column-stores has been conducted.
      </p>
      <p>To address this problem, we are developing a disk-based
distributed relational column-store engine — PosDB. In its
current state it is based on the Volcano pull-based model [15]
and late materialization. Data distribution is supported in the
form of horizontal per-table partitioning. Each fragment can
be additionally replicated on an arbitrary number of nodes,
i.e. our system is partially replicated [16]. Control (query)
distribution is also supported: parts of a query plan can be
sent to a remote node for execution.</p>
      <p>In our earlier studies [17], [18], we described the
opportunities offered by such a system and sketched its design. Later,
an initial version of our system, PosDB, was presented and its
high-level features were described [19].</p>
      <p>In this paper, we present the results of the first distributed
experiments with PosDB. We evaluate system performance
by studying several performance metrics, namely speedup and
scaleup. For evaluation we use a standard OLAP benchmark —
the Star Schema Benchmark [20].</p>
      <p>The paper is structured as follows. A short survey of
distributed technology in databases is presented in section II. The
architecture of the system is described in detail in section III.
In section IV we discuss the metrics used (speedup and scaleup).
The experimental evaluation and its results are presented in
section V.</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK</title>
      <p>There is a shortage of distribution-related studies for
relational column-oriented databases [18]. The main reasons are
the scarcity of research prototypes and the drawbacks of the
existing ones.</p>
      <p>
        Two research prototypes of distributed column-store
systems are known to the authors — [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [21]. Both studies
use an in-memory DBMS, MonetDB, some of whose parts
were rewritten to add distribution-related functionality. This
approach cannot be considered “true” distribution, because, in
general, it restricts the pool of available distributed processing
techniques. Developers have to take into account the
architecture of the underlying centralized DBMS in order to employ
it. Unfortunately, the degree of these restrictions is unclear for
the aforementioned systems.
      </p>
      <p>Another distributed column-store, the ClickHouse system, is
an industrial open-source disk-based system. However, there
are two issues with this system. Firstly, it was open-sourced
only recently, in 2016, and no research papers based on this
system are known to the authors. Secondly, it has several
serious architectural drawbacks: a very restricted partitioning
scheme [22] and issues with distributed joins [23].</p>
      <p>At the same time, there are hundreds, if not thousands, of
papers on the subject in application to row-stores [16], [14].</p>
    </sec>
    <sec id="sec-3">
      <title>III. ARCHITECTURE</title>
      <sec id="sec-3-1">
        <p>(Figure 1: an example of a distributed query plan. Each worker
node runs DataSource operators over its DATE, SUPPLIER and PART
replicas and its LINEORDER partition, feeding a chain of Join operators;
the server node merges the per-node streams with UnionAll and applies
Aggregate and Sort before returning the result to the user.)</p>
        <p>PosDB uses the Volcano pull-based model [15], so each
query plan is represented as a tree with operators as vertexes
and data flows as edges. All operators support the “open()–
getNext()–close()” interface and can be divided into two
groups:</p>
        <p>Operators that produce blocks of positions.</p>
        <p>Operators that produce individual tuples.</p>
        <p>PosDB relies on late materialization, so operators of the
second type are always deployed at the top of a query tree.
They are used to build tuples from position blocks and to
perform aggregation. The whole tree below the materialization
point consists of operators which return position blocks.</p>
        <p>Each position block stores several position vectors of equal
length, one per table. This structure is essentially a join index
[24], [25], which we use to process a chain of join operators.</p>
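        <p>The pull-based interface and the position-block structure described
above can be sketched as follows. This is a minimal illustration, not
PosDB's actual API: all class and member names here are invented for the
example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A position block holds one position vector per table, i.e. it is a
// small join index (illustrative layout, not PosDB's actual class).
struct PositionBlock {
    std::vector<std::vector<std::size_t>> columns;  // one vector per table
};

// Volcano-style pull operator with the "open()-getNext()-close()" interface.
struct PosOperator {
    virtual ~PosOperator() = default;
    virtual void open() {}
    // std::nullopt signals that the position stream is exhausted.
    virtual std::optional<PositionBlock> getNext() = 0;
    virtual void close() {}
};

// DataSource-like leaf: emits contiguous positions of a single table in
// fixed-size blocks, without incurring any disk I/O.
class ContiguousDataSource : public PosOperator {
public:
    ContiguousDataSource(std::size_t rows, std::size_t block_size)
        : rows_(rows), block_size_(block_size) {}

    std::optional<PositionBlock> getNext() override {
        if (next_ >= rows_) return std::nullopt;
        PositionBlock block;
        block.columns.resize(1);
        std::size_t end = std::min(next_ + block_size_, rows_);
        for (std::size_t p = next_; p < end; ++p)
            block.columns[0].push_back(p);
        next_ = end;
        return block;
    }

private:
    std::size_t rows_, block_size_, next_ = 0;
};
```

        <p>A consumer simply calls open(), pulls blocks with getNext() until
it receives std::nullopt, and calls close(); join operators would extend
each incoming block with one more position vector.</p>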
        <p>Currently we have the following operators that produce join
indexes:</p>
        <p>DataSource and FilteredDataSource, operators for
creating initial position streams. The former generates a
list of contiguous positions without incurring any disk
I/O, while the latter conducts a full column scan and
produces a stream of positions whose corresponding
values satisfy a given predicate. These operators are the
only possible leaves of a query tree in our system;</p>
        <p>GeneralPosAnd and SortedPosAnd, binary operators
for the intersection of two position streams related to one
table;</p>
        <p>NestedLoopJoin, MergeJoin and HashJoin, binary
operators which implement the join operation in different
ways;</p>
        <p>UnionAll, an n-ary operator that processes its
subtrees in separate threads and merges their output into a
single stream in an arbitrary order;</p>
        <p>ReceivePos, an ancillary unary operator that sends a
query plan subtree to a remote node, receives join indexes
from it and returns them to its ancestor;</p>
        <p>Asynchronizer, an ancillary unary operator that
processes its child operator in a separate thread and stores
the results in an internal fixed-size buffer;
and the following operators that produce tuples:</p>
        <p>Select, for tuple reconstruction;</p>
        <p>Aggregate, for simple aggregation without grouping;</p>
        <p>SGAggregate and HashSGAggregate, for complex
aggregation with grouping and sorting;</p>
        <p>SparseTupleSorter, for tuple sorting.</p>
      </sec>
    </sec>
    <sec id="sec-18">
      <title>Query Distribution and Data Readers</title>
      <p>As can be seen, query distribution is maintained on the
operator level using two ancillary operators: ReceivePos
and UnionAll. It should be emphasized that a multithreaded
implementation of UnionAll is essential here, because
sequential execution would definitely incur severe waiting penalties,
completely negating the benefits of a distributed environment.
Figure 1 presents an example of a distributed query plan for
query 2.1 from the SSB, which is as follows:
select sum(lo_revenue), d_year, p_brand1
from lineorder, date, part, supplier
where lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and p_category = ’MFGR#12’
and s_region = ’AMERICA’
group by d_year, p_brand1
order by d_year, p_brand1;</p>
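      <p>The multithreaded merge performed by UnionAll can be sketched as
follows. This is a simplified illustration under our own assumptions, not
PosDB's implementation: children are modeled as ready-made vectors of
integer "blocks" instead of real operator subtrees.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Sketch of an n-ary UnionAll: each child is drained in its own thread,
// and blocks are merged into one stream in arbitrary arrival order.
class UnionAll {
public:
    explicit UnionAll(std::vector<std::vector<int>> children) {
        active_ = children.size();
        for (auto& child : children)
            workers_.emplace_back([this, child]() {
                for (int block : child) {            // drain one subtree
                    std::lock_guard<std::mutex> lk(m_);
                    queue_.push(block);
                    cv_.notify_one();
                }
                std::lock_guard<std::mutex> lk(m_);
                --active_;                           // this child is exhausted
                cv_.notify_one();
            });
    }

    // Pops the next block produced by any child; std::nullopt once all
    // children are exhausted and the queue has been drained.
    std::optional<int> getNext() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !queue_.empty() || active_ == 0; });
        if (queue_.empty()) return std::nullopt;
        int block = queue_.front();
        queue_.pop();
        return block;
    }

    ~UnionAll() {
        for (auto& w : workers_) w.join();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> queue_;
    std::size_t active_;
    std::vector<std::thread> workers_;
};
```

      <p>Because the producers never wait for the consumer, a slow subtree
does not stall the others, which is exactly why a sequential
implementation would negate the benefits of distribution.</p>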
      <p>There is also a notion of data readers in our system. A data
reader is a special entity used for reading the attribute values
corresponding to a position stream. Currently, we support
the following hierarchy of readers:</p>
      <p>ContinuousReader and NetworkReader, basic
readers for accessing a local or remote partition,
respectively;</p>
      <p>PartitionedDataReader, an advanced reader for
accessing a whole column whose partitions are stored
on one or several machines. For each partition it creates
a corresponding basic reader to perform a local or remote
full scan. Then, using information from the catalog, a
PartitionedDataReader automatically determines
which reader to use for each position in a join index;</p>
      <p>SyncReader, an advanced reader responsible for
synchronous reading of multiple attributes. This reader
maintains a PartitionedDataReader for each column.</p>
      <p>Initially, a query plan does not contain readers. Each
operator creates readers on demand and feeds them
positions to receive necessary data. Operators that
materialize tuples use SyncReader, others usually employ
PartitionedDataReader. Using these advanced readers
allows operators to be unaware of data distribution.</p>
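      <p>The routing decision made by a PartitionedDataReader can be
sketched as a lookup over catalog metadata. The names and the catalog
layout here are assumptions made for the example, not PosDB's actual
structures.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Assumed catalog entry: a column partition is identified by the first
// position it stores, and owns an underlying basic reader
// (standing in for a ContinuousReader or NetworkReader).
struct Partition {
    std::size_t first_row;   // first position stored in this partition
    int reader_id;           // handle of the basic reader for this partition
};

class PartitionedDataReader {
public:
    // `partitions` must be sorted by first_row, with the first entry at 0.
    explicit PartitionedDataReader(std::vector<Partition> partitions)
        : partitions_(std::move(partitions)) {}

    // Pick the basic reader whose partition contains `position`.
    int readerFor(std::size_t position) const {
        auto it = std::upper_bound(
            partitions_.begin(), partitions_.end(), position,
            [](std::size_t pos, const Partition& p) { return pos < p.first_row; });
        return std::prev(it)->reader_id;
    }

private:
    std::vector<Partition> partitions_;
};
```

      <p>An operator can then feed every position of a join index through
readerFor without knowing whether the data is local or remote, which is
the point of the advanced readers.</p>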
    </sec>
    <sec id="sec-18a">
      <title>IV. GENERAL CONSIDERATIONS AND USED METRICS</title>
      <p>Distributing a DBMS has two important goals [16]:
improving performance and ensuring easy system expansion.
These goals are usually evaluated using two metrics [26]:
scaleup and speedup.</p>
      <p>Speedup reflects the dependency of system performance on
the number of processing nodes under a fixed workload.
Thus, it shows the performance improvement that can be
achieved by using additional equipment and without system
redesign.</p>
      <p>Linear speedup is highly desirable but can rarely be achieved
in practice. Superlinear speedup points to unaccounted-for
distributed system resources or a poor baseline algorithm. So, a good
system should approximate the linear dependency as well
as it can.</p>
      <p>Scaleup is a similar metric that reflects how easy it is to
sustain the achieved performance level under an increased
workload. The number of processing nodes and the size of the
workload are increased by the same factor. An
ideal system achieves linear scaleup, but again, this is rarely
achievable in practice.</p>
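      <p>As a concrete reading of the two metrics, they reduce to simple
ratios of measured execution times (a minimal sketch; the function names
are ours):</p>

```cpp
#include <cassert>

// Speedup: same workload, k nodes instead of 1.
// Linear speedup means speedup(t1, tk) == k.
double speedup(double t1, double tk) { return t1 / tk; }

// Data scaleup: the workload grows k-fold together with the nodes.
// An ideal system keeps the execution time constant, giving scaleup 1;
// with no scaleup at all, the value degrades to 1/k.
double scaleup(double t1_small, double tk_large) { return t1_small / tk_large; }
```

      <p>For example, if one node finishes the fixed workload in 100 s and
four nodes in 25 s, the speedup is 4 (linear); if four nodes need 200 s
on a 4x larger dataset, the data scaleup is 0.5.</p>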
      <p>Workload can be increased either by increasing the number
of queries or the amount of data. The former is the
transactional scaleup and the latter is the data scaleup. We do not
investigate transactional scaleup, because PosDB is oriented
towards OLAP processing — a kind of processing that implies
long-running queries. Taniar et al. [26] argue that transactional
scaleup is relevant in transaction processing systems where the
transactions are small queries. On the other hand, data scaleup
is very important for our system, because the amount of data
in OLAP environments can grow substantially.</p>
    </sec>
    <sec id="sec-19">
      <title>V. EXPERIMENTS</title>
      <p>In order to conduct the experiments, we selected the
following setup of data and query distributions. We designate
one processing node as a server and assign it several worker
nodes. The server only processes user requests, while the data
is stored on worker nodes (see Figure 2). Each worker node
stores a horizontal partition of the fact table (LINEORDER)
along with the replicas of all other (dimension) tables.
Dimension tables are always tiny compared to the fact table, so their
replication incurs almost no storage overhead.</p>
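      <p>The data placement described above can be sketched as follows: the
fact table is cut into near-even horizontal ranges, one per worker node
(an illustrative helper of our own, not PosDB code; dimension tables
would simply be replicated on every worker).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split `rows` fact-table rows into `workers` contiguous ranges
// [begin, end), spreading any remainder over the first few workers.
std::vector<std::pair<std::size_t, std::size_t>>
partitionFactTable(std::size_t rows, std::size_t workers) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t base = rows / workers, extra = rows % workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t len = base + (w < extra ? 1 : 0);  // spread the remainder
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```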
      <p>Figure 1 shows the distributed query plan for query 2.1 from the
workload. It illustrates the general approach which we follow
in this paper for each query. The server is responsible for
receiving data from worker nodes and for aggregation. Note
that all queries in this benchmark can be distributed in such
a manner that no inter-node (worker node) communication is
required.</p>
      <p>(Figure 2: data placement. The master (server) node coordinates k
worker nodes; worker i stores a horizontal partition of LINEORDER
together with replicas of the PART, SUPPLIER, DATE and CUSTOMER
dimension tables.)</p>
      <sec id="sec-19-10">
        <p>A. Description of Experiments, Hardware and Software
Experimental Setups</p>
        <p>We consider three different experiments, all using the SSB
workload:
1) The dependency of PosDB performance on the SSB scale
factor in a local (one-node) case.
2) The speedup of PosDB, i.e. the dependency of the
performance for a fixed workload (scale factor 50) on the
number of nodes. The setup includes the server
and 1, 2, 4, 6 or 8 worker nodes.
3) The scaleup of PosDB, i.e. the performance on k =
1, 2, 4, 6, 8 nodes for a workload with scale factor 10k.</p>
        <p>These experiments are conducted on a cluster of ten
machines connected by a 1 Gbit local network. Each machine has
the following characteristics: Intel(R) Core(TM) i5-2310 CPU
@ 2.90GHz (4 cores total), 4 GB RAM. The software used is
Ubuntu Linux 16.04.1 (64 bit), GCC 5.4.0, JSON for Modern
C++ 2.1.0.</p>
        <p>B. Experiment 1</p>
        <p>In this experiment we study PosDB behavior in a local case
under the full SSB workload (queries Q1.1–Q4.3). We have chosen six
different scale factors: 1, 10, 30, 50, 100 and 200. The results of this
experiment are presented in Figure 3. It should be emphasized that
a logarithmic y-axis is used there.</p>
        <p>After careful analysis, several interesting conclusions can
be drawn:</p>
        <p>Although query 1.3 has a higher selectivity, its execution
time is higher than that of the other queries of its flight.
Perhaps this is due to a more expensive aggregation. The
execution time of the queries from the second flight
decreases as selectivity increases, as is to be
expected.</p>
        <p>Query flight 3 reveals two interesting points. Query 3.1 is
much more expensive than the others, because its first join
operator returns a significantly higher number of records,
thus loading the rest of the query tree. With high scale
factors, query 3.3 becomes much more expensive than the
others. We suppose that this is due to intensive disk usage.
Queries 4.1 and 4.2 behave in a very similar way; however,
the last join in query 4.1 produces more results, which
explains the slightly longer run times for the
whole query. Also, there is an anomaly in query 4.3 which
still has to be explained. We plan to explore it in our
further studies.</p>
        <p>The total time of the whole workload is presented in
Figure 4. In order to obtain this graph we summed up the
run times of all queries described in the SSB. Essentially,
this graph is just another representation of the information
presented in Figure 3.</p>
        <p>C. Experiment 2</p>
        <p>You can see the results of the second experiment in Figure 5.
Starting with 1, the number of nodes is increased by 2 with
each step. The contents of the LINEORDER table (about 11
GBs) are evenly partitioned and distributed across them. Other
tables are fully replicated. The red line shows how much faster
the queries are executed when the number of nodes increases.
The green line represents the “ideal” case, where the speedup
grows linearly.</p>
        <p>As you can see, PosDB’s performance improves when new
nodes are added, although not linearly, because our
system is still in its infancy. We believe that such high overhead
can be attributed to the lack of a proper buffer manager,
which means that the same data may be transferred over the
network many times.</p>
        <p>D. Experiment 3</p>
        <p>In this experiment we measured PosDB data scaleup under
scale factors 10, 20, 40, 60 and 80 on 1, 2, 4, 6 and 8 nodes,
respectively. Data and queries for each test configuration are
distributed similarly to experiment 2: LINEORDER is partitioned
between nodes and the other tables are fully replicated. Then, the
parts of the query plan that lie below the aggregation (or tuple
construction) are sent to different nodes, each with a DataSource
operator for the corresponding LINEORDER partition. See Figures 2
and 1 for more details.</p>
        <p>We consider scaleup as the ratio
(server + 1 machine) execution time / (server + k machines) execution
time, and present the results in Figure 6. To estimate the
PosDB scaleup, we also plotted the “linear scaleup” and
“no scaleup” cases. The former is a situation where scaleup
stays constant (the ideal value) throughout all experiments. In the “no
scaleup” case we assume that the amount of data grows
linearly while the computing power remains constant, so scaleup
is 1/(number of machines).</p>
        <p>We can see that PosDB scaleup stays within the [0.5, 0.75] range,
slowly decreasing as the number of servers grows. Thus,
compared to the “no scaleup” case, we can conclude that our
system offers good scaleup.</p>
      </sec>
    </sec>
    <sec id="sec-20">
      <title>VI. CONCLUSION</title>
      <p>In this paper we presented an evaluation of PosDB, our
distributed column-store query engine. We used the Star Schema
Benchmark — a standard benchmark for the evaluation of
OLAP systems. We studied several performance metrics, such
as speedup and scaleup. In our experiments we were able
to reach scale factor 200 on a single machine, and our system
demonstrated sublinear speedup and good data scaleup. The
evaluation also allowed us to discover some anomalies and
bottlenecks in our system. They are the subject of our future
research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Batkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cherniack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>O'Neil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>O'Neil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rasin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zdonik</surname>
          </string-name>
          , “
          <article-title>C-store: A column-oriented dbms</article-title>
          ,”
          <source>in Proceedings of the 31st International Conference on Very Large Data Bases, ser. VLDB '05. VLDB Endowment</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>553</fpage>
          -
          <lpage>564</lpage>
          . [Online]. Available: http://dl.acm.org/citation.cfm?id=1083592.1083658
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Groffen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manegold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Mullender</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kersten</surname>
          </string-name>
          , “
          <article-title>Monetdb: Two decades of research in column-oriented database architectures</article-title>
          ,”
          <source>IEEE Data Eng. Bull.</source>
          , vol.
          <volume>35</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2012</year>
          . [Online]. Available: http://sites.computer.org/debull/ A12mar/monetdb.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>“</given-names>
            <surname>Google</surname>
          </string-name>
          . supersonic library,” https://code.google.com/archive/p/ supersonic/,
          <year>2017</year>
          , acessed:
          <volume>12</volume>
          /02/
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Arulraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pavlo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Menon</surname>
          </string-name>
          , “
          <article-title>Bridging the archipelago between row-stores and column-stores for hybrid workloads</article-title>
          ,”
          <source>in Proceedings of the 2016 International Conference on Management of Data, ser. SIGMOD '16</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>598</lpage>
          . [Online]. Available: http://db.cs.cmu.edu/papers/2016/arulraj-sigmod2016.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <source>ScaMMDB: Facing Challenge of Mass Data Processing with MMDB</source>
          . Berlin, Heidelberg: Springer Berlin Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03996-6_1
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Varadarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vandiver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Doshi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bear</surname>
          </string-name>
          , “
          <article-title>The vertica analytic database: C-store 7 years later</article-title>
          ,
          <source>” Proc. VLDB Endow.</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1790</fpage>
          -
          <lpage>1801</lpage>
          , Aug.
          <year>2012</year>
          . [Online]. Available: http://dx.doi.org/10.14778/2367502.2367518
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zukowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          , “
          <article-title>From x100 to vectorwise: Opportunities, challenges and things most researchers do not think about</article-title>
          ,” in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser.
          <source>SIGMOD '12</source>
          . New York, NY, USA: ACM,
          <year>2012</year>
          , pp.
          <fpage>861</fpage>
          -
          <lpage>862</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/2213836.2213967
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Harizopoulos</surname>
          </string-name>
          ,
          <article-title>The Design and Implementation of Modern Column-Oriented Database Systems</article-title>
          . Hanover, MA, USA: Now Publishers Inc.,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          , “
          <article-title>Integrating compression and execution in column-oriented database systems</article-title>
          ,” in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, ser.
          <source>SIGMOD '06</source>
          . New York, NY, USA: ACM,
          <year>2006</year>
          , pp.
          <fpage>671</fpage>
          -
          <lpage>682</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/1142473.1142548
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>