<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A study of PosDB Performance in a Distributed Environment</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">George</forename><surname>Chernishev</surname></persName>
							<email>chernishev@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Vyacheslav</forename><surname>Galaktionov</surname></persName>
							<email>viacheslav.galaktionov@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Valentin</forename><surname>Grigorev</surname></persName>
							<email>valentin.d.grigorev@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Evgeniy</forename><surname>Klyuchikov</surname></persName>
							<email>evgeniy.klyuchikov@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Kirill</forename><surname>Smirnov</surname></persName>
							<email>kirill.k.smirnov@math.spbu.ru</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Saint-Petersburg State University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">JetBrains Research</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">Saint-Petersburg State University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Saint-Petersburg State University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution">Saint-Petersburg State University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<orgName type="institution">Saint-Petersburg State University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A study of PosDB Performance in a Distributed Environment</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E1D8F9715DFAB13CDD74FBEDBA5C9434</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>PosDB is a new disk-based distributed column-store relational engine aimed at research purposes. It uses the Volcano pull-based model and late materialization for query processing, and join indexes for internal data representation. In its current state, PosDB is capable of both local and distributed processing of all SSB (Star Schema Benchmark) queries.</p><p>Data, as well as query plans, can be distributed among network nodes in our system. Data distribution is performed by horizontal partitioning.</p><p>In this paper we experimentally evaluate the performance of our system in a distributed environment. We analyze system performance and report a number of metrics, such as speedup and scaleup. For our evaluation we use a standard benchmark, the SSB.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Column-stores have been actively investigated for the last ten years. Many open-source <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b1">[2]</ref>, <ref type="bibr" target="#b2">[3]</ref>, <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b4">[5]</ref> and commercial <ref type="bibr" target="#b5">[6]</ref>, <ref type="bibr" target="#b6">[7]</ref>, <ref type="bibr" target="#b7">[8]</ref> products with different features and aims have been developed. Core design issues such as compression <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b9">[10]</ref>, materialization strategy <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b11">[12]</ref> and result reuse <ref type="bibr" target="#b12">[13]</ref> have received significant attention. Nevertheless, the distribution of data and control in disk-based column-store systems has not been studied at all.</p><p>The reason for this is that none of the open-source systems is truly distributed, although some of them <ref type="bibr" target="#b4">[5]</ref> support mediator-based <ref type="bibr" target="#b13">[14]</ref> distribution. Several commercial systems, such as Vertica <ref type="bibr" target="#b5">[6]</ref>, are distributed but closed-source. To the best of our knowledge, no investigation of distribution aspects in column-stores has been conducted.</p><p>To address this problem, we are developing PosDB, a disk-based distributed relational column-store engine. In its current state it is based on the Volcano pull-based model <ref type="bibr" target="#b14">[15]</ref> and late materialization. Data distribution is supported in the form of horizontal per-table partitioning. Each fragment can additionally be replicated on an arbitrary number of nodes, i.e. our system is partially replicated <ref type="bibr" target="#b15">[16]</ref>. 
Control (query) distribution is also supported: parts of a query plan can be sent to a remote node for execution.</p><p>In our earlier studies <ref type="bibr" target="#b16">[17]</ref>, <ref type="bibr" target="#b17">[18]</ref> we described the opportunities offered by such a system and sketched its design. Later, an initial version of our system, PosDB, was presented and its high-level features were described <ref type="bibr" target="#b18">[19]</ref>.</p><p>In this paper, we present the results of the first distributed experiments with PosDB. We evaluate system performance by studying two performance metrics, namely speedup and scaleup. For evaluation we use a standard OLAP benchmark, the Star Schema Benchmark <ref type="bibr" target="#b19">[20]</ref>.</p><p>The paper is structured as follows. A short survey of related work on distributed databases is presented in Section II. The architecture of the system is described in detail in Section III. In Section IV we discuss the metrics used (speedup and scaleup). The experimental evaluation and its results are presented in Section V.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>There is a shortage of distribution-related studies for relational column-oriented databases <ref type="bibr" target="#b17">[18]</ref>. The main reasons are the scarcity of research prototypes and the drawbacks of the existing ones.</p><p>Two research prototypes of distributed column-store systems are known to the authors: <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b20">[21]</ref>. Both studies use an in-memory DBMS, MonetDB, some of whose parts were rewritten to add distribution-related functionality. This approach cannot be considered "true" distribution because, in general, it restricts the pool of available distributed processing techniques: developers have to take the architecture of the underlying centralized DBMS into account when building on it. Unfortunately, the degree of these restrictions is unclear for the aforementioned systems.</p><p>Another distributed column-store, ClickHouse, is an industrial open-source disk-based system. However, there are two issues with this system. Firstly, it was open-sourced only recently, in 2016, and no research papers based on it are known to the authors. Secondly, it has several serious architectural drawbacks: very restricted partitioning <ref type="bibr" target="#b21">[22]</ref> and issues with distributed joins <ref type="bibr" target="#b22">[23]</ref>.</p><p>At the same time, there are hundreds, if not thousands, of papers on the subject as applied to row-stores <ref type="bibr" target="#b15">[16]</ref>, <ref type="bibr" target="#b13">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. POSDB: ARCHITECTURE</head><p>PosDB uses the Volcano pull-based model <ref type="bibr" target="#b14">[15]</ref>, so each query plan is represented as a tree with operators as vertices and data flows as edges. All operators support the "open()-getNext()-close()" interface and can be divided into two groups:</p><p>• Operators that produce blocks of positions.</p><p>• Operators that produce individual tuples.</p><p>PosDB relies on late materialization, so operators of the second type are always placed at the top of a query tree. They are used to build tuples from position blocks and to perform aggregation. The whole tree below the materialization point consists of operators that return blocks.</p><p>Each position block stores several position vectors of equal length, one per table. This structure is essentially a join index <ref type="bibr" target="#b23">[24]</ref>, <ref type="bibr" target="#b24">[25]</ref>, which we use to process a chain of join operators.</p><p>Currently we have the following operators that produce join indexes:</p><p>• DataSource and FilteredDataSource, operators for creating initial position streams. The former generates a list of contiguous positions without incurring any disk I/O, while the latter performs a full column scan and produces a stream of positions whose corresponding values satisfy a given predicate. 
These operators are the only possible leaves of a query tree in our system;</p><p>• GeneralPosAnd and SortedPosAnd, binary operators for the intersection of two position streams related to one table;</p><p>• NestedLoopJoin, MergeJoin and HashJoin, binary operators that implement the join operation in different ways;</p><p>• UnionAll, an n-ary operator that processes its subtrees in separate threads and merges their output into a single stream in an arbitrary order;</p><p>• ReceivePos, an ancillary unary operator that sends a query plan subtree to a remote node, receives join indexes from it and returns them to its ancestor;</p><p>• Asynchronizer, an ancillary unary operator that processes its child operator in a separate thread and stores the results in an internal fixed-size buffer.</p><p>The following operators produce tuples:</p><p>• Select, for tuple reconstruction;</p><p>• Aggregate, for simple aggregation without grouping;</p><p>• SGAggregate and HashSGAggregate, for complex aggregation with grouping and sorting;</p><p>• SparseTupleSorter, for tuple sorting.</p><p>As can be seen, query distribution is handled at the operator level by two ancillary operators: ReceivePos and UnionAll. A multithreaded implementation of UnionAll is essential here, because sequential execution would incur severe waiting penalties, completely negating the benefits of a distributed environment. Figure <ref type="figure" target="#fig_1">1</ref> presents an example of a distributed query plan for query 2.1 of the SSB, which is as follows:</p><p>select sum(lo_revenue), d_year, p_brand1 from lineorder, date, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand1 order by d_year, p_brand1;</p><p>PosDB also has the notion of data readers. A data reader is a special entity used for reading the attribute values that correspond to a position stream. 
Currently, we support the following hierarchy of readers:</p><p>• ContinuousReader and NetworkReader, basic readers for accessing a local or a remote partition, respectively;</p><p>• PartitionedDataReader, an advanced reader for accessing a whole column whose partitions are stored on one or several machines. For each partition it creates a corresponding basic reader to perform a local or remote full scan. Then, using information from the catalog, the PartitionedDataReader automatically determines which reader to use for each position in a join index;</p><p>• SyncReader, an advanced reader responsible for synchronous reading of multiple attributes. This reader maintains a PartitionedDataReader for each column.</p><p>Initially, a query plan does not contain readers. Each operator creates readers on demand and feeds them positions to receive the necessary data. Operators that materialize tuples use SyncReader, while others usually employ PartitionedDataReader. Using these advanced readers allows operators to be unaware of data distribution.</p></div>
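<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pull-based interface and the position-stream operators described above can be sketched in C++ as follows. This is a minimal illustration under our own assumptions, not PosDB code: columns are in-memory vectors of integers, a position block holds the positions of a single table (a real join index keeps one position vector per table), and the members and signatures of the hypothetical PosOperator, FilteredDataSource and SortedPosAnd classes are ours.</p><p><![CDATA[
```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical position block for a single table (a one-table join index).
using PosBlock = std::vector<std::size_t>;

// Volcano-style pull interface: open(), getNext(), close().
class PosOperator {
public:
    virtual ~PosOperator() = default;
    virtual void open() = 0;
    // Returns the next block of positions, or std::nullopt when exhausted.
    virtual std::optional<PosBlock> getNext() = 0;
    virtual void close() = 0;
};

// Full scan of an in-memory column; emits positions whose values satisfy pred.
class FilteredDataSource : public PosOperator {
public:
    FilteredDataSource(const std::vector<int>& column,
                       std::function<bool(int)> pred, std::size_t blockSize)
        : column_(column), pred_(std::move(pred)), blockSize_(blockSize) {}

    void open() override { cursor_ = 0; }

    std::optional<PosBlock> getNext() override {
        PosBlock block;
        // Fill the block until it is full or the column is exhausted.
        while (cursor_ < column_.size() && block.size() < blockSize_) {
            if (pred_(column_[cursor_])) block.push_back(cursor_);
            ++cursor_;
        }
        if (block.empty()) return std::nullopt;  // only happens once the scan is done
        return block;
    }

    void close() override {}

private:
    const std::vector<int>& column_;
    std::function<bool(int)> pred_;
    std::size_t blockSize_;
    std::size_t cursor_ = 0;
};

// Intersects two sorted position streams of the same table. For brevity it
// drains both children and returns a single block; a real operator would
// merge the streams block by block.
class SortedPosAnd : public PosOperator {
public:
    SortedPosAnd(std::unique_ptr<PosOperator> left, std::unique_ptr<PosOperator> right)
        : left_(std::move(left)), right_(std::move(right)) {}

    void open() override {
        left_->open();
        right_->open();
        done_ = false;
    }

    std::optional<PosBlock> getNext() override {
        if (done_) return std::nullopt;
        done_ = true;
        PosBlock a = drain(*left_);
        PosBlock b = drain(*right_);
        PosBlock out;
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              std::back_inserter(out));
        if (out.empty()) return std::nullopt;
        return out;
    }

    void close() override {
        left_->close();
        right_->close();
    }

private:
    // Pulls every block from a child and concatenates the positions.
    static PosBlock drain(PosOperator& op) {
        PosBlock all;
        while (auto blk = op.getNext()) all.insert(all.end(), blk->begin(), blk->end());
        return all;
    }

    std::unique_ptr<PosOperator> left_, right_;
    bool done_ = false;
};
```
]]></p><p>A caller opens the root operator, calls getNext() until it returns an empty optional, and then calls close(). For example, combining two FilteredDataSource instances over the same column with SortedPosAnd yields the positions that satisfy both predicates. Draining both children before intersecting keeps the sketch short; a streaming merge would avoid buffering whole streams in memory.</p></div>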
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. GENERAL CONSIDERATIONS AND USED METRICS</head><p>Distributing a DBMS has two important goals <ref type="bibr" target="#b15">[16]</ref>: improving performance and ensuring easy system expansion. These goals are usually evaluated using two metrics <ref type="bibr" target="#b25">[26]</ref>: scaleup and speedup.</p><p>Speedup reflects the dependency of system performance on the number of processing nodes under a fixed workload. Thus, it shows the performance improvement that can be achieved by adding equipment, without redesigning the system.</p><p>Linear speedup is highly desirable but can rarely be achieved in practice. Superlinear speedup points to unaccounted-for resources in the distributed system or to a poor algorithm. Thus, a good system should approximate linear dependency as closely as it can.</p><p>Scaleup is a similar metric that reflects how easily the achieved performance level can be sustained under an increased workload: the number of processing nodes and the size of the workload are increased by the same factor. An ideal system achieves linear scaleup, but, again, this is rarely achievable in practice.</p><p>Workload can be increased either by increasing the number of queries or the amount of data. The former is called transactional scaleup and the latter data scaleup. We do not investigate transactional scaleup, because PosDB is oriented towards OLAP processing, a kind of processing that implies long-running queries. Taniar et al. <ref type="bibr" target="#b25">[26]</ref> argue that transactional scaleup is relevant in transaction processing systems, where transactions are small queries. Data scaleup, on the other hand, is very important for our system, because the amount of data in OLAP environments can grow considerably.</p></div>
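<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, both metrics can be written out as follows. The notation is ours rather than the paper's: T(n, W) denotes the execution time of workload W on n processing nodes, and k is the factor by which the number of nodes (and, for scaleup, the workload) is increased.</p><p><![CDATA[
```latex
\[
  \mathrm{speedup}(k) = \frac{T(1, W)}{T(k, W)}, \qquad \text{linear speedup: } \mathrm{speedup}(k) = k
\]
\[
  \mathrm{scaleup}(k) = \frac{T(1, W)}{T(k, kW)}, \qquad \text{linear scaleup: } \mathrm{scaleup}(k) = 1
\]
```
]]></p><p>In this notation, superlinear speedup means that speedup(k) exceeds k. If the workload grows k-fold while the computing power stays fixed and run time grows proportionally, scaleup drops to 1/k.</p></div>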
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTS</head><p>To conduct the experiments, we selected the following data and query distribution setup. We designate one processing node as the server and assign several worker nodes to it. The server only processes user requests, while the data is stored on the worker nodes (see Figure <ref type="figure" target="#fig_2">2</ref>). Each worker node stores a horizontal partition of the fact table (LINEORDER) along with replicas of all other (dimension) tables. Dimension tables are always tiny compared to the fact table, so replicating them incurs almost no storage overhead.</p><p>Figure <ref type="figure" target="#fig_1">1</ref> shows the distributed plan for query 2.1 of the workload. It illustrates the general approach which we follow ... in this paper for each query. The server is responsible for receiving data from the worker nodes and for aggregation. Note that all queries in this benchmark can be distributed in such a manner that no communication between worker nodes is required.</p></div>
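<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distribution scheme above, together with the catalog lookup that PartitionedDataReader performs, can be illustrated with the following C++ sketch. It assumes, hypothetically, that the fact table is split into contiguous, nearly equal ranges of row positions, one per worker node. The paper states only that LINEORDER is horizontally partitioned, so the range scheme and the Catalog, evenRanges, partitionOf and localOffset names are our own.</p><p><![CDATA[
```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical catalog for a range-partitioned fact table: worker i stores the
// global row positions [starts[i], starts[i + 1]); the last range ends at totalRows.
struct Catalog {
    std::vector<std::size_t> starts;
    std::size_t totalRows = 0;

    // Splits `rows` positions into `workers` contiguous, nearly equal ranges.
    static Catalog evenRanges(std::size_t rows, std::size_t workers) {
        if (workers == 0) throw std::invalid_argument("at least one worker is required");
        Catalog c;
        c.totalRows = rows;
        for (std::size_t i = 0; i < workers; ++i) c.starts.push_back(i * rows / workers);
        return c;
    }

    // Index of the worker holding `pos`: the last range whose start is <= pos.
    std::size_t partitionOf(std::size_t pos) const {
        if (pos >= totalRows) throw std::out_of_range("position beyond the table");
        auto it = std::upper_bound(starts.begin(), starts.end(), pos);
        return static_cast<std::size_t>(it - starts.begin()) - 1;
    }

    // Offset of `pos` inside its worker's partition.
    std::size_t localOffset(std::size_t pos) const {
        return pos - starts[partitionOf(pos)];
    }
};
```
]]></p><p>With 10 rows and 3 workers, the partitions start at positions 0, 3 and 6, so position 7 resides on worker 2 at local offset 1. A binary search over partition start positions keeps each lookup logarithmic in the number of partitions.</p></div>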
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Description of Experiments, Hardware and Software Experimental Setups</head><p>We consider three different experiments, all using the SSB workload:</p><p>1) The dependency of PosDB performance on the SSB scale factor in the local (single-node) case.</p><p>2) The speedup of PosDB, i.e. the dependency of performance for a fixed workload (scale factor 50) on the number of nodes. Each configuration consists of the server and 1, 2, 4, 6 or 8 worker nodes.</p><p>3) The scaleup of PosDB, i.e. the performance on k = 1, 2, 4, 6, 8 worker nodes for a workload of scale factor 10 × k (that is, 10, 20, 40, 60, 80).</p><p>These experiments are conducted on a cluster of ten machines connected by a 1 Gbit/s local network. Each machine has the following characteristics: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz (4 cores total), 4 GB RAM. The software used is Ubuntu Linux 16.04.1 (64 bit), GCC 5.4.0 and JSON for Modern C++ 2.1.0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Experiment 1</head><p>In this experiment we study PosDB behavior in the local case under the full SSB workload. We have chosen six different  After careful analysis, several interesting conclusions can be drawn:</p><p>• Although query 1.3 has a higher selectivity, its execution time is higher than that of the other queries of its flight. This is possibly due to a more expensive aggregation.</p><p>• The execution time of the queries from the second flight decreases as selectivity increases, as expected.</p><p>• Query flight 3 reveals two interesting points. Query 3.1 is much more expensive than the others, because its first join operator returns a significantly higher number of records, thus increasing the load on the rest of the query tree. With high scale factors, query 3.3 becomes much more expensive than the others. We suppose that this is due to intensive disk usage.</p><p>• Queries 4.1 and 4.2 behave very similarly; however, the last join in query 4.1 produces more results, which explains the slightly longer run time of the whole query. There is also an anomaly in query 4.3 that has yet to be explained. We plan to explore it in further studies.</p><p>The total time of the whole workload is presented in Figure <ref type="figure" target="#fig_4">4</ref>. To obtain this graph, we summed up the run times of all queries described in the SSB. Essentially, this graph is just another representation of the information presented in Figure <ref type="figure" target="#fig_3">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Experiment 2</head><p>The results of the second experiment are shown in Figure <ref type="figure" target="#fig_5">5</ref>. The number of worker nodes is set to 1, 2, 4, 6 and 8. The contents of the LINEORDER table (about 11 GB) are evenly partitioned and distributed across them, while the other tables are fully replicated. The red line shows how much faster the queries are executed as the number of nodes increases. The green line represents the "ideal" case, in which speedup grows linearly. PosDB's performance increases when new nodes are added, although not linearly, because our system is still in its infancy. We believe that this overhead can be attributed to the lack of a proper buffer manager: without one, the same data may be transferred over the network many times.</p></div>
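<div xmlns="http://www.tei-c.org/ns/1.0"><p>A speedup curve and its linear reference can be computed from measured run times as in the C++ sketch below. The Run and SpeedupPoint types and the speedupCurve function are hypothetical. Speedup is taken relative to the first (smallest) configuration, and an efficiency value (speedup divided by the ideal) is added to make the gap to the linear case explicit.</p><p><![CDATA[
```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical measurement: worker count and run time of the fixed workload.
struct Run {
    std::size_t workers;
    double seconds;
};

struct SpeedupPoint {
    std::size_t workers;
    double speedup;     // T(baseline) / T(this configuration)
    double ideal;       // linear speedup: workers / baseline workers
    double efficiency;  // speedup / ideal; 1.0 means linear
};

// Speedup of each run relative to the first one, which is assumed to be the
// smallest configuration.
std::vector<SpeedupPoint> speedupCurve(const std::vector<Run>& runs) {
    if (runs.empty() || runs.front().workers == 0)
        throw std::invalid_argument("need a non-empty baseline configuration");
    const Run& base = runs.front();
    std::vector<SpeedupPoint> out;
    for (const Run& r : runs) {
        double s = base.seconds / r.seconds;
        double ideal = static_cast<double>(r.workers) / static_cast<double>(base.workers);
        out.push_back({r.workers, s, ideal, s / ideal});
    }
    return out;
}
```
]]></p><p>For illustration only, hypothetical run times of 100 s, 60 s and 40 s on 1, 2 and 4 worker nodes would give speedups of about 1.67 and 2.5, i.e. efficiencies of about 0.83 and 0.625.</p></div>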
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Experiment 3</head><p>In this experiment we measured PosDB data scaleup under scale factors 10, 20, 40, 60, 80 on 1, 2, 4, 6, 8 worker nodes, respectively. Data and queries for each test configuration are distributed as in Experiment 2: LINEORDER is partitioned between nodes, and the other tables are fully replicated. The parts of the query plan that lie below aggregation (or tuple construction) are sent to different nodes, each with a DataSource operator for the corresponding LINEORDER partition. See Figures <ref type="figure" target="#fig_2">2 and 1</ref> for more details. We define scaleup as the ratio $\mathrm{scaleup}(k) = T_{\mathrm{server}+1} / T_{\mathrm{server}+k}$, where $T_{\mathrm{server}+k}$ is the execution time on the server with k worker machines, and present the results in Figure <ref type="figure" target="#fig_6">6</ref>. To estimate PosDB scaleup, we also plotted the "linear scaleup" and "no scaleup" cases. The former is the case in which scaleup stays constant at its ideal value of 1 across all configurations. In the "no scaleup" case we assume that the amount of data grows linearly while the computing power remains constant, so execution time grows k-fold and scaleup equals 1/k.</p><p>PosDB scaleup stays within the [0.5, 0.75] range, slowly decreasing as the number of servers grows. Thus, compared with the "no scaleup" case, we can conclude that our system offers good scaleup.</p></div>
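<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scaleup ratio used in this experiment, along with both reference curves, can be sketched in C++ as follows. The type and function names are hypothetical. The sketch assumes that run k processes a workload k times larger (scale factor 10 × k) on k worker machines and that the first run uses a single worker.</p><p><![CDATA[
```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical measurement: k worker machines processing a k-times larger
// workload (scale factor 10 * k in this experiment).
struct ScaledRun {
    std::size_t k;
    double seconds;
};

struct ScaleupPoint {
    std::size_t k;
    double scaleup;    // T(server + 1) / T(server + k)
    double linear;     // ideal case: constant 1
    double noScaleup;  // fixed computing power, k-fold data: 1 / k
};

std::vector<ScaleupPoint> scaleupCurve(const std::vector<ScaledRun>& runs) {
    if (runs.empty() || runs.front().k != 1)
        throw std::invalid_argument("the first run must use one worker machine");
    const double t1 = runs.front().seconds;
    std::vector<ScaleupPoint> out;
    for (const ScaledRun& r : runs)
        out.push_back({r.k, t1 / r.seconds, 1.0, 1.0 / static_cast<double>(r.k)});
    return out;
}
```
]]></p><p>With hypothetical run times of 50 s for k = 1 and 100 s for k = 4, scaleup would be 0.5 at k = 4, compared with 1 for linear scaleup and 0.25 for no scaleup.</p></div>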
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>In this paper we presented an evaluation of PosDB, our distributed column-store query engine. We used the Star Schema Benchmark, a standard benchmark for evaluating OLAP systems. We studied several performance metrics, namely speedup and scaleup. In our experiments we reached scale factor 200 on a single machine, and our system demonstrated sublinear speedup and good data scaleup. The evaluation also allowed us to discover some anomalies and bottlenecks in our system, which are the subject of our future research.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Example of a distributed query plan: distributed query 2.1 from SSB</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Data distribution scheme in PosDB</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Dependency of query performance on scale factor in the local case</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Dependency of total SSB execution time on scale factor</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Dependency of PosDB speedup on the number of servers</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. Dependency of PosDB data scaleup on the number of servers</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">C-Store: A column-oriented DBMS</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stonebraker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Batkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cherniack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ferreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>O'Neil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>O'Neil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rasin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zdonik</surname></persName>
		</author>
		<ptr target="http://dl.acm.org/citation.cfm?id=1083592.1083658" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Very Large Data Bases, ser. VLDB &apos;05</title>
				<meeting>the 31st International Conference on Very Large Data Bases, ser. VLDB &apos;05</meeting>
		<imprint>
			<publisher>VLDB Endowment</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="553" to="564" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">MonetDB: Two decades of research in column-oriented database architectures</title>
		<author>
			<persName><forename type="first">S</forename><surname>Idreos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Groffen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Manegold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Mullender</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kersten</surname></persName>
		</author>
		<ptr target="http://sites.computer.org/debull/A12mar/monetdb.pdf" />
	</analytic>
	<monogr>
		<title level="j">IEEE Data Eng. Bull</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="40" to="45" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="https://code.google.com/archive/p/supersonic/" />
		<title level="m">Google. supersonic library</title>
				<imprint>
			<date type="published" when="2017-02">2017. 12/02/2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Bridging the archipelago between row-stores and column-stores for hybrid workloads</title>
		<author>
			<persName><forename type="first">J</forename><surname>Arulraj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pavlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Menon</surname></persName>
		</author>
		<ptr target="http://db.cs.cmu.edu/papers/2016/arulraj-sigmod2016.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 International Conference on Management of Data, ser. SIGMOD &apos;16</title>
				<meeting>the 2016 International Conference on Management of Data, ser. SIGMOD &apos;16</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="583" to="598" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">ScaMMDB: Facing Challenge of Mass Data Processing with MMDB</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-03996-6_1</idno>
		<ptr target="http://dx.doi.org/10.1007/978-3-642-03996-6_1" />
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>Springer</publisher>
			<biblScope unit="page" from="1" to="12" />
			<pubPlace>Berlin, Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Vertica analytic database: C-Store 7 years later</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lamb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Varadarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vandiver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Doshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bear</surname></persName>
		</author>
		<idno type="DOI">10.14778/2367502.2367518</idno>
		<ptr target="http://dx.doi.org/10.14778/2367502.2367518" />
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2012-08">Aug. 2012</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="1790" to="1801" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">From X100 to Vectorwise: Opportunities, challenges and things most researchers do not think about</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zukowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boncz</surname></persName>
		</author>
		<idno type="DOI">10.1145/2213836.2213967</idno>
		<ptr target="http://doi.acm.org/10.1145/2213836.2213967" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;12</title>
				<meeting>the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="861" to="862" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boncz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Harizopoulos</surname></persName>
		</author>
		<title level="m">The Design and Implementation of Modern Column-Oriented Database Systems</title>
				<meeting><address><addrLine>Hanover, MA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Now Publishers Inc</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Integrating compression and execution in column-oriented database systems</title>
		<author>
			<persName><forename type="first">D</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ferreira</surname></persName>
		</author>
		<idno type="DOI">10.1145/1142473.1142548</idno>
		<ptr target="http://doi.acm.org/10.1145/1142473.1142548" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;06</title>
				<meeting>the 2006 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;06<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="671" to="682" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">DB2 with BLU Acceleration: So much more than just a column store</title>
		<author>
			<persName><forename type="first">V</forename><surname>Raman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Attaluri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Chainani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kalmuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kulandaisamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leenstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lightstone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Lohman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Malkemus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Pandis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schiefer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sharpe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sidle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Storm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.14778/2536222.2536233</idno>
		<ptr target="http://dx.doi.org/10.14778/2536222.2536233" />
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1080" to="1091" />
			<date type="published" when="2013-08">Aug. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Materialization strategies in a column-oriented DBMS</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Myers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>DeWitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDE.2007.367892</idno>
		<ptr target="http://dx.doi.org/10.1109/ICDE.2007.367892" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007</title>
				<meeting>the 23rd International Conference on Data Engineering, ICDE 2007<address><addrLine>The Marmara Hotel, Istanbul, Turkey</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">April 15-20, 2007</date>
			<biblScope unit="page" from="466" to="475" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Materialization strategies in the Vertica analytic database: Lessons learned</title>
		<author>
			<persName><forename type="first">L</forename><surname>Shrinivas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bodagala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Varadarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bharathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bear</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 29th International Conference on Data Engineering (ICDE)</title>
				<imprint>
			<date type="published" when="2013-04">April 2013</date>
			<biblScope unit="page" from="1196" to="1207" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An architecture for recycling intermediates in a column-store</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Ivanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kersten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Nes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Gonçalves</surname></persName>
		</author>
		<idno type="DOI">10.1145/1559845.1559879</idno>
		<ptr target="http://doi.acm.org/10.1145/1559845.1559879" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;09</title>
				<meeting>the 2009 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="309" to="320" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The state of the art in distributed query processing</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kossmann</surname></persName>
		</author>
		<idno type="DOI">10.1145/371578.371598</idno>
		<ptr target="http://doi.acm.org/10.1145/371578.371598" />
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="422" to="469" />
			<date type="published" when="2000-12">Dec. 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Query evaluation techniques for large databases</title>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
		<idno type="DOI">10.1145/152610.152611</idno>
		<ptr target="http://doi.acm.org/10.1145/152610.152611" />
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="73" to="169" />
			<date type="published" when="1993-06">Jun. 1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Principles of Distributed Database Systems</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Özsu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>Prentice Hall Press</publisher>
			<pubPlace>Upper Saddle River, NJ, USA</pubPlace>
		</imprint>
	</monogr>
	<note>3rd ed</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Towards Self-management in a Distributed Column-Store System</title>
		<author>
			<persName><forename type="first">G</forename><surname>Chernishev</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-23201-0_12</idno>
		<ptr target="http://dx.doi.org/10.1007/978-3-319-23201-0_12" />
	</analytic>
	<monogr>
		<title level="m">New Trends in Databases and Information Systems: ADBIS 2015 Short Papers and Workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD, Poitiers, France, September 8-11, 2015</title>
				<meeting><address><addrLine>Poitiers, France</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<pubPlace>Cham</pubPlace>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="97" to="107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The design of an adaptive column-store system</title>
		<idno type="DOI">10.1186/s40537-017-0069-4</idno>
		<ptr target="http://dx.doi.org/10.1186/s40537-017-0069-4" />
	</analytic>
	<monogr>
		<title level="j">Journal of Big Data</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">21</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Chernishev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Grigorev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Galaktionov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Klyuchikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Smirnov</surname></persName>
		</author>
		<title level="m">PosDB: A Distributed Column-Store Engine (paper submitted)</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>O&apos;Neil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>O&apos;Neil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="http://www.cs.umb.edu/~poneil/StarSchemaB.PDF" />
		<title level="m">The Star Schema Benchmark (SSB)</title>
				<imprint>
			<date type="published" when="2009-07-20">2009. 20/07/2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">DCODE: A Distributed Column-Oriented Database Engine for Big Data Analytics</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mortazavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Adnaik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Morgan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-24315-3_30</idno>
		<ptr target="http://dx.doi.org/10.1007/978-3-319-24315-3_30" />
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="289" to="299" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Migration to Yandex ClickHouse. A transcript of a talk at HighLoad++ 2016</title>
		<ptr target="https://habrahabr.ru/post/322620/" />
		<imprint>
			<date type="published" when="2017-04">2017. 30/04/2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">A comparison of in-memory databases</title>
		<ptr target="http://www.exasol.com/site/assets/files/3147/acomparisonofin-memorydatabases.pdf" />
		<imprint>
			<date type="published" when="2017-04">2017. 30/04/2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Fast joins using join indices</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Ross</surname></persName>
		</author>
		<idno type="DOI">10.1007/s007780050071</idno>
		<ptr target="http://dx.doi.org/10.1007/s007780050071" />
	</analytic>
	<monogr>
		<title level="j">The VLDB Journal</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="1999-04">Apr. 1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Query processing techniques for solid state drives</title>
		<author>
			<persName><forename type="first">D</forename><surname>Tsirogiannis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Harizopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Wiener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
		<idno type="DOI">10.1145/1559845.1559854</idno>
		<ptr target="http://doi.acm.org/10.1145/1559845.1559854" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;09</title>
				<meeting>the 2009 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="59" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">High-Performance Parallel Database Processing and Grid Databases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Taniar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H C</forename><surname>Leung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Rahayu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Wiley Series on Parallel and Distributed Computing</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Zomaya</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
