<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Non-Uniform Tuning Method for SQL-on-Hadoop Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Edson</forename><surname>Ramiro</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universidade Federal do Paraná</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lucas</forename><surname>Filho</surname></persName>
							<email>erlfilho@inf.ufpr.br</email>
							<affiliation key="aff0">
								<orgName type="institution">Universidade Federal do Paraná</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Renato</forename><surname>Silva De Melo</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universidade Federal do Paraná</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eduardo</forename><surname>Cunha De Almeida</surname></persName>
							<email>eduardo@inf.ufpr.br</email>
							<affiliation key="aff0">
								<orgName type="institution">Universidade Federal do Paraná</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Non-Uniform Tuning Method for SQL-on-Hadoop Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A65C1865A73D744BD80E40079ED9253F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Tuning</term>
					<term>Self-Tuning</term>
					<term>SQL-on-Hadoop</term>
					<term>MapReduce</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A SQL-on-Hadoop query consists of a workflow of MapReduce jobs with a single point of configuration. This means that the developer tunes hundreds of tuning parameters directly in the query source code (or via terminal interface), but the system assumes the same configuration to every running job. The lurking problem is that the system allocates computing resources uniformly to every running job, even if they present different consumption needs (after all the tuning setup is the same). In this paper, we demonstrate that such uniform allocation of resources drives the query to inefficient performance. We claim that applying a non-uniform allocation method to define a customized tuning setup for each job can outperform the uniform allocation. Ultimately, we demonstrate that searching for specific tuning setup is an optimization problem with polynomial cost.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Ever since the proliferation of SQL-on-Hadoop engines <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b0">1]</ref> the number of configuration parameters exposed by such systems, and by the underneath MapReduce systems <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b19">20]</ref>, has grown considerably, as presented in Figure <ref type="figure" target="#fig_0">1</ref>. For instance, the SQL-on-Hadoop system Apache Hive has increased from 96 configuration parameters in its 0.6.0 release up to 989 parameters in the release 3.0.0. The underlying processing engine Apache Hadoop has increased from 104 configuration parameters in its 0.12.0 release up to 530 parameters in the release 3.1.0. The impact on performance of this growing number of tuning parameters has given rise to a successful line of research on tuning MapReduce systems <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b25">26,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>However, tuning SQL-on-Hadoop queries is more complex than tuning MapReduce jobs, because queries span a workflow of jobs with a single point of configuration (i.e., the tuning setup is set in the query source code or via command line). In practice, developers tune the physical resources to be allocated by the query and the query processing engine propagates such set of physical resources to all jobs in the query plan with potential inefficient performance (see <ref type="bibr">Section 4)</ref>.</p><p>In traditional SQL processing, a query plan indicates the execution flow for the query operators. SQL-on-Hadoop processing is no different, but in the dis- tributed environment of MapReduce systems every job of a query plan implements a different set of query operators with different resource consumption needs <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b21">22]</ref>. For instance, while one job requires disk bandwidth due to a Ta-bleScan operator, another job requires memory throughput due to a Sort operator. The lurking problem of tuning SQL-on-Hadoop with a single point of configuration: the propagation of the same set of physical resources to jobs with different resource needs drives the query to inefficient performance. Contribution. In order to avoid such propagation of tuning setup, we propose a novel resource allocation method that assigns a specific tuning setup for each job in the query plan. We formulate the searching for specific tuning setup as a combinatorial optimization problem, highlight its special structure, and show that some special properties preserve the computational efficiency for a particular input of this problem. We also present the shortcoming of tuning SQL-on-Hadoop queries with the current MapReduce tuning advisers and how these queries can achieve better performance when employing our method.</p><p>Structure. Section 2 presents the notations and definitions required for presenting the problem. Section 3 presents the problem of tuning SQL-on-Hadoop queries. Section 4 presents the current tuning allocation method and its impact on SQL-on-Hadoop queries. Section 5 presents our method. Section 6 concludes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Notation and Definitions</head><p>Let us define a query plan as a directed acyclic graph G = (V, E), where the set of vertices V represents the MapReduce jobs, and the set of edges E denotes the precedence between two jobs. More precisely, a vertex (job) j ∈ V is a tuple of the form j = (O j , T j , C j ) in which O j is the set of physical query operators it executes, T j is the set of associated input tables that are read and processed by j, and C j is the set of configurations used to allocate resources. In this set each c ∈ C represents a configuration exposed by the SQL-on-Hadoop system that allocates resources to jobs. Each directed edge is an ordered pair of vertices e = (i, j) ∈ E that connects the jobs i to j, when the execution of i directly precedes j.  Hadoop queries. Note that the number of MapReduce jobs in each query plan differs considerably across software releases due to the development of different optimization techniques and the addition of new query operators. Although our example sticks to the Apache Hive terminology other SQL-on-Hadoop systems like Spark <ref type="bibr" target="#b2">[3]</ref> and Impala <ref type="bibr" target="#b13">[14]</ref> also represent the query plans as DAGs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Tuning Allocation Problem</head><p>In SQL-on-Hadoop systems, the task of allocating physical resources through tuning setups mainly comprises two steps: (1) searching for the most efficient tuning setup for a job j ∈ V , and (2) allocating the found tuning setup to the same job j. We define the most efficient tuning setup as the configuration that drives the query to decrease response time or minimize resource usage. Searching for the most efficient tuning setup is a task performed by Tuning Advisers. The common approach is to profile the query execution and, then, to employ heuristics such as Genetic Algorithm <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b4">5]</ref>, Hill Climbing <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b9">10]</ref>, Random Forest <ref type="bibr" target="#b3">[4]</ref>, Particle Swarm Optimization <ref type="bibr" target="#b12">[13]</ref>, Exhaustive Search <ref type="bibr" target="#b17">[18]</ref>, Recursive Random Search <ref type="bibr" target="#b10">[11]</ref> and others <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b26">27]</ref>. We define the searching for the most efficient tuning setup as the Adviser Function, as follows:</p><p>Definition 1 (Adviser Function). We denote as f (j) the adviser function f : V → C which is responsible for always finding the most efficient tuning setup C j for j ∈ V , where the set C = {C 1 , . . . , C k } is the collection of possible configurations.</p><p>However, allocating the found tuning setup remains a problem due to the single point of configuration of SQL-on-Hadoop queries. We propose to replace the single point of configuration of queries, exposed to developers, by an automated tuning system able to allocate specific tuning setup per job. Thus, we define the task of allocating tuning setups C j for each job j ∈ V in a query plan G as the problem of assigning the most efficient tuning setup found by the adviser function f (j).</p><p>In Definition 2, we formulate the tuning allocation as a combinatorial optimization problem, and use this perspective to develop the reasoning about how to improve the current allocation of configurations to jobs for a query plan. For the problem definition, suppose that we have a computable function σ : C × V → R, where σ(C i , j) determines the computational cost of execute a job j with a tuning setup C i . We define the problem of assigning the most efficient tuning setup, as follows:</p><p>Definition 2 (Tuning Allocation Problem). Given a directed acyclic graph G = (V, E) representing a query plan, a set C of possible configurations, and a cost function σ : C × V → R. Find an assignment of tuning setups such that the total cost to process all jobs is minimum, that is, the summation</p><formula xml:id="formula_0">j∈V Ci∈C σ(C i , j) is minimum.</formula><p>Next, we present the current approach employed by SQL-on-Hadoop systems to assign tuning setups to jobs. In Section 5, we present our approach to assign tuning setups to jobs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Uniform Tuning</head><p>The methodology employed by the current SQL-on-Hadoop engines allocates the same physical resources to all jobs of a query plan. We name this methodology as the Uniform Tuning method, and we treat it as a shortcoming for the query plan. We define the Uniform Tuning method as follows: Definition 3 (Uniform Tuning). The Uniform Tuning is the assignment of configurations to jobs such that</p><formula xml:id="formula_1">C u = C v for any u, v ∈ V and C u , C v ∈ C.</formula><p>Despite the adviser function f (j) always find the most efficient tuning setup C j for every job j ∈ V , in the case where only one tuning setup C j is replicate to all jobs in V (according to Definition 3), C j may negatively affect the query's performance, as we observe in Figure <ref type="figure" target="#fig_3">3</ref>. The uniform tuning problem becomes more severe because SQL-on-Hadoop systems delegate to developers the responsibility to chose one of the available tuning setups C j to be replicated.</p><p>The Fig. <ref type="figure" target="#fig_3">3</ref> illustrates that the Uniform Tuning drives the query processing to inefficient performance in practice through experiments in the Amazon Elastic MapReduce (EMR) cloud environment. We used Starfish <ref type="bibr" target="#b10">[11]</ref> to generate the tuning setups. We ran the TPC-H benchmark with Scale Factor of 100GB on Apache Hive (version 0.13.1) and Apache Hadoop (version 0.20.2). We are attached to these versions because they are the versions supported by Starfish.  The experiments ran with 6 machines (type c5.2xlarge: 8 of CPU and 16GB of RAM, 100GB of SSD disks). Each runtime is an average of 3 executions.</p><p>The x-axis represent the available tuning setups, where tuning-j represents the given C j generated for a job j. Each C j was applied uniformly to the query plan, following Definition 3. The y-axis represents the speedups when the execution of the query under the tuning-j is compared to the execution of the default configuration. The default configuration is the configuration bundled with the SQL-on-Hadoop system and is our baseline. According to Definition 3, each tuning setup allocates a different set of resources uniformly to the query, consequently, producing a different outcome. Note that in all queries the performance degrades due to the presence of the uniform allocation of resources, as expected.</p><p>We claim that applying a non-uniform allocation method to define a customized tuning setup for each job can outperform the current alternative (uniform tuning), which consists of choosing arbitrarily a tuning setup C 0 and replicate it to all jobs. Therefore, our contribution is a non-uniform allocation method that allows SQL-on-Hadoop systems to apply specific tuning setup per job.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Non-Uniform Tuning</head><p>For a query plan G = (V, E), every job implements a different set of SQL operators, i.e., for every job u, v ∈ V we have O u = O v . Consequently, we assume that each job presents different resource consumption needs <ref type="bibr" target="#b5">[6]</ref>. For instance, while a job u requires disk bandwidth due to a TableScan operator, another job v requires memory throughput due to a Sort operator. Next, the rest of this section discuss properties and facts about the particular structure of this problem. The first fact states that the optimum of the problem presented in Definition 2 can be found efficiently when using a non-uniform tuning method. Since this study aims to find a way to do more efficient queries, it is very important to be sure that the allocation of optimal tuning can be made in polynomial time.</p><p>The following integer linear programming model describes the optimal solution for the Tuning Allocation Problem. We define the binary decision variable x ij ∈ {0, 1} to indicate whether a tuning setup C i ∈ C is assigned to a job j ∈ V . Let the coefficient c ij = σ(C i , j) be the computational cost for executing a job j with the tuning setup</p><formula xml:id="formula_2">C i . Minimize j∈V Ci∈C c ij x ij<label>(1) subject to</label></formula><p>Ci∈C</p><formula xml:id="formula_3">x ij = 1 ∀j ∈ V<label>(2)</label></formula><p>j∈V</p><formula xml:id="formula_4">x ji = 1 ∀C i ∈ C<label>(3)</label></formula><formula xml:id="formula_5">x ij ∈ {0, 1} ∀C i ∈ C, ∀j ∈ V<label>(4)</label></formula><p>The objective function (1) minimizes the computational cost in processing all the jobs for a given query plan. Constraints in (2) ensure that exactly one tuning setup is assigned to a vertex. Reciprocally, the equality in (3) says that only one vertex j ∈ V can choose a tuning setup C i ∈ C. Finally, the constraints in (4) preserve the decision variable's integrality.</p><p>Claim 1. The Non-Uniform Tuning method can find the optimal solution for the Tuning Allocation Problem in polynomial time.</p><p>The program in ( <ref type="formula" target="#formula_2">1</ref>)-( <ref type="formula" target="#formula_5">4</ref>) describes precisely the model of an well known combinatorial optimization problem called Assignment Problem. For clarity of exposition, we present a procedure to transform an instance G = (V, E), C of the Tuning Allocation Problem into an instance G = (V , E ) of the original Assignment Problem as follows. Let G be a undirected graph such that we create a new vertex i associated to each set C i ∈ C. Also add a copy of each j ∈ V and define U as the set of all i added to G in this way we have V = U ∪ V with U and V disjoint. Now, we add an edge {i, j} between every vertex i ∈ U and j ∈ V . The cost of assigning a tuning setup C i to job j is c ij and can be interpreted as an weight on the edge {i, j} ∈ E . Notice that all edges in E go between U and V . This implies that G (the input of the Assignment Problem) is a complete bipartite graph.</p><p>Even though the Assignment Problem problem is known to be NP-hard, an optimal solution can be obtained efficiently (within execution time bounded by a polynomial function) when the input graph is bipartite <ref type="bibr" target="#b24">[25]</ref>. Therefore, the problem can be solved efficiently, because of the total unimodularity of the constraint matrix of the Assignment Problem.</p><p>An important observation about the Tuning Allocation Problem is that if we drop the constraints (3) of the linear model, the solution remains feasible for the problem. It means that we can attribute a configuration C i to more than one job (otherwise, the uniform tuning method would not be feasible). Therefore, the natural formulation for this problem is a relaxation for the Assignment Problem. Actually, the resulting integer linear program can be rewritten as follows:</p><formula xml:id="formula_6">Minimize j∈V Ci∈C c ij x ij subject to Ci∈C x ij = 1 ∀j ∈ V x ij ∈ {0, 1} ∀C i ∈ C, ∀j ∈ V.</formula><p>Fortunately, the coefficient matrix does not change, and we still have an incidence matrix of a bipartite graph. In this way, the matrix of coefficients of the problem maintains the total unimodularity. Consequently, the Tuning Allocation Problem also can be solved in polynomial time.</p><p>Claim 2. For a query plan G = (V, E), in worst case, the non-uniform method is equal to the uniform method, otherwise the non-uniform method performs better.</p><p>Let the tuning setup C 0 be the one assigned to all jobs j ∈ V of the query plan G using the uniform tuning. Now, consider the case with non-uniform tuning in which for the same query plan G, the most efficient tuning for every job j can be obtained by calling interactively the function f (j) (see <ref type="bibr">Definition 1)</ref>. Directly from definition of the adviser function f (j), we have the following inequality</p><formula xml:id="formula_7">σ(f (j), j) ≤ σ(C 0 , j)<label>(5)</label></formula><p>for every job j ∈ V . The inequality says that, in the best case for the uniform tuning, the arbitrary tuning setup C 0 can be the most efficient for j, but there are no guarantees that it will happen. While attributing a tuning setup C i chosen by the function f (j), we know that it is always the most suitable tuning setup for j. Consequently, the total cost required to execute all jobs j ∈ V of the query plan G using the non-uniform method, can be written, as the following summation:</p><p>j∈V σ(f (j), j).</p><p>So, from the inequality (5) we can compare the total costs of the two methods, as follows:</p><formula xml:id="formula_8">j∈V σ(f (j), j) ≤ j∈V σ(C 0 , j),</formula><p>i.e., the computational cost for executing the query plan G using the non-uniform assignment method is at most equals to the cost for the uniform method. The advantage of using the uniform tuning method is that it is simple and straightforward, i.e., the SQL-on-Hadoop engine just replicates the chosen tuning setup. On the other hand, there are no guarantees that the chosen tuning setup will perform well during the query's execution. In this sense, our proposal of non-uniform tuning personalizes the tuning setup for each job and the optimal assignment of tuning setups is guaranteed. In other words, the Tuning allocation Problem can be solved optimally with the non-uniform tuning method and Claim 1 shows that it can be done in polynomial time. Furthermore, a direct consequence of the non-uniform tuning method is the improvement in the performance of the query plan, as stated in Claim 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>In this paper we presented the shortcoming of MapReduce tuning advisers when applied to SQL-on-Hadoop systems due to the uniform allocation of physical resources and its consequences on query's performance. After, we modeled the Tuning Allocation Problem in order to solve the uniform allocation of physical resources. We presented our Non-Uniform Tuning method and proved that it allows assigning optimal tuning setups to SQL-on-Hadoop queries. We also proved that the optimal set of tuning setups required by a query can be found in polynomial time. For future work, an interesting direction consists of combining multiple tuning advisers to optimize one single query, once different tuning advisers focus on different query operators with different resource usage.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Number of configuration parameters per release.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2</head><label>2</label><figDesc>illustrates the query plans compiled for TPC-H query 7, in version 0.13.1 and 3.1.0 of Apache Hive to reinforce the growing complexity of SQL-on-Query plan compiled by Hive 0.13.1. (b) Query plan compiled by Hive 3.1.0.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Query plans for TPC-H query 7.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: Speedups compared to default configuration for the uniform tuning. Tuning-{v}, represents a tuning setup generated by Starfish for the job v.</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -Brasil (CAPES) -Finance Code 001.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abouzeid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bajda-Pawlikowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Silberschatz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rasin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Apache Software Foundation: Apache Flink: Stateful Computations over Data Streams</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>Apache.Org</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Spark SQL: Relational Data Processing in Spark</title>
		<author>
			<persName><forename type="first">M</forename><surname>Armbrust</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Bradley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kaftan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Franklin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ghodsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zaharia</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop&apos;s Configuration</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Bei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TPDS</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MEST: A Model-Driven Efficient Searching Approach for MapReduce Self-Tuning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Bei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Tpc-h analyzed: Hidden messages and lessons learned from an influential benchmark</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Boncz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Erling</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>TPCTC</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Cheetah: a high performance, custom data warehouse on top of MapReduce</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">VectorH: Taking SQL-on-Hadoop to the next level</title>
		<author>
			<persName><forename type="first">Costea</forename><surname>Adrian Ionescu Bogdan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Micha</forename><surname>Switakowski Cristian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bârc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alicja</forename><surname>Sompolski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luszczak Micha L Szafrá Nski Giel De Nijs Peter Boncz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Management of Data -SIGMOD</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">MapReduce: Simplified Data Processing on Large Clusters</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghemawat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OSDI&apos;04</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">JellyFish: Online Performance Tuning with Adaptive Configuration and Elastic Container in Hadoop Yarn</title>
		<author>
			<persName><forename type="first">X</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Qian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICPADS</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Starfish: A Self-tuning System for Big Data Analytics</title>
		<author>
			<persName><forename type="first">H</forename><surname>Herodotou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Borisov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CIDR</title>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The Performance of MapReduce : An In-depth Study</title>
		<author>
			<persName><forename type="first">D</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Optimizing Hadoop parameter settings with gene expression programming guided PSO</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Concurrency and Computation: Practice and Experience</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Kornacker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Behm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bittorf</surname></persName>
		</author>
		<title level="m">Impala: A Modern, Open-Source SQL Engine for Hadoop</title>
				<imprint>
			<publisher>CIDR</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">MRONLINE: MapReduce online performance tuning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>HPDC</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Gunther: Search-based autotuning of MapReduce</title>
		<author>
			<persName><forename type="first">G</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Datta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Willke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kalavri</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">Euro-Par</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">MR-COF: A Genetic MapReduce Configuration Optimization Framework</title>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Theoretical Computer Science</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Panacea: towards holistic optimization of MapReduce applications</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ravi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakradhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kandemir</surname></persName>
		</author>
		<editor>CHO</editor>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The Performance of SQLon-Hadoop Systems -An Experimental Study</title>
		<author>
			<persName><forename type="first">X</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Congress on Big Data</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The Family of MapReduce and Large-Scale Data Processing Systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sakr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Fayoumi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A Comparative Analysis of State-of-The-Art SQL-on-Hadoop Systems for Interactive Analytics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tapdiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fabbri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Big Data</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Hive: a Warehousing Solution Over a Map-Reduce Framework</title>
		<author>
			<persName><forename type="first">A</forename><surname>Thusoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Sarma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Chakka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Anthony</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wyckoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Murthy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Automatic Database Management System Tuning Through Large-scale Machine Learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Van Aken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pavlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Gordon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>SIGMOD</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Integer and combinatorial optimization</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Wolsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Nemhauser</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>John Wiley &amp; Sons</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">MapReduce Workload Modeling with Statistical Approach</title>
		<author>
			<persName><forename type="first">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Qian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Grid Computing</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Self-Balancing Job Parallelism and Throughput in Hadoop</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Krikava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rouvoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Seinturier</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>DAIS</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
