<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Non-Uniform Tuning Method for SQL-on-Hadoop Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edson Ramiro Lucas Filho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renato Silva de Melo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eduardo Cunha de Almeida</string-name>
          <email>eduardog@inf.ufpr.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>MapReduce.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidade Federal do Parana</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A SQL-on-Hadoop query consists of a work ow of MapReduce jobs with a single point of con guration. This means that the developer tunes hundreds of tuning parameters directly in the query source code (or via terminal interface), but the system assumes the same conguration to every running job. The lurking problem is that the system allocates computing resources uniformly to every running job, even if they present di erent consumption needs (after all the tuning setup is the same). In this paper, we demonstrate that such uniform allocation of resources drives the query to ine cient performance. We claim that applying a non-uniform allocation method to de ne a customized tuning setup for each job can outperform the uniform allocation. Ultimately, we demonstrate that searching for speci c tuning setup is an optimization problem with polynomial cost.</p>
      </abstract>
      <kwd-group>
        <kwd>Tuning</kwd>
        <kwd>Self-Tuning</kwd>
        <kwd>SQL-on-Hadoop</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Ever since the proliferation of SQL-on-Hadoop engines [
        <xref ref-type="bibr" rid="ref1 ref14 ref2 ref22 ref23 ref3 ref7 ref8">2,7,3,23,8,14,22,1</xref>
        ] the
number of con guration parameters exposed by such systems, and by the
underneath MapReduce systems [
        <xref ref-type="bibr" rid="ref20 ref9">9,20</xref>
        ], has grown considerably, as presented in
Figure 1. For instance, the SQL-on-Hadoop system Apache Hive has increased
from 96 con guration parameters in its 0.6.0 release up to 989 parameters in the
release 3.0.0. The underlying processing engine Apache Hadoop has increased
from 104 con guration parameters in its 0.12.0 release up to 530 parameters in
the release 3.1.0. The impact on performance of this growing number of tuning
parameters has given rise to a successful line of research on tuning MapReduce
systems [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref15 ref16 ref17 ref18 ref19 ref21 ref24 ref26 ref4 ref5">17,16,21,18,4,13,15,10,5,24,26,11,12,19</xref>
        ].
      </p>
      <p>However, tuning SQL-on-Hadoop queries is more complex than tuning
MapReduce jobs, because queries span a work ow of jobs with a single point of con
guration (i.e., the tuning setup is set in the query source code or via command line).
In practice, developers tune the physical resources to be allocated by the query
and the query processing engine propagates such set of physical resources to all
jobs in the query plan with potential ine cient performance (see Section 4).</p>
      <p>In traditional SQL processing, a query plan indicates the execution ow for
the query operators. SQL-on-Hadoop processing is no di erent, but in the
dis</p>
    </sec>
    <sec id="sec-2">
      <title>Hive</title>
    </sec>
    <sec id="sec-3">
      <title>SparkSQL Pig</title>
    </sec>
    <sec id="sec-4">
      <title>Hadoop Tez Spark 1;000</title>
      <p>trsee 680000
am 400
r 200
a
p 0
#</p>
      <p>0 10 20 30</p>
    </sec>
    <sec id="sec-5">
      <title>SQL-on-Hadoop systems' releases 0 10 20 30</title>
    </sec>
    <sec id="sec-6">
      <title>MapReduce systems' releases</title>
      <p>
        tributed environment of MapReduce systems every job of a query plan
implements a di erent set of query operators with di erent resource consumption
needs [
        <xref ref-type="bibr" rid="ref22 ref6">6,22</xref>
        ]. For instance, while one job requires disk bandwidth due to a
TableScan operator, another job requires memory throughput due to a Sort
operator. The lurking problem of tuning SQL-on-Hadoop with a single point of
con guration: the propagation of the same set of physical resources to jobs with
di erent resource needs drives the query to ine cient performance.
      </p>
      <p>Contribution. In order to avoid such propagation of tuning setup, we
propose a novel resource allocation method that assigns a speci c tuning setup
for each job in the query plan. We formulate the searching for speci c tuning
setup as a combinatorial optimization problem, highlight its special structure,
and show that some special properties preserve the computational e ciency for
a particular input of this problem. We also present the shortcoming of tuning
SQL-on-Hadoop queries with the current MapReduce tuning advisers and how
these queries can achieve better performance when employing our method.</p>
      <p>Structure. Section 2 presents the notations and de nitions required for
presenting the problem. Section 3 presents the problem of tuning SQL-on-Hadoop
queries. Section 4 presents the current tuning allocation method and its impact
on SQL-on-Hadoop queries. Section 5 presents our method. Section 6 concludes.
2</p>
      <sec id="sec-6-1">
        <title>Notation and De nitions</title>
        <p>Let us de ne a query plan as a directed acyclic graph G = (V; E), where the set
of vertices V represents the MapReduce jobs, and the set of edges E denotes
the precedence between two jobs. More precisely, a vertex (job) j 2 V is a tuple
of the form j = (Oj ; Tj ; Cj ) in which Oj is the set of physical query operators
it executes, Tj is the set of associated input tables that are read and processed
by j, and Cj is the set of con gurations used to allocate resources. In this set
each c 2 C represents a con guration exposed by the SQL-on-Hadoop system
that allocates resources to jobs. Each directed edge is an ordered pair of vertices
e = (i; j) 2 E that connects the jobs i to j, when the execution of i directly
precedes j.</p>
        <p>
          Figure 2 illustrates the query plans compiled for TPC-H query 7, in version
0.13.1 and 3.1.0 of Apache Hive to reinforce the growing complexity of
SQL-on(a) Query plan compiled by Hive 0.13.1.
v3
v2
v5
v4
v1
v6
v7
v11
v10
v9
Hadoop queries. Note that the number of MapReduce jobs in each query plan
di ers considerably across software releases due to the development of di erent
optimization techniques and the addition of new query operators. Although our
example sticks to the Apache Hive terminology other SQL-on-Hadoop systems
like Spark [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and Impala [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] also represent the query plans as DAGs.
3
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>The Tuning Allocation Problem</title>
        <p>In SQL-on-Hadoop systems, the task of allocating physical resources through
tuning setups mainly comprises two steps: (1) searching for the most e cient
tuning setup for a job j 2 V , and (2) allocating the found tuning setup to the
same job j. We de ne the most e cient tuning setup as the con guration that
drives the query to decrease response time or minimize resource usage.</p>
        <p>
          Searching for the most e cient tuning setup is a task performed by Tuning
Advisers. The common approach is to pro le the query execution and, then,
to employ heuristics such as Genetic Algorithm [
          <xref ref-type="bibr" rid="ref16 ref17 ref5">16,17,5</xref>
          ], Hill Climbing [
          <xref ref-type="bibr" rid="ref10 ref15">15,10</xref>
          ],
Random Forest [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Particle Swarm Optimization [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], Exhaustive Search [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
Recursive Random Search [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and others [
          <xref ref-type="bibr" rid="ref21 ref27">21,27</xref>
          ]. We de ne the searching for
the most e cient tuning setup as the Adviser Function, as follows:
De nition 1 (Adviser Function). We denote as f (j) the adviser function
f : V 7! C which is responsible for always nding the most e cient tuning
setup Cj for j 2 V , where the set C = fC1; : : : ; Ckg is the collection of possible
con gurations.
        </p>
        <p>However, allocating the found tuning setup remains a problem due to the
single point of con guration of SQL-on-Hadoop queries. We propose to replace
the single point of con guration of queries, exposed to developers, by an
automated tuning system able to allocate speci c tuning setup per job. Thus, we
de ne the task of allocating tuning setups Cj for each job j 2 V in a query
plan G as the problem of assigning the most e cient tuning setup found by the
adviser function f (j).</p>
        <p>In De nition 2, we formulate the tuning allocation as a combinatorial
optimization problem, and use this perspective to develop the reasoning about how to
improve the current allocation of con gurations to jobs for a query plan. For the
problem de nition, suppose that we have a computable function : C V ! R,
where (Ci; j) determines the computational cost of execute a job j with a
tuning setup Ci. We de ne the problem of assigning the most e cient tuning setup,
as follows:
De nition 2 (Tuning Allocation Problem). Given a directed acyclic graph
G = (V; E) representing a query plan, a set C of possible con gurations, and a
cost function : C V ! R. Find an assignment of tuning setups such that the
total cost to process all jobs is minimum, that is, the summation
X X
j2V Ci2C
(Ci; j)
is minimum.</p>
        <p>Next, we present the current approach employed by SQL-on-Hadoop systems
to assign tuning setups to jobs. In Section 5, we present our approach to assign
tuning setups to jobs.
4</p>
      </sec>
      <sec id="sec-6-3">
        <title>Uniform Tuning</title>
        <p>The methodology employed by the current SQL-on-Hadoop engines allocates the
same physical resources to all jobs of a query plan. We name this methodology
as the Uniform Tuning method, and we treat it as a shortcoming for the query
plan. We de ne the Uniform Tuning method as follows:
De nition 3 (Uniform Tuning). The Uniform Tuning is the assignment of
con gurations to jobs such that Cu = Cv for any u; v 2 V and Cu; Cv 2 C.</p>
        <p>Despite the adviser function f (j) always nd the most e cient tuning setup
Cj for every job j 2 V , in the case where only one tuning setup Cj is
replicate to all jobs in V (according to De nition 3), Cj may negatively a ect the
query's performance, as we observe in Figure 3. The uniform tuning problem
becomes more severe because SQL-on-Hadoop systems delegate to developers
the responsibility to chose one of the available tuning setups Cj to be replicated.</p>
        <p>
          The Fig. 3 illustrates that the Uniform Tuning drives the query processing
to ine cient performance in practice through experiments in the Amazon
Elastic MapReduce (EMR) cloud environment. We used Star sh [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to generate
the tuning setups. We ran the TPC-H benchmark with Scale Factor of 100GB
on Apache Hive (version 0.13.1) and Apache Hadoop (version 0.20.2). We are
attached to these versions because they are the versions supported by Star sh.
.
        </p>
        <p>20
)
%
(
p
d0
u
e
e
p
S
20
20
)
%
(
p
d0
u
e
e
p
S
20
-1 -2 -3 -4 -5 -6
g g g g g g
inn inn inn inn inn inn
uT uT uT uT uT uT
(a) Query 2.
:2786 :2791 :1003 :1378 :1266 7
2
:
1
-1 -2 -3 -4 -5 -6
g g g g g g
inn inn inn inn inn inn
uT uT uT uT uT uT
(e) Query 7.</p>
        <p>0
20
100</p>
        <p>0
100
200
The experiments ran with 6 machines (type c5.2xlarge: 8 of CPU and 16GB of
RAM, 100GB of SSD disks). Each runtime is an average of 3 executions.</p>
        <p>The x-axis represent the available tuning setups, where tuning-j represents
the given Cj generated for a job j. Each Cj was applied uniformly to the query
plan, following De nition 3. The y-axis represents the speedups when the
execution of the query under the tuning-j is compared to the execution of the default
con guration. The default con guration is the con guration bundled with the
SQL-on-Hadoop system and is our baseline. According to De nition 3, each
tuning setup allocates a di erent set of resources uniformly to the query,
consequently, producing a di erent outcome. Note that in all queries the performance
degrades due to the presence of the uniform allocation of resources, as expected.</p>
        <p>We claim that applying a non-uniform allocation method to de ne a
customized tuning setup for each job can outperform the current alternative
(uniform tuning), which consists of choosing arbitrarily a tuning setup C0 and
replicate it to all jobs. Therefore, our contribution is a non-uniform allocation method
that allows SQL-on-Hadoop systems to apply speci c tuning setup per job.</p>
        <sec id="sec-6-3-1">
          <title>Minimize subject to</title>
        </sec>
      </sec>
      <sec id="sec-6-4">
        <title>Non-Uniform Tuning</title>
        <p>
          For a query plan G = (V; E), every job implements a di erent set of SQL
operators, i.e., for every job u; v 2 V we have Ou 6= Ov. Consequently, we assume that
each job presents di erent resource consumption needs [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. For instance, while
a job u requires disk bandwidth due to a TableScan operator, another job v
requires memory throughput due to a Sort operator. Next, the rest of this section
discuss properties and facts about the particular structure of this problem. The
rst fact states that the optimum of the problem presented in De nition 2 can
be found e ciently when using a non-uniform tuning method. Since this study
aims to nd a way to do more e cient queries, it is very important to be sure
that the allocation of optimal tuning can be made in polynomial time.
        </p>
        <p>The following integer linear programming model describes the optimal
solution for the Tuning Allocation Problem. We de ne the binary decision
variable xij 2 f0; 1g to indicate whether a tuning setup Ci 2 C is assigned to
a job j 2 V . Let the coe cient cij = (Ci; j) be the computational cost for
executing a job j with the tuning setup Ci.</p>
        <p>X X
The objective function (1) minimizes the computational cost in processing all
the jobs for a given query plan. Constraints in (2) ensure that exactly one tuning
setup is assigned to a vertex. Reciprocally, the equality in (3) says that only one
vertex j 2 V can choose a tuning setup Ci 2 C. Finally, the constraints in (4)
preserve the decision variable's integrality.</p>
        <p>Claim 1. The Non-Uniform Tuning method can nd the optimal solution for the
Tuning Allocation Problem in polynomial time.</p>
        <p>The program in (1)-(4) describes precisely the model of an well known
combinatorial optimization problem called Assignment Problem. For clarity of
exposition, we present a procedure to transform an instance hG = (V; E); Ci of
the Tuning Allocation Problem into an instance hG0 = (V 0; E0)i of the
original Assignment Problem as follows. Let G0 be a undirected graph such that
we create a new vertex i associated to each set Ci 2 C. Also add a copy of
each j 2 V and de ne U as the set of all i added to G0 in this way we have
V 0 = U [ V with U and V disjoint. Now, we add an edge fi; jg between every
vertex i 2 U and j 2 V . The cost of assigning a tuning setup Ci to job j is cij
and can be interpreted as an weight on the edge fi; jg 2 E0. Notice that all edges
(1)
(2)
(3)
(4)
in E0 go between U and V . This implies that G0 (the input of the Assignment
Problem) is a complete bipartite graph.</p>
        <p>
          Even though the Assignment Problem problem is known to be NP-hard,
an optimal solution can be obtained e ciently (within execution time bounded
by a polynomial function) when the input graph is bipartite [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Therefore,
the problem can be solved e ciently, because of the total unimodularity of the
constraint matrix of the Assignment Problem.
        </p>
        <p>An important observation about the Tuning Allocation Problem is that
if we drop the constraints (3) of the linear model, the solution remains feasible for
the problem. It means that we can attribute a con guration Ci to more than one
job (otherwise, the uniform tuning method would not be feasible). Therefore, the
natural formulation for this problem is a relaxation for the Assignment
Problem. Actually, the resulting integer linear program can be rewritten as follows:</p>
        <sec id="sec-6-4-1">
          <title>Minimize subject to</title>
          <p>Fortunately, the coe cient matrix does not change, and we still have an
incidence matrix of a bipartite graph. In this way, the matrix of coe cients
of the problem maintains the total unimodularity. Consequently, the Tuning
Allocation Problem also can be solved in polynomial time.</p>
          <p>Claim 2. For a query plan G = (V; E), in worst case, the non-uniform method
is equal to the uniform method, otherwise the non-uniform method performs
better.</p>
          <p>Let the tuning setup C0 be the one assigned to all jobs j 2 V of the query plan
G using the uniform tuning. Now, consider the case with non-uniform tuning in
which for the same query plan G, the most e cient tuning for every job j can
be obtained by calling interactively the function f (j) (see De nition 1). Directly
from de nition of the adviser function f (j), we have the following inequality
(f (j); j)
(C0; j)
(5)
for every job j 2 V . The inequality says that, in the best case for the uniform
tuning, the arbitrary tuning setup C0 can be the most e cient for j, but there
are no guarantees that it will happen. While attributing a tuning setup Ci chosen
by the function f (j), we know that it is always the most suitable tuning setup
for j. Consequently, the total cost required to execute all jobs j 2 V of the
query plan G using the non-uniform method, can be written, as the following
summation:</p>
          <p>X (f (j); j):</p>
          <p>So, from the inequality (5) we can compare the total costs of the two methods,
as follows:</p>
          <p>X (f (j); j)
i.e., the computational cost for executing the query plan G using the non-uniform
assignment method is at most equals to the cost for the uniform method.</p>
          <p>The advantage of using the uniform tuning method is that it is simple and
straightforward, i.e., the SQL-on-Hadoop engine just replicates the chosen
tuning setup. On the other hand, there are no guarantees that the chosen tuning
setup will perform well during the query's execution. In this sense, our proposal
of non-uniform tuning personalizes the tuning setup for each job and the optimal
assignment of tuning setups is guaranteed. In other words, the Tuning
allocation Problem can be solved optimally with the non-uniform tuning method
and Claim 1 shows that it can be done in polynomial time. Furthermore, a
direct consequence of the non-uniform tuning method is the improvement in the
performance of the query plan, as stated in Claim 2.
6</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>Conclusion</title>
        <p>In this paper we presented the shortcoming of MapReduce tuning advisers when
applied to SQL-on-Hadoop systems due to the uniform allocation of physical
resources and its consequences on query's performance. After, we modeled the
Tuning Allocation Problem in order to solve the uniform allocation of
physical resources. We presented our Non-Uniform Tuning method and proved
that it allows assigning optimal tuning setups to SQL-on-Hadoop queries. We
also proved that the optimal set of tuning setups required by a query can be
found in polynomial time. For future work, an interesting direction consists of
combining multiple tuning advisers to optimize one single query, once di erent
tuning advisers focus on di erent query operators with di erent resource usage.
Acknowledgement: This study was nanced in part by the Coordenac~ao de
Aperfeicoamento de Pessoal de N vel Superior - Brasil (CAPES) - Finance Code 001.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abouzeid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bajda-Pawlikowski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silberschatz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rasin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Apache</given-names>
            <surname>Software</surname>
          </string-name>
          <article-title>Foundation: Apache Flink: Stateful Computations over Data Streams</article-title>
          . Apache.
          <string-name>
            <surname>Org</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Armbrust</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lian</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradley</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaftan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franklin</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghodsi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaharia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Spark</surname>
            <given-names>SQL</given-names>
          </string-name>
          :
          <article-title>Relational Data Processing in Spark</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Zhang, H.,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Con guration</article-title>
          .
          <source>TPDS</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>MEST: A Model-Driven E cient Searching Approach for MapReduce Self-Tuning</article-title>
          . IEEE Access (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erling</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Tpc-h analyzed: Hidden messages and lessons learned from an in uential benchmark</article-title>
          .
          <source>In: TPCTC</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Cheetah: a high performance, custom data warehouse on top of MapReduce</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Costea</given-names>
            <surname>Adrian Ionescu Bogdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.R.</given-names>
            , Micha Switakowski Cristian Ba^rc, A.,
            <surname>Sompolski Alicja Luszczak Michal Szafra nski Giel de Nijs Peter Boncz</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.:</surname>
          </string-name>
          <article-title>VectorH: Taking SQL-on-Hadoop to the next level</article-title>
          .
          <source>In: International Conference on Management of Data - SIGMOD</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>MapReduce: Simpli ed Data Processing on Large Clusters. In: OSDI'04</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>JellyFish: Online Performance Tuning with Adaptive Con guration and Elastic Container in Hadoop Yarn</article-title>
          . In: ICPADS (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Herodotou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borisov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Star sh: A Self-tuning System for Big Data Analytics</article-title>
          .
          <source>CIDR</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>The Performance of MapReduce : An In-depth Study</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , G.A.,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimizing Hadoop parameter settings with gene expression programming guided PSO</article-title>
          .
          <source>Concurrency and Computation: Practice and Experience</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kornacker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Behm</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bittorf</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.:
          <article-title>Impala: A Modern, Open-Source SQL Engine for Hadoop</article-title>
          .
          <source>CIDR</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>MRONLINE: MapReduce online performance tuning</article-title>
          .
          <source>In: HPDC</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willke</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalavri</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.:
          <article-title>Gunther: Search-based autotuning of MapReduce</article-title>
          . Euro-Par (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <string-name>
            <surname>MR-COF: A Genetic MapReduce Conguration Optimization</surname>
          </string-name>
          <article-title>Framework</article-title>
          . In: Theoretical Computer Science (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakradhar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kandemir</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Panacea: towards holistic optimization of MapReduce applications</article-title>
          . In: CHO (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <surname>H.:</surname>
          </string-name>
          <article-title>The Performance of SQLon-Hadoop Systems - An Experimental Study</article-title>
          .
          <source>In: IEEE International Congress on Big Data</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sakr</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fayoumi</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>The Family of MapReduce and Large-Scale Data Processing Systems</article-title>
          . ACM Computing Surveys (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs</article-title>
          .
          <source>PVLDB</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Tapdiya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabbri</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A Comparative Analysis of State-of-The-Art SQL-onHadoop Systems for Interactive Analytics</article-title>
          .
          <source>In: IEEE International Conference on Big Data</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Thusoo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarma</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anthony</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Wycko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Murthy</surname>
          </string-name>
          , R.:
          <article-title>Hive: a Warehousing Solution Over a Map-Reduce Framework</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Van Aken</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>Automatic Database Management System Tuning Through Large-scale Machine Learning</article-title>
          .
          <source>In: SIGMOD</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Wolsey</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemhauser</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          :
          <article-title>Integer and combinatorial optimization</article-title>
          . John Wiley &amp; Sons (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>MapReduce Workload Modeling with Statistical Approach</article-title>
          .
          <source>Journal of Grid Computing</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krikava</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rouvoy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seinturier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Self-Balancing Job Parallelism and Throughput in Hadoop</article-title>
          . In: DAIS (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>