A Bi-objective Optimization Framework for Heterogeneous CPU/GPU Query Plans. CEUR-WS Vol-1032, https://ceur-ws.org/Vol-1032/paper-30.pdf (DBLP: conf/csp/PrzymusSK13)
     A Bi-objective Optimization Framework for
        Heterogeneous CPU/GPU Query Plans

         Piotr Przymus1 , Krzysztof Kaczmarski2 , and Krzysztof Stencel3
    1
        Nicolaus Copernicus University, Poland, eror@mat.umk.pl
    2
        Warsaw University of Technology, Poland, k.kaczmarski@mini.pw.edu.pl
    3
        The University of Warsaw, Poland, stencel@mimuw.edu.pl



        Abstract. Graphics Processing Units (GPU) have significantly more
        applications than just rendering images. They are also used in general-
        purpose computing to solve problems that can benefit from massive par-
        allel processing. However, there are tasks that either hardly suit the
        GPU or fit the GPU only partially. The latter class is the focus of this
        paper. We elaborate on hybrid CPU/GPU computation and build opti-
        misation methods that seek the equilibrium between these two compu-
        tation platforms. The method is based on a heuristic search for bi-
        objective Pareto optimal execution plans in the presence of multiple
        concurrent queries. The underlying model mimics a commodity market
        where devices are producers and queries are consumers. The value of
        the resources of computing devices is controlled by supply-and-demand
        laws. Our model of the optimization criteria allows finding solutions to
        problems not yet addressed in heterogeneous query processing. Further-
        more, it also offers lower time complexity and higher accuracy than
        other methods.


1   Introduction
General-Purpose computing on Graphics Processing Units (GPGPU) involves
the utilization of graphics processing units (GPU) in tasks traditionally handled
by central processing units (CPU). GPUs offer notable processing power for
stream processing workloads.
    Execution of database queries is an example of a successful application of
GPGPU. The current research focuses on using the GPU as a co-processor [5].
GPU as co-processor may accelerate numerous database computations, e.g. rela-
tional query processing, query optimization, database compression or supporting
time series databases [5,13,14].
    An application of the GPU requires transferring data from the CPU memory
to the graphics device memory. This data transfer is usually time-consuming and
may diminish the gain of the acceleration credited to the GPU. This situation can be
improved by using lightweight compression methods that can significantly reduce
the costs associated with communication [13,14]. However, this does not solve all
the problems. In particular, GPU is optimized for numerical computation. Thus,
only selected operations will benefit from GPU. Small data sets are another
problem. For such sets the data transfer may dominate processing time and

destroy the performance gain. Therefore, joint processing capabilities of both
CPU and GPU are worth considering. Furthermore, as it is common to have
more than one GPU in a computer, the potential use of multiple GPU devices
should also be considered. Such query plans are called heterogeneous.
    The previous research efforts focused on the creation of query plans based
on a cost model. This approach finds plans with the best throughput. However,
it does not allow modelling all phenomena that can occur in heterogeneous sys-
tems. Performing a query as soon as possible is not always cost effective [8]. For
this reason, we propose a query processing model based on concepts of markets
that are known to be suitable for describing the interactions in a heterogeneous
world. They have already gained a considerable interest in the context of task
processing in heterogeneous systems [6]. In market models, manufacturers (pro-
cessing devices) compete with each other for customers (query plans). Similar
competition occurs among customers.
    In this paper, we propose a query optimization model based on the commod-
ity market. A query plan is bi-objectively optimized to minimize: the processing
time and the value of consumed resources. For the user, a small difference in
execution time can be negligible. Thus, it is worth optimizing a query, so that
the execution time satisfies the user while other costs are minimized. In this case,
the cost may be, e.g. the responsiveness of the system, power consumption, heat
production, etc. One can also consider expressing the cost in financial terms.


2     Preliminaries

2.1   GPU and Heterogeneous query processing

From the parallel processing’s point of view, CPU accompanied by a GPU co-
processor is a shared nothing architecture. A GPU card has its own memory or
a separate area in the CPU main memory. Thus, the data has to be explicitly
transferred from the CPU main memory to the GPU main memory. Similarly,
the results produced by GPU have to be transferred back to the CPU main
memory. This data transfer often introduces significant overhead. Thus, it is im-
portant to include the transfer cost in the total execution time of an operation.
This cost is also a component of the execution time prediction.
    Contemporary computer systems often include more than one GPU. Then,
it is possible to combine multiple computational units in a single query plan.
Such processing is called heterogeneous query processing. Each device may have
a different communication cost (e.g. PCIe or shared memory) with the CPU
main memory. Furthermore, devices can often communicate directly between
each other. Therefore, the main problem of heterogeneous query processing is
the construction of a query plan that uses only those computational units from
which query performance benefits most, and yet minimizes the resources used.
    Breß et al. [5] identified problems of hybrid (CPU/GPU) query processing
which also hold in heterogeneous query processing:

Problem 1 Execution Time Prediction - as multiple database operations may
   be executed concurrently, it is hard to predict the influence of concurrent
   tasks on execution times.
Problem 2 Critical Query - since the GPU memory, the concurrent GPU ker-
   nels execution and the PCIe bus bandwidth are all limited, only the critical
   queries should be selected to use GPU (i.e., queries that benefit from GPU
   usage and are important from global perspective).
Problem 3 Optimization Impact - as concurrent heterogeneous queries will in-
   fluence each other, it is important to consider this aspect in the planning
   process.

2.2   Commodity market approach in query processing context
In this paper we address these problems by showing that they may be solved by
applying a supply-and-demand pricing model taken from a commodity market.
In such a market resource owners (processing devices) price their assets and
charge their customers (queries) for consumed resources. Other pricing models
may also be used [6].
    In the supply-and-demand model when supply (available resources) or de-
mand (needed resources) changes, the prices will be changed until an equilibrium
between supply and demand is found. Typically the value of a resource is influ-
enced by: its strength, physical cost, service overhead, demand and preferences
[6]. A consumer may be charged for various resources like CPU cycles, memory
used, the bus usage or the network usage. Typically, a broker mediates between
the resource owners and the consumer. The resource owners announce their val-
uation and the resource quality information (e.g. estimated time) in response
to the broker’s enquiry. Then, the broker selects resources that satisfy the con-
sumer’s utility function and objectives, such as cost and estimated time con-
straints or the minimization of one of the objectives.

2.3   Bi-objective optimization
Bi-objective optimization is a problem where optimal decisions need to be taken
in the presence of trade-offs between two conflicting objectives. It is a special
case of multiple criteria decision making. Typically, no single solution meets
all objectives. Thus, a definition of an optimum solution set should be
established. In this paper we use the predominant Pareto optimality [11]. Given
a set of choices and a way of valuing them, the Pareto set consists of choices that
are Pareto efficient. A set of choices is said to be Pareto efficient if we cannot find
a reallocation of those choices such that the value of a single choice is improved
without worsening values of others choices. As bi-objective query optimization
is NP-hard, we need an approximate solution [12].
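The Pareto-efficiency test described above can be sketched in a few lines of Python (an illustrative sketch, not part of the paper's implementation; the point names and the two-objective tuples are our assumptions):

```python
def dominates(q, p):
    """q dominates p if q is no worse in both objectives (time, cost)
    and strictly better in at least one of them."""
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])

def pareto_front(points):
    """Keep only the Pareto-efficient (time, cost) pairs."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

plans = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
# (3.0, 4.0) is dominated by (2.0, 3.0) and is therefore excluded
front = pareto_front(plans)
```

Note the quadratic scan is enough for illustration; the paper instead uses scalarization (Section 3.4) precisely because enumerating the exact Pareto set is too expensive.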

3   Heterogeneous Query Planner
The main aim of the new query planner is to propose a solution to the prob-
lems listed in Section 2.1, i.e., Execution Time Prediction, Critical Query and

Optimization Impact. Furthermore, this planner also addresses heterogeneous
GPU cards and distributed processing. In this paper we propose a method to
build heterogeneous query plans based on the economics of commodity markets.
It is characterized by the fact that resource producers determine the cost
of their resources and resource consumers jostle for them. Furthermore, the
resource owners provide information on the quality of their resources, i.e., the
estimated processing time.


3.1     Notation

Table 1 contains a summary of the notation used in this paper. Assume a set of
units $U$, a logical query sequence $QS_{log}$ and a dataset $D$. The goal is to build a
heterogeneous query sequence. Let $QS_{het}$ be a heterogeneous query sequence de-
fined in Equation (1). Let $D_{k+1}$ be the data returned by an operation $A^{o_i}_{u_k}(D_k)$.
The first row of $QS_{het}$ is created by replacing each operation $o_i \in QS_{log}$ with
an algorithm $A^{o_i}_{u_j} \in AP^{o_i}$. The second row is created by inserting an operation
$M_{u_k,u_{k'}}(D)$ that copies the output of an algorithm on the unit $u_k$ to the input of
the current algorithm on the unit $u_{k'}$.
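The two-row construction of $QS_{het}$ can be sketched as follows (a minimal illustration of this construction; the pair encoding and the explicit final move back to the base unit are our assumptions, not the paper's code):

```python
def build_qs_het(assignment, base):
    """Build the two rows of a heterogeneous query sequence.
    `assignment` lists (operation, chosen unit) pairs; the move row
    delivers each operation's input from the previous unit and finally
    returns the result to the base unit u_*."""
    alg_row = list(assignment)
    move_row, prev = [], base
    for _, unit in assignment:
        move_row.append((prev, unit))  # M_{u,u'} is a no-op when u == u'
        prev = unit
    move_row.append((prev, base))      # return the final result to u_*
    return alg_row, move_row

algs, moves = build_qs_het([("o1", "gpu1"), ("o2", "gpu1"), ("o3", "cpu")],
                           base="cpu")
# moves: [("cpu","gpu1"), ("gpu1","gpu1"), ("gpu1","cpu"), ("cpu","cpu")]
```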


Symbol                                                  Description
$U = \{u_1, u_2, \ldots, u_n\}$                         set of computational units available to process data
$D$                                                     dataset
$M_{u_k,u_{k'}}(D)$                                     if $u_k \neq u_{k'}$, move $D$ from $u_k$ to $u_{k'}$; else pass
$o_i \in O$                                             database operation $o_i$ from the set of operations $O$
$A^{o_i}_{u_k}$                                         algorithm that computes the operation $o_i$ on $u_k$
$AP^{o_i} = \{A^{o_i}_{u_1}, \ldots, A^{o_i}_{u_n}\}$   algorithm pool for the operation $o_i$
$t_{run}(A^{o_i}_{u_j}, D)$                             estimated run time of the algorithm $A^{o_i}_{u_j}$ on the data $D$
$t_{copy}(M_{u_i,u_j}, D)$                              estimated copy time of the data $D$ from $u_i$ to $u_j$
$c_{run}(A^{o_i}_{u_j}, D)$                             estimated run cost of the algorithm $A^{o_i}_{u_j}$ on the data $D$
$c_{copy}(M_{u_i,u_j}, D)$                              estimated copy cost of the data $D$ from $u_i$ to $u_j$
$QS_{log} = o_1 o_2 \ldots o_n$                         logical query sequence
$QS_{het}$                                              heterogeneous query sequence, see Eq. (1)
$f_t, g_t$                                              estimated algorithm run time and copy time, see Sec. 3.2
$f_c, g_c$                                              estimated algorithm run cost and copy cost, see Sec. 3.3
$f_b, g_b$                                              estimated algorithm run and copy bi-objective scalarization, see Sec. 3.4
$F_x(QS_{het})$                                         sum of $f_x$ and $g_x$ over columns of $QS_{het}$, where $x \in \{t, c, b\}$, see Eq. (2)
         Table 1: Symbols used in the definition of our optimisation model




$$QS_{het} = \begin{pmatrix} A^{o_1}_{u_{i_1}}(D_1), & A^{o_2}_{u_{i_2}}(D_2), & A^{o_3}_{u_{i_3}}(D_3), & \ldots, & A^{o_n}_{u_{i_n}}(D_n) \\ M_{u_*,u_{i_1}}(D_1), & M_{u_{i_1},u_{i_2}}(D_2), & M_{u_{i_2},u_{i_3}}(D_3), & \ldots, & M_{u_{i_n},u_*}(D_n) \end{pmatrix} \qquad (1)$$


$$F_x(QS_{het}) = \sum_{A^{o}_{u}(D)} f_x(A^{o}_{u}, D) + \sum_{M_{u',u''}(D)} g_x(M_{u',u''}, D) \qquad (2)$$




3.2   Single Objective Heterogeneous Query Planner



 Procedure OptimalSeq(QSlog , u∗ , x)
   Input: QSlog = o1 o2 o3 . . . on - logical query sequence, u∗ - base unit, x ∈ { t -
           time, c - cost, b - bi-optimization } - optimization type
   Result: QShybrid
 1 seq_list = [];
 2 for u in U do
 3     Qu = Su (QSlog , u∗ );
 4     QFu = Fx (Qu ) ;                                             /* e.g. Ft (Qu ) */
 5     append (u, Qu , QFu ) to seq_list;
 6 end
 7 QShybrid = pop minimum Qu (by QFu ) sequence from seq_list;

 8 for (u, Qu , QFu ) in seq_list do
 9     A, B, C = DiffSeq (QShybrid , Qu , u, x);
10     val, start, end = MaxSubseq(A, B, C);
11     if val > 0 then
12         QShybrid (start : end) = Qu (start : end) ;        /* subarray substitution */
13     end
14 end
15 return QShybrid




     In this section, we introduce the algorithm that searches for a heterogeneous
query plan, i.e., a plan that may operate on more than two devices.
     For simplicity let us assume that $x = t$, $f_t(A^{o}_{u}, D_i) = t_{run}(A^{o}_{u}, D_i)$ and
$g_t(M_{u,u'}, D) = t_{copy}(M_{u,u'}, D)$. Later in this article we will define the functions
$f_c, g_c$ and $f_b, g_b$ to fit the model of the commodity market and the bi-objective
optimization. Let $S_u(QS_{log}, u_*)$ return such a $QS_{het}$ that each operation $o_i \in
QS_{log}$ is replaced with an algorithm from the algorithm pool of unit $u$, i.e.,
$A^{o_i}_{u} \in AP^{o_i}$, and the base device is set to $u_*$. Note that there is a specially
designated computing unit $u_*$ from which the processing starts. It also collects
the data at the end of processing, since GPU computing is controlled by a
CPU-side program.
     The algorithm OptimalSeq starts by creating a query sequence for each com-
puting unit and estimating the processing cost for each item of these sequences
(lines 2-6). Next, the sequence which minimizes $F_x(Q_u)$ is selected as the base
sequence; it will be improved in later steps (line 7). Then, the algorithm iterates
over the remaining query sequences in order to find segments which improve the
base sequence $QS_{hybrid}$ (by replacing


 Procedure DiffSeq(seqbase , sequ , u, x)
   Input: seqbase - base sequence, sequ - unit query sequence, u - sequ unit, x ∈ {
            t - time, c - cost, b - bi-optimization } - optimization type
   Result: A - operations improvement array; B, C - copy to/from unit arrays
 1 A, B, C = [], [], [];
 2 for i in enumerate columns seqbase do
 3     $A^{o}_{u_b}(D_i)$, $M_{u_f,u_t}(D_i)$ = seqbase [i];
 4     $A^{o}_{u}(D_i)$, $M_{u,u}(D_i)$ = sequ [i];
 5     append $f_x(A^{o}_{u_b}, D_i) - f_x(A^{o}_{u}, D_i)$ to A ;        /* e.g. $f_t(A^{o}_{u}, D)$ */
 6     append $g_x(M_{u_f,u}, D_i)$ to B ;                        /* e.g. $g_t(M_{u,u'}, D)$ */
 7     append $g_x(M_{u,u_t}, D_i)$ to C;
 8 end
 9 return A, B, C




the corresponding segment of the base sequence). This is done by calculating
the improvement and copy cost arrays in DiffSeq and finding the maximal se-
quence segment in MaxSubseq. The following variant of the proposed algorithm
should also be considered: suppose that only one query sequence segment may
be inserted (i.e., choose the sequence segment with the biggest improvement
from the remaining k − 1 sequences). This minimizes the number of involved
computational units and reduces overall communication costs.
    The procedure DiffSeq simply calculates the element-wise difference between
two query sequences, $f_x(A^{o}_{u_b}, D_i) - f_x(A^{o}_{u}, D_i)$, and the copy costs from/to
the unit. The procedure MaxSubseq is based on Kadane's algorithm for the
maximum subarray problem [1]. It scans through the improvement array, com-
puting at each position the maximum subsequence ending at this position. This
subsequence is either empty or consists of one more element than the maximum
subsequence ending at the previous position. Additionally, the copy-to and
copy-from costs are included in the calculation of the maximum subsequence
(the B and C arrays). The algorithm returns the maximum improvement of a
subsequence (which may be zero if the subsequence does not improve the original
query) and the start and end items of the subsequence.
    The complexity of OptimalSeq is $O(k \cdot n)$, where $k$ is the number of devices
(usually small) and $n$ is the number of operations in the sequence. $S_u$, $F_x$,
DiffSeq and MaxSubseq all have complexity $O(n)$.
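The two procedures can be transcribed into Python roughly as follows (a direct but simplified sketch: per-unit cost and copy arrays replace the paper's $S_u$, $F_x$ and DiffSeq machinery, and the dictionary layout is our assumption):

```python
def max_subseq(A, B, C):
    """Kadane-style scan (MaxSubseq): find the segment of improvements A
    whose value, after the copy-to (B) and copy-from (C) costs, is maximal."""
    max_ending_here = max_so_far = 0
    begin = tbegin = end = 0
    for i, x in enumerate(A):
        max_ending_here = max(0, max_ending_here + x)
        if max_ending_here == 0:
            tbegin = i                  # restart a candidate segment here
            max_ending_here -= B[i]     # pay the copy-to-unit cost
        if max_ending_here - C[i] >= max_so_far:
            begin, end = tbegin, i
            max_so_far = max_ending_here
    return max_so_far - C[end], begin, end

def optimal_seq(costs, copy_in, copy_out):
    """OptimalSeq sketch: costs[u][i] estimates operation i on unit u;
    copy_in[u][i] / copy_out[u][i] estimate moving data to / from u.
    Returns the unit chosen for each operation."""
    base = min(costs, key=lambda u: sum(costs[u]))  # cheapest single-unit plan
    n = len(costs[base])
    plan = [base] * n
    for u in costs:
        if u == base:
            continue
        # improvement of running each operation on u instead of its current unit
        A = [costs[plan[i]][i] - costs[u][i] for i in range(n)]
        val, start, end = max_subseq(A, copy_in[u], copy_out[u])
        if val > 0:
            plan[start:end + 1] = [u] * (end - start + 1)  # splice the segment
    return plan

costs = {"cpu": [5, 5, 5, 5], "gpu": [1, 1, 10, 10]}
copy = {"gpu": [2, 2, 2, 2]}
plan = optimal_seq(costs, copy, copy)
# only the first two operations are worth offloading: ["gpu","gpu","cpu","cpu"]
```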

3.3   Economics in a Heterogeneous Environment
To cope with the problems mentioned in Section 2.1, additional criteria are
necessary; in this work an approach based on a simple economic model is
proposed. Each consumer (client) has a query budget that can be used to pay
for the resources used to process its queries. Each computational unit is a service
provider (producer) of the services available in the unit's algorithm pool $AP_{u_i}$.
Each service provider establishes its own pricing for the execution of any service
from $AP_{u_i}$. The pricing of a service depends on:


 Procedure MaxSubseq(A, B, C)
   Input: A - operations improvement array; B, C - copy to/from unit arrays
   Result: maximum_improvement, start, end
 1 max_ending_here = max_so_far = 0 ;
 2 begin = tbegin = end = 0 ;
 3 for i, x in enumerate(A) do
 4     max_ending_here = max(0, max_ending_here + x) ;
 5     if max_ending_here = 0 then
 6         tbegin = i ;
 7         max_ending_here -= B[i];
 8     end
 9     if max_ending_here - C[i] >= max_so_far then
10         begin = tbegin ;
11         end = i ;
12         max_so_far = max_ending_here ;
13     end
14 end
15 return max_so_far - C[end], begin, end




 – the estimation of needed resources (the size of the data $D$, the performance
   of the task $A^{o_i}_{u_i}$),
 – the pricing of needed resources (the load of device $u_i$: the greater the load
   on the device, the higher the cost of using it),
 – the preference of the device (e.g. a device may prefer larger jobs and/or tasks
   that gain a greater acceleration on the GPU).

    First, the pricing for using the resources of a computational unit is established.
This depends on the previous load of the device: the higher the demand for a
computational unit, the higher the price for using it. This is a periodic process
which recalculates prices every $\Delta t_{up}$ seconds by computing the computational
unit price $P_u$. Let $0 < L_{curr} < 1$ and $0 < L_{prev} < 1$ be the current and previous
computational unit load factors. Additionally, let $L_{th}$ be a threshold below which
prices should decrease, and $P_{min}$ be the minimal price. Then the price is calculated
using the following formula⁴:

$$P_u := \begin{cases} \max\big(P_{min},\; P_u \cdot \big(1 + \frac{\Delta P_u}{1 - \Delta U_u}\big)\big) & \text{if } (\Delta P_u > 0 \wedge \Delta U_u > 0) \vee (\Delta P_u < 0), \\ P_u & \text{otherwise,} \end{cases} \qquad (3)$$

where $\Delta P_u = L_{curr} - L_{th}$ and $\Delta U_u = L_{curr} - L_{prev}$. This is similar to the
dynamic pricing model proposed in [16] except for the pricing formula, i.e., we use
$\max(P_{min}, P_u \cdot (1 + \frac{\Delta P_u}{1 - \Delta U_u}))$ instead of $\max(P_{min}, P_u \cdot (1 + \Delta P_u))$;
this modification reduces the excessive growth of prices.
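Formula (3) translates directly into a small update routine (a sketch; the parameter defaults mirror the GPU settings of Table 2b and are otherwise our assumptions):

```python
def update_price(price, load_curr, load_prev, threshold=0.4, p_min=70.0):
    """Periodic price update (formula 3): prices rise while demand keeps
    growing above the threshold and fall once the load drops below it."""
    dP = load_curr - threshold   # Delta P_u
    dU = load_curr - load_prev   # Delta U_u
    if (dP > 0 and dU > 0) or dP < 0:
        return max(p_min, price * (1 + dP / (1 - dU)))
    return price

# rising load above the threshold raises the price
p = update_price(100.0, load_curr=0.9, load_prev=0.5)
```

Since both load factors lie in (0, 1), the denominator 1 - dU stays strictly positive, so the update is always well defined.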
    To reflect the preference of the device in the price, we need to define a function
returning the speedup factor between the base device $u_*$ (defined in the previous
section) and the current device: $speedup(A^{o}_{u}, D_i) = t_{run}(A^{o}_{u_*}, D_i) / t_{run}(A^{o}_{u}, D_i)$.
Then we define the cost function as $c_{run}(A^{o}_{u}, D_i) = \frac{\#D_i}{speedup(A^{o}_{u}, D_i)} \cdot P_u$,
where the factor $\frac{\#D_i}{speedup(A^{o}_{u}, D_i)}$ combines the estimation of needed resources
and the preference of the device. A computational unit with a high speedup on a
given operation gets a discount per data size when pricing this operation. Similarly,
operations with a lower speedup factor are charged more per quantity. Additionally,
it has been observed [13] that the speedup often depends on the size of the processed
data (usually a low speedup on small datasets), so the discount depends on the
data size.
   ⁴ Slightly abusing notation, we will also denote the new price by $P_u$.
    It is also important to include the cost of data transfer; let us define it as

$$c_{copy}(M_{u,u'}, D) := \begin{cases} 0 & \text{if } u, u' \text{ share memory,} \\ \frac{\#D_i}{bandwidth(\#D_i, u, u')} \cdot \frac{P_u + P_{u'}}{2} & \text{otherwise,} \end{cases}$$

where bandwidth returns the estimated bytes per second between the computational
units $u$ and $u'$. If direct data transfer is not available between the devices $u$ and
$u'$, then a transit device will be used (e.g. two GPU cards without direct memory
access will communicate through the CPU RAM).
    Now let $f_c(A^{o}_{u}, D_i) = c_{run}(A^{o}_{u}, D_i)$ and $g_c(M_{u,u'}, D) = c_{copy}(M_{u,u'}, D)$.
A solution minimizing the cost may be found under the previous assumptions
using the procedure OptimalSeq.
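The two cost functions above can be sketched as follows (an illustrative sketch; the argument names and the shared-memory flag are our assumptions):

```python
def c_run(data_size, speedup, price):
    """Run cost: data size discounted by the device's speedup on the
    operation, priced at the unit's current price P_u."""
    return data_size / speedup * price

def c_copy(data_size, bandwidth, price_u, price_v, shared_memory=False):
    """Copy cost: free between units sharing memory; otherwise the
    transfer volume over bandwidth, priced at the mean of both prices."""
    if shared_memory:
        return 0.0
    return data_size / bandwidth * (price_u + price_v) / 2

# a unit with a 20x speedup gets a 20x per-byte discount on its price
fast = c_run(1000.0, 20.0, 70.0)   # 3500.0
slow = c_run(1000.0, 1.0, 5.0)     # 5000.0
```

With the Table 2 settings this shows the intended effect: the expensive GPU can still undercut the cheap CPU on operations where its speedup is large.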

3.4   Bi-objective Heterogeneous Query Planner
As finding a Pareto optimal bi-objective query plan is NP-hard (the bi-objective
shortest path problem) [12], we will use the previously described single-objective
approximation algorithm OptimalSeq and extend it to the bi-objective case.
    We will use the a priori articulation of preference approach, which is often
applied to multi-objective optimization problems. It may be realized as the
scalarization of objectives, i.e., all objective functions are combined to form a
single function. In this work we will use the weighted product method, where
weights express user preferences [11]. Let us define:

$$f_b(A^{o}_{u}, D) = c_{run}(A^{o}_{u}, D)^{w_c} \cdot t_{run}(A^{o}_{u}, D)^{w_t},$$
$$g_b(M_{u,u'}, D) = c_{copy}(M_{u,u'}, D)^{w_c} \cdot t_{copy}(M_{u,u'}, D)^{w_t},$$

where $w_t$ and $w_c$ are weights which reflect how important cost and time are (the
bigger the weight, the more important the feature; values of $f_b$, $g_b$ are higher
than 1). It is worth mentioning that the special case $w_t = w_c = 1$ (i.e., without
any preferences) is equivalent to the Nash arbitration method (or objective
product method) [11].
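The weighted product scalarization is a one-liner (sketch; with $w_t = w_c = 1$ it degenerates to the plain product of the two objectives, i.e. the Nash arbitration method):

```python
def scalarize(cost, time, w_c=1.0, w_t=1.0):
    """Weighted product of the two objectives; a larger weight makes the
    corresponding objective dominate the combined score."""
    return cost ** w_c * time ** w_t

# without preferences the score is just cost * time
neutral = scalarize(4.0, 2.5)                   # 10.0
# raising w_t makes time more influential (in the exponent)
biased = scalarize(4.0, 2.5, w_c=1.0, w_t=2.0)  # 4 * 2.5**2 = 25.0
```

Note that a product scalarization only behaves as intended when both objective values exceed 1, which is why the paper states this condition on $f_b$ and $g_b$.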

4     Preliminary Experimental Results
4.1   Simulation settings
In order to evaluate this model we prepared a proof of concept and evaluated
it using a custom simulation environment developed in Python with the SimPy
framework. All presented experiments are derived from simulation. Four devices
were defined in the environment: CPU1, CPU2, GPU1, GPU2. The data transfer
bandwidth between CPU* ↔ GPU* was measured on a real system; the band-
width of GPU1 ↔ GPU2 was calculated using CPU1 as a transit device. The
following weights were used in the bi-objective scalarization: $w_t = w_c = 1$ (i.e.,
no preference setting). Other settings of the simulation environment are gathered
in Tables 2a and 2b.

   Table 2a: Average speedup of operation $o_i$ on a given device compared to CPU1
   Device  o1    o2    o3     o4     o5     o6
   GPU1    20    11    6      0.38   14     15
   GPU2    5     11    6      0.33   4.66   5
   CPU2    1     1     1.09   1      1.27   1.36

   Table 2b: Pricing model configuration
   Option               CPU1   CPU2   GPU1   GPU2
   Unit threshold       0.75   0.75   0.4    0.4
   Unit minimal price   5      5      70     70

    The simulation environment generates new query sequences when a spawn
event occurs. Spawn events are generated randomly at a fixed interval; each event
generates a random set of query sequences (with a fixed maximum). Every query
sequence consists of at most six operations and operates on a random data vol-
ume. In the simulation the processed data size has a direct (linear) influence on
processing speed. Each device has a limited number of resources; a database op-
eration can be performed only if the needed resources are available, otherwise the
operation waits. After generating the desired number of query sequences, the sim-
ulation stops spawning new tasks and waits until all generated query sequences
are processed.
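The spawn process can be mimicked without SimPy in a few lines of plain Python (a stdlib-only sketch of the event generator; the interval, batch and data-size ranges are illustrative assumptions, not the paper's actual settings):

```python
import random

def simulate_spawns(n_events, interval, max_batch, max_ops=6, seed=42):
    """Every `interval` ticks, spawn a random batch of query sequences;
    each sequence has at most `max_ops` operations on a random data volume."""
    rng = random.Random(seed)  # deterministic seed for repeatable experiments
    events = []
    for k in range(n_events):
        batch = [
            {"ops": rng.randint(1, max_ops), "data": rng.randint(1, 1000)}
            for _ in range(rng.randint(1, max_batch))
        ]
        events.append((k * interval, batch))
    return events

events = simulate_spawns(n_events=5, interval=10, max_batch=4)
```

In the actual SimPy setup such a generator would yield timeout events on a shared environment clock instead of enumerating ticks directly.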


4.2   Simulation Results

Figure 1a presents the simulated execution time of three scheduling frameworks
processing a pool of generated sequences of queries. To each generated query
sequence an optimization criterion (time, cost or bi-optimization) was assigned
with equal probability 1/3. Optimization criteria are only used if the query
scheduling framework supports them; otherwise the default criterion is used. All
scheduling frameworks process exactly the same pool of generated query sequences.
     The compared frameworks are based on the OptimalSeq algorithm but use
different objective functions. The Time Objective Planner uses only the $f_t$ and
$g_t$ functions as optimization criteria; this means that it has no knowledge of the
load of the devices. The Self-Tuning Planner is based on the idea presented in
Breß et al. [3], i.e. it maintains a list of observed execution times on data $D$ for
each algorithm $A^{o_i}_{u_j}$. Observations are interpolated (using e.g. cubic splines)
to form a new estimated execution time function. Finally, the Bi-objective Planner
is a proof-of-concept implementation of the model described in this work.




        Fig. 1: Simulation results (time is in simulation ticks):
        (a) simulated efficiency of the Bi-objective Heterogeneous Query Planner,
            the Time-based Query Planner and the Self-Tuning Query Planner;
        (b) efficiency of the Bi-objective Heterogeneous Query Planner for various
            optimization tasks;
        (c) simulated load of the devices (0 < load < 1);
        (d) pricing of the devices.


    As expected, the Time Objective Planner is the slowest one since it has no
knowledge of the current load of the devices. The solution suggested in [3] performs
better. However, there are two problems with this approach: first, it adds additional
overhead due to the interpolation of the observed execution times [3]; secondly, as
may be observed in Figure 1a, it takes some time before it adapts to a new situation
(in the early stage it performs similarly to the Time Objective Planner). This is
due to the fact that it does not immediately react to load changes of a device.
Instead, it has to gather enough observations before adapting. The best perfor-
mance is achieved by the Bi-objective Planner, thanks to the three types of
optimization and the cost model which ensures proper load balancing.
    As our framework supports different types of optimization, in Figure 1b we
present the impact of the optimization type on processing performance. As may
be observed, time optimization is the most appropriate for query processing with
high priority or with an execution time constraint (like interactive queries or ad
hoc data mining). Cost optimization is appropriate for operations with low priority
or without a time constraint (like batch processing or periodic jobs). Optimization

of both cost and time (without preferences) leads to moderate processing speed
but with better load balancing, which is discussed later.
    As the proposed economic model is an important part of the presented frame-
work, Figure 1 illustrates the interaction between the device load (1c) and the
established device pricing (1d). Notice how increased load influences unit pricing
according to formula (3). It is worth noting that the pricing model may be tuned
for specific applications (see Table 2b for this simulation's settings).


4.3   Discussion

In Section 2.1 we cited three challenges of hybrid query processing initially pre-
sented in Breß et al. [3]. As our bi-objective optimization framework was designed
to address these challenges, an evaluation in the context of the aforementioned
problems is needed. We address the Critical Query problem by allowing different
optimization targets for queries. Choosing time optimisation allows a priori ar-
ticulation of the importance of a query. Also, the bi-objective optimization tends
to promote queries which may gain more on particular devices (due to the cost-
delay trade-off and the fact that the cost objective is designed to promote tasks
with a greater speed-up, see Section 3.3). The problem of Execution Time Pre-
diction is addressed indirectly with bi-objective optimisation. This is because the
bi-objective optimisation combines the cost objective function, which uses the
current device load when pricing a device, with the execution time objective. So
in most cases it is preferred to optimize both cost and time (without preference
towards either) through the time/cost trade-off. Lastly, the different types of op-
timization also apply to the Optimization Impact challenge: choosing the opti-
mization criteria specifies the possible impact on other queries. Although the
preliminary results are promising and seem to confirm this, an extended evalua-
tion is needed in the future.


5     Related Work

Multiobjective query optimization was considered, inter alia, by Stonebraker et
al. [17], who presented a wide-area distributed database system called Mariposa.
An economic auction model was used as the cost model. To process a query, a user
supplied a cost-delay trade-off curve. Because defining this kind of input data was
problematic, Papadimitriou et al. proposed a new approach with an algorithm for
finding ε-Pareto optimal solutions, from which the user would manually choose
one. Our work differs both in the optimisation method and in the economic model
involved.
    In our framework a user supplies an optimization objective for a query a
priori (time, cost or bi-objective). Also, as our model addresses the optimisation
of co-processing interaction, a simpler commodity market model could be used
instead of a bidding model.
    An extended overview on utilization of a GPU as a coprocessor in database
operations may be found in [3]. Breß et. al. [3] proposed a framework for op-
timisation of hybrid CPU/GPU query plans and present two algorithms for
                    A Bi-objective Optimization Framework for Query Plans        353

constructing hybrid query sequences. The first algorithm selects the fastest
algorithm for every element of a query sequence (including the cost of transfers
between devices) and runs in O(n) time. Unfortunately, this algorithm has two
flaws [3]: the constructed plan may trigger overly frequent data transfers
between devices, which can significantly degrade processing performance, and the
optimal plan is not always found. To overcome these problems, they proposed a
second algorithm, which searches for a continuous segment of operations on the
GPU that improves the base CPU sequence. In order to find an optimal solution,
this algorithm generates all possible GPU segments; its complexity is therefore
higher: O(n²). Our work extends this approach by allowing arbitrarily many
co-processing devices (the Heterogeneous Query Planner in Section 3.1). Secondly,
our work incorporates a commodity-market model as well as bi-objective
optimisation for better performance, overcoming the problems mentioned in
Section 2.1. Additionally, the algorithm OptimalSeq presented in this work can
produce a solution similar to that of the second algorithm by Breß et al. [3]
but with better complexity (O(n) in the case of two devices).
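    For intuition, the linear-time two-device variant can be viewed as a
maximum-gain contiguous-segment search in the spirit of Bentley's
maximum-subarray technique [1]. The sketch below is an illustration under an
assumed cost model (per-operator CPU and GPU costs plus per-boundary transfer
costs), not the paper's actual OptimalSeq implementation:

```python
def best_gpu_segment(cpu_cost, gpu_cost, t_in, t_out):
    """Linear-time search for the contiguous operator segment whose
    offloading to the GPU maximally reduces total plan cost.

    Assumed cost model (for illustration only): running operators
    i..j on the GPU costs sum(gpu_cost[i:j+1]) plus a transfer
    t_in[i] before operator i and t_out[j] after operator j; all
    other operators run on the CPU.  Returns (gain, (i, j)), or
    (0.0, None) if no segment beats the all-CPU plan.
    """
    gain = [c - g for c, g in zip(cpu_cost, gpu_cost)]  # per-op saving
    best_gain, best_seg = 0.0, None
    prefix = 0.0                     # sum of gain[0..j] so far
    best_start_val = float("-inf")   # max over i<=j of -prefix(i-1) - t_in[i]
    best_start_idx = 0
    for j in range(len(gain)):
        # Candidate start at j: the gain accumulated before j is `prefix`.
        start_val = -prefix - t_in[j]
        if start_val > best_start_val:
            best_start_val, best_start_idx = start_val, j
        prefix += gain[j]
        # Gain of the best segment ending exactly at j.
        seg_gain = prefix + best_start_val - t_out[j]
        if seg_gain > best_gain:
            best_gain, best_seg = seg_gain, (best_start_idx, j)
    return best_gain, best_seg
```

The prefix-sum trick keeps the scan linear even with position-dependent
transfer costs; with all transfer costs zero it reduces to the classical
maximum-subarray (Kadane) algorithm.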
    Two surveys are also worth mentioning: one on economic models in grid
computing [6] and one on methods for multi-objective optimisation [11].


6   Conclusions and Future Work

In this paper, we proposed a bi-objective optimization framework for heteroge-
neous query plans. We also presented an algorithm for creating query sequences
in a heterogeneous environment with a single objective. This algorithm may be
used to construct query sequences similar to those of [3] but with better
complexity. For the purposes of this bi-objective optimization, we designed a
model comprising time and cost objective functions. The cost objective function
and the pricing model are built on the foundations of the commodity-market
economic model.
    The preliminary experiments are very promising: we achieved good load
balancing of the simulated devices together with better optimization results.
    In future work, an extended evaluation of the presented framework is needed,
including an examination of the parameters' influence on the model behaviour
and a careful assessment against hybrid query processing challenges. Another
interesting direction is the extension of this model beyond CPU/GPU
co-processing. Finally, the framework will be evaluated in a prototype
time-series database [14,13].


References

 1. J. Bentley. Programming pearls: algorithm design techniques. Commun. ACM,
    27(9):865–873, Sept. 1984.
 2. S. Breß, F. Beier, H. Rauhe, E. Schallehn, K.-U. Sattler, and G. Saake. Auto-
    matic selection of processing units for coprocessing in databases. In Advances in
    Databases and Information Systems, pages 57–70. Springer, 2012.
 3. S. Breß, I. Geist, E. Schallehn, M. Mory, and G. Saake. A framework for cost-
    based optimization of hybrid CPU/GPU query plans in database systems. Control
    and Cybernetics, pages 27–35, 2013.
 4. S. Breß, S. Mohammad, and E. Schallehn. Self-tuning distribution of DB-operations
    on hybrid CPU/GPU platforms. Grundlagen von Datenbanken, CEUR-WS, pages
    89–94, 2012.
 5. S. Breß, E. Schallehn, and I. Geist. Towards optimization of hybrid CPU/GPU query
    plans in database systems. In New Trends in Databases and Information Systems,
    pages 27–35. Springer, 2013.
 6. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger. Economic models for re-
    source management and scheduling in grid computing. Concurrency and compu-
    tation: practice and experience, 14(13-15):1507–1542, 2002.
 7. W. Fang, B. He, and Q. Luo. Database compression on graphics processors. Pro-
    ceedings of the VLDB Endowment, 3(1-2):670–680, 2010.
 8. D. Florescu and D. Kossmann. Rethinking cost and performance of database
    systems. ACM Sigmod Record, 38(1):43–48, 2009.
 9. M. J. Franklin, B. T. Jónsson, and D. Kossmann. Performance tradeoffs for client-
    server query processing. In ACM SIGMOD Record, volume 25, pages 149–160.
    ACM, 1996.
10. D. Kossmann. The state of the art in distributed query processing. ACM Com-
    puting Surveys (CSUR), 32(4):422–469, 2000.
11. R. T. Marler and J. S. Arora. Survey of multi-objective optimization methods for
    engineering. Structural and multidisciplinary optimization, 26(6):369–395, 2004.
12. C. H. Papadimitriou and M. Yannakakis. Multiobjective query optimization.
    In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on
    Principles of database systems, pages 52–59. ACM, 2001.
13. P. Przymus and K. Kaczmarski. Dynamic compression strategy for time series
    database using GPU. In New Trends in Databases and Information Systems. 17th
    East-European Conference on Advances in Databases and Information Systems,
    September 1-4, 2013, Genoa, Italy, 2013.
14. P. Przymus and K. Kaczmarski. Time series queries processing with GPU support.
    In New Trends in Databases and Information Systems. 17th East-European Con-
    ference on Advances in Databases and Information Systems, September 1-4, 2013,
    Genoa, Italy, 2013.
15. A. Raith and M. Ehrgott. A comparison of solution strategies for biobjective
    shortest path problems. Computers & Operations Research, 36(4):1299–1331, 2009.
16. O. O. Sonmez and A. Gursoy. Comparison of pricing policies for a computa-
    tional grid market. In Parallel Processing and Applied Mathematics, pages 766–773.
    Springer, 2006.
17. M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and
    A. Yu. Mariposa: a wide-area distributed database system. The VLDB Journal,
    5(1):48–63, 1996.
18. T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation
    and performance of compressed databases. ACM SIGMOD Record, 29(3):55–67,
    2000.