<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Vision of a Decisional Model for Re-optimizing Query Execution Plans Based on Machine Learning Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chenxiao Wang</string-name>
          <email>chenxiao@ou.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Le Gruenwald</string-name>
          <email>ggruenwald@ou.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zachary Arani</string-name>
          <email>myrrhman@ou.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurent d'Orazio</string-name>
          <email>laurent.dorazio@univ-rennes1.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rennes 1 University</institution>
          ,
          <addr-line>Lannion</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oklahoma</institution>
          ,
          <addr-line>Norman, Oklahoma</addr-line>
          ,
          <country country="US">United States of America</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <issue>1349285</issue>
      <abstract>
        <p>Many of the existing cloud database query optimization algorithms target reducing the monetary cost paid to cloud service providers in addition to query response time. These query optimization algorithms rely on an accurate cost estimation so that the optimal query execution plan (QEP) is selected. The cloud environment is dynamic, meaning the hardware configuration, data usage, and workload allocations are continuously changing. These dynamic changes make an accurate query cost estimation dificult to obtain. Concurrently, the query execution plan must be adjusted automatically to address these changes. In order to optimize the QEP with a more accurate cost estimation, the query needs to be optimized multiple times during execution. On top of this, the most updated estimation should be used for each optimization. However, issues arise when deciding to pause the execution for minimum overhead. In this paper, we present our vision of a method that uses machine learning techniques to predict the best timings for optimization during execution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Many of the existing cloud database query optimization
algorithms target reducing the monetary cost paid to cloud service
providers in addition to query response time. The time and
monetary costs needed to execute a query are estimated based on the
data statistics that the query optimizer has available when the
query optimization is performed. These statistics are often not
accurate, which may result in inaccurate estimates for the time
and monetary costs needed to execute the query [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ]. Thus,
the QEP generated before the query is executed may not be the
best one. One approach can be applied to solve this issue.
Adaptively optimizing the QEP during the query execution to employ
more accurate statistics will yield better QEP selection, and thus
will improve query performance [
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ] . QEP is not executed as
whole at one time but is divided and executed part by part. After
completion of one part of the QEP, data statistics are updated so
that the rest part of the QEP is re-optimized adopting the new
data statistics. The new QEP is expected to be changed as it either
contains diferent operators or is scheduled to be executed on
diferent machines. Such changes result in a diferent way of
executing a QEP which brings diferent response time and
monetary cost. However, re-optimizing a QEP costs time overhead
which in turn produces extra monetary cost as well. To avoid
unnecessary re-optimization and decide whether or not a QEP
should be re-optimized is not an easy task. In the work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the
authors manually set check points between diferent operators of
a QEP. Re-optimizations at these check points are necessary but
still not accurate. The work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed a query optimization
method where the query is re-optimized multiple times during
its execution based on stages. Stages are formed automatically
by the query optimizer and operators that do not rely on the
completion of others are grouped together. Every time one stage
is finished, the query is re-optimized by force. We implemented
this algorithm in our previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, our
experimental studies show that after applying many re-optimizations,
the QEP remains the same compared to the original one. This is
because the stages are not aligned with the best timing to apply
the re-optimization. This wastes time as unnecessary
optimization increases overhead. In this paper, we provide our vision of a
method using machine learning techniques to predict whether a
QEP should be optimized or not during the query execution, so
that the overall overhead of re-optimization is further reduced
as unnecessary re-optimization is avoided more accurately.
      </p>
      <p>The rest of the paper is organized as follows. Section 2
discusses the related works. Section 3 discusses the efects of query
re-optimization. Section 4 presents how to predict query
reoptimization by machine learning. Section 5 discusses the feature
selection and machine learning model selection issues according
to the status of our current work. Section 6 provides conclusion
and future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        There are several works [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8 ref9">4–9</xref>
        ] that study when a query should be
re-optimized. Some of them are interactive, which means human
input is required in the re-optimization process. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] presents a
mid-query re-optimization algorithm. In this work, a few
pointers are put over diferent locations of the QEP and whenever
the query operators before the pointers finish, the query will
be re-optimized. These locations are chosen based on a set of
rules built by the authors. The locations that satisfy these rules
indicate that re-optimization is worthwhile. For example, the
re-optimization will take place before a Join operator. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
introduces an algorithm for re-optimizing the schedule to run diferent
queries. In this work, the algorithm compares the distance
between an initial schedule and the ideal schedule which is defined
by multiple human defined rules. Whenever the distance exceeds
      </p>
      <sec id="sec-2-1">
        <title>Updated Statistics and Constraints</title>
      </sec>
      <sec id="sec-2-2">
        <title>Updated Operator Tree</title>
      </sec>
      <sec id="sec-2-3">
        <title>One Stage of Operators</title>
      </sec>
      <sec id="sec-2-4">
        <title>Execution</title>
      </sec>
      <sec id="sec-2-5">
        <title>Runtime Information Yes uQery Result uQery and User Constraints</title>
      </sec>
      <sec id="sec-2-6">
        <title>Staged Optimized Operator Tree</title>
      </sec>
      <sec id="sec-2-7">
        <title>Estimation</title>
      </sec>
      <sec id="sec-2-8">
        <title>User</title>
      </sec>
      <sec id="sec-2-9">
        <title>Optimizer scheduler</title>
      </sec>
      <sec id="sec-2-10">
        <title>All stages finished? No</title>
      </sec>
      <sec id="sec-2-11">
        <title>Statistic</title>
      </sec>
      <sec id="sec-2-12">
        <title>Updater</title>
        <p>
          some threshold, the schedule is re-optimized. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] also presents
another mid-query re-optimization algorithm where a
statisticcollect operator is introduced to be placed at key points and used
to collect the updated data statistics during the query execution.
These updated statistics are used to re-optimize the QEP itself
and the memory allocation to execute the query. The estimated
execution time after re-optimization is compared to the estimated
execution time before the same QEP is re-optimized. If the
difference in execution time exceeds a manually set threshold, the
re-optimization is conducted. However, as presented in our
previous work [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], taking human input into the decision-making
process greatly increases time overhead and introduces a source
of unreliable accuracy. In [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], instead of using human input,
reinforced learning is used for the optimizer to decide which physical
operators the optimizer should select (the optimizer’s actions)
based on the current data statistics (the optimizer’s states). But
still, inaccurate data statistics will result in sub-optimal selection
of physical operators. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] presents a query re-optimization
algorithm to estimate the cost of the current query by looking at the
similar queries answered in the past. This method uses previous
known column statistics to predict unknown column statistics
by using the joint probability density function. However, in this
work, a matrix inversion is required to calculate the cost of the
unknown column statistics. Applying such an operation online
would cost a lot of time overhead. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] presents an algorithm to
adaptively optimize the QEP on cloud databases. This work
assumes that users are willing to accept incomplete query results
if the cost is under users’ budget. However, the optimization in
this work considers either time cost or monetary cost, not both.
All these existing works, while using query re-optimization to
improve query response time, are not concerned about monetary
costs at the same time or are not designed for cloud databases.
        </p>
        <p>To fill the gaps in the existing works, the approach we envision
in this paper emphasizes on addressing a number of important
issues. First, the approach utilizes query re-optimization not only
for query response time but also monetary costs at the same
time for cloud databases. Second, to achieve a greater accuracy in
terms of the timing for re-optimization to occur and to do so
autonomously, the approach uses a machine learning model trained
by historical queries to predict when to do re-optimization
instead of manually deciding. Third, to reduce the time overhead,
the approach consists of the ofline and online processes. The
ofline process takes the majority of the time to build the
machine learning model, while the online process is only applying
the model to do the prediction, which limits the time overhead.
Fourth, the approach always uses the actual data statistics so that
the optimizer is always able to select the correct QEP.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>QUERY RE-OPTIMIZATIONS</title>
      <p>
        In a traditional DBMS, queries are first converted to multiple
QEPs. Following this, all the QEPs are then evaluated by the query
optimizer to obtain the time costs. In some systems, additional
costs like monetary cost, network bandwidth, and hardware
utilization are also calculated. Finally, the optimizer chooses the
best QEP and sends it to the execution engine. However, unlike
traditional DBMS, we apply mid-query re-optimization (Figure
1). This means a query plan can be optimized for multiple times
during execution. In a traditional DBMS, the QEPs are optimized
by the cost estimation(s) which are evaluated based on data
statistics such as the cardinality of a column, number of rows in a
table, etc. However, such data statistics are often not accurate
when the estimation is evaluated. Thus, the QEP generated may
not be the most eficient. Re-optimizing the QEP during query
execution to employ more accurate statistics will yield better
QEP selection, and thus will improve query performance. In
our previous work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we discover that query re-optimization
will enable the optimizer to select better physical operators to
execute the QEP and select better hardware configurations to
execute the QEP (such as the number of containers and the type
of containers). These optimizations are beneficial for improving
either the overall query execution time or the monetary cost or
both. Figure 2 shows the result of executing the query from our
previous work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In the experiment query, there is a Join of
two subqueries. The data size of each subquery is unknown. We
want to see how the physical operator of this Join will change
depending on the data size of the subquery. So, we purposely
made the data size of the right side of the Join operator small
enough to fit in the cache. As a consequence, the Shufle Join
operator will be changed to the Broadcast Join operator only
after re-optimization. Broadcast Join is executed around 40%
faster than Shufle Join in our environment. The results show
the overall time cost using re-optimization has approximately
20% improvement on average over no re-optimization, while the
monetary costs of the two approaches were close, with only a 4%
diference. This increase of monetary cost is due to the fact that
the more powerful containers being selected are the containers
which charge more hourly.
s
t
n
a
p
i
c
i
t
r
a
p
#
44.39
35
      </p>
      <p>23.825</p>
      <sec id="sec-3-1">
        <title>Response Time (sec) Monetary Cost (x$0.001)</title>
      </sec>
      <sec id="sec-3-2">
        <title>Shufle Join BroadCast Join with Re-Optimization</title>
        <p>Though we find that re-optimization improves the query
response time, the re-optimization itself increases the overhead.
Besides that, not all the re-optimizations are efective for the QEP,
that is, some re-optimizations may yield no changes. These
reoptimizations are unnecessary and increase overhead. In order to
avoid this issue, we envision a method using a machine learning
model built on past data (training data) to predict whether or
not a future re-optimization would be useful. If the prediction
indicates that the re-optimization will have no efect on the QEP,
then the query will not be re-optimized.
4.2</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Architecture</title>
      <p>Figure 3 shows the architecture of how the training data is
collected. First, we collect the training data by running random
queries on our current system and observing the data statistics.
This way the prediction model can be applied to all queries. If
reoptimization is only for most costly/most representative queries,
then in this first step, the training data should be collected from
running only random but most costly/representative queries.
After a query is submitted, we record the current data statistics in
the system. These current statistics are called Statcurr. Then, the
query is sent to the optimizer and an optimal QEP is generated
by using the optimizer in a traditional DBMS. This QEP includes
the stage information and on which nodes these stages will be
executed. Figure 4 shows an example of a QEP generated by the
query optimizer for the following query:</p>
      <sec id="sec-4-1">
        <title>SELECT D e p a r t m e n t , c o u n t ( Name ) FROM STUDENT GROUP BY D e p a r t m e n t WHERE G r a d e &lt; = 'C ' ;</title>
        <p>
          In Figure 4, TS, SOR, FIL and AGG stand for TableScan, Sort, Filter
and Aggregate operators, respectively. The subscripts distinguish
the same operators that are executed in parallel on diferent
data. After that, Stage 1 is sent to the query execution engine.
During the execution, we update the data statistics using the
method mentioned in the work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and we call these updated
statistics Statupdate. Since these statistics are collected from the
actual running query, Statupdate is more accurate than Statcurr
which is obtained from the estimation. The diference between
the Statupdate and Statcurr, called Statdif, is then used in the
machine learning model to predict whether the re-optimization
is efective. For example, the current selectivity and the updated
selectivity of column A are 0.5 and 0.1, respectively, then the
diference 0.4 is added as one feature of the training dataset. This
process is applied to all the features. The selected features are
shown in Table 1.
        </p>
        <p>If the re-optimization is predicted to be efective, the QEP is
then re-optimized using the updated data statistics. Following
this, the next stage (Stage 2) is executed based on the new QEP.
The process is then repeated for the rest of the stages. In this
example, Stage 2 is possibly changed. At this point, Stage 2 after
the re-optimization is compared to the Stage 2 before the
reoptimization to observe any potential changes. We define that a
QEP is changed if one of the following three aspects occurs: 1)
changes in the physical operator types; 2) changes in the number
of containers; or 3) changes in the types of containers.</p>
        <p>A change in the physical operator types means that if there
exists any physical operator in the current QEP that is diferent
from the physical operators in the previous QEP, then the QEP
has changed. For example, in our previous experiment, the change
in the physical operator from Shufle Join to Broadcast Join is
defined as a change in the physical operator types. This change
highly influences query execution time. Thus, by detecting such
changes in QEP after a re-optimization, this re-optimization is
probably efective, and the re-optimization will be applied if the
similar situation is encountered.</p>
        <p>A change in the number or types of containers means the total
number of containers used to execute the current QEP is
diferent from that of the previous QEP. Such changes are also called
changes in the degree of parallelism. For example, the TableScan
containers are distributed into four containers before the
reoptimization and use three containers after the re-optimization.
This change highly influences the monetary cost of query
execution. Thus, such re-optimization becomes useful if such changes
are detected. Similarly, a change in the types of containers means
the QEP after the re-optimization is executed on diferent types of
containers, either more powerful ones or weaker ones. Detecting
such changes may influence the monetary cost as well.
4.3</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Feature Selection</title>
      <p>The three changes discussed in Section 4.2 usually occur
whenever the estimated data size has also changed. This is because the
optimizer uses these estimations to decide how to execute the
query and how many containers should be used. Thus, in order to
tell whether the re-optimization is efective, we use data features
that are highly relevant to the changes in data size estimation.</p>
      <p>
        Assume in the current DBMS, there exist the C={C1,C2,...,Cn }
columns in all the tables. Selectivity diferences ( DIFF_SELECTIVITY ),
distinct values (DIFF_NVD), and histograms (DIFF_HISTOGRAM)
of each column are used as the data features in the training data
used for prediction as shown in Table 1, and the binary value
YES/NO is used as the predicted class, where YES means the
reoptimization is predicted to be useful and NO otherwise. Many
works show that the selectivity, number of distinct values and the
histogram influence the data size estimation [
        <xref ref-type="bibr" rid="ref12 ref13 ref6">6, 12, 13</xref>
        ]. Thus, the
diferences in these three features result in changes to data size
estimation of intermediate results. Hence, they become relevant
in deciding the efectiveness of re-optimization
      </p>
      <p>In our preliminary work to test our vision, we have collected
the training data from running 5000 random queries on a DBMS
for 24 hours and collected the data for the above features. Then
a prediction model was trained using these data and applied
to a new query to predict whether or not the re-optimization
should be conducted after each stage of this query is executed.
Figure 5 illustrates how this model is applied during the query
execution. The query is first converted to a QEP and Stage 1 is
submitted for execution. The prediction model is used to check
whether or not the re-optimization should be conducted. Only a
’YES’ prediction will trigger re-optimization. By doing this, the
unnecessary re-optimization discussed previously is eliminated.
5
5.1</p>
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION</title>
    </sec>
    <sec id="sec-7">
      <title>Additional Feature Selection</title>
      <p>Some additional features can also be added to the training model,
although the utility of new features may not be immediately
clear. For instance, we considered adding the total available CPU
usage to the feature list. However, the metric of the total available
CPU appears to not be relevant in predicting re-optimization.
For instance, take the statement "If the total available CPU is
low, then the re-optimization will not be conducted." The answer
to this statement is false as when we collect the training data,
re-optimizations are conducted anyway no matter how low the
available CPU usage is. Re-optimization is only influenced by the
estimation of data size. The CPU usage only determines which
containers should be assigned. Thus, this attribute is not relevant.
5.2</p>
    </sec>
    <sec id="sec-8">
      <title>Machine Learning Model Selection</title>
      <p>We postulate that several machine learning techniques can be
applied to predict whether or not re-optimization should be
conducted. Pre-processing data to reduce the number of features
before processing with a machine learning model is necessary.
As in our model, the number of features is linearly related to the
accumulative number of columns in all tables. If there is a high
number of columns, there will be too many predictors. Principle
Component Analysis (PCA) can be a useful option to find out
what features are important.</p>
      <p>
        Unsupervised learning such as clustering has been used in
query optimization [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In our case, clustering can be used to
determine if re-optimization is useful by identifying a specific
pattern of data statistics for the DBMS. There exist serval popular
clustering algorithms such as K-means and DBSCAN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which
are used on a situational basis. In our study, we think K-means is
suitable as the value of K is fixed. In order to predict whether or
not re-optimization should be conducted, we can set the K value
to two. Two clusters are formed, one is for the "YES" cases and
another one is for the "NO" cases. Similar data statistics collected
after re-optimizations are measured by the normalized Euclidean
distance function and are grouped together to form these two
clusters. When new data statistics arrive, they are measured to
determine to which cluster they belong. If it belongs to the "YES"
cluster, then a re-optimization is necessary. Besides unsupervised
learning such as the K-means clustering, supervised learning can
also be used to predict whether or not re-optimization would be
necessary. Supervised learning is not without issue, however. For
instance, it is usually hard to get labeled data for training a model.
In our case, labels can be easily obtained by observing the efect
of re-optimization on past QEPs. As there are a fair amount of
supervised learning algorithms, several possible options exist for
prediction. For instance, a binary classifier decision tree can be
examined to classify whether or not re-optimization should be
conducted. Each feature will be represented by a node split into
either "YES" or "NO". Figure 6 shows an example of this partial
decision tree. When the final classification is "YES", then a
reoptimization is necessary. Also, the performance of several other
supervised learning algorithms like Neural Networks, Support
Vector Machines, and Linear Regression were studied in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for
cloud provisioning, but not for query re-optimization. We are
currently investigating which machine learning models are most
suitable for our study. An appropriate machine learning model
should be highly accurate in prediction (i.e. having low error rates
when applied to the training data and test data). The training data
must be selected carefully to avoid the cases of model overfitting,
i.e., the cases where the model provides a low training error
rate, but a high testing error rate. In addition, an appropriate
model should incur a low overhead while being applied online
for re-optimization prediction and should perform efectively in
reducing the overall query response time and monetary cost.
6
      </p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>In this paper, we provide our vision of a model that utilizes
machine learning techniques to study previously observed statistics
data from a running system in order to build a prediction model.
This model is used to predict whether or not a query should be
re-optimized in order to avoid unnecessary re-optimizations. By
doing this, a query’s execution time and/or monetary cost can be
further reduced. In future work, we plan to fully implement the
approach we envisioned and use it to predict additional
behaviors of a DBMS. For example, we would like to study methods
for increasing or decreasing the number of executing
containers based on current data statistics. We believe that predicting
additional useful behaviors will make the query re-optimization
process more eficient. In addition, we will also extend our
approach to predict, independently of query stages, when a query
such query re-optimizations should occur.</p>
    </sec>
    <sec id="sec-10">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is partially supported by the National Science
Founda</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ahalya and H. M. Pandey</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Data clustering approaches survey and analysis</article-title>
          .
          <source>In 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE)</source>
          .
          <volume>532</volume>
          -
          <fpage>537</fpage>
          . https://doi.org/10. 1109/ABLAZE.
          <year>2015</year>
          .7154919
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Bankole</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ajila</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Predicting cloud resource provisioning using machine learning techniques</article-title>
          .
          <source>In 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)</source>
          .
          <article-title>1-4</article-title>
          . https://doi.org/10.1109/ CCECE.
          <year>2013</year>
          .6567848
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Bruno</surname>
          </string-name>
          , Sapna Jain, and
          <string-name>
            <given-names>Jingren</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Continuous Cloud-scale Query Optimization and Processing</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>6</volume>
          ,
          <issue>11</issue>
          (Aug.
          <year>2013</year>
          ),
          <fpage>961</fpage>
          -
          <lpage>972</lpage>
          . https://doi.org/10.14778/2536222.2536223
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Navin</given-names>
            <surname>Kabra</surname>
          </string-name>
          and
          <string-name>
            <given-names>David J.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Eficient Mid-query Re-optimization of Sub-optimal Query Execution Plans</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>27</volume>
          ,
          <issue>2</issue>
          (
          <year>June 1998</year>
          ),
          <fpage>106</fpage>
          -
          <lpage>117</lpage>
          . https://doi.org/10.1145/276305.276315
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Willis</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Rimma</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Nehme</surname>
            , and
            <given-names>Ian</given-names>
          </string-name>
          <string-name>
            <surname>Rae</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Database Optimization in the Cloud: Where Costs, Partial Results, and Consumer Choice Meet</article-title>
          .
          <source>In CIDR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Volker</given-names>
            <surname>Markl</surname>
          </string-name>
          , Vijayshankar Raman, David Simmen,
          <string-name>
            <given-names>Guy</given-names>
            <surname>Lohman</surname>
          </string-name>
          , Hamid Pirahesh, and
          <string-name>
            <given-names>Miso</given-names>
            <surname>Cilimdzic</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Robust Query Processing Through Progressive Optimization</article-title>
          .
          <source>In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD '04)</source>
          . ACM, New York, NY, USA,
          <fpage>659</fpage>
          -
          <lpage>670</lpage>
          . https://doi.org/10.1145/1007568.1007642
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>David</given-names>
            <surname>Meignan</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A Heuristic Approach to Schedule Reoptimization in the Context of Interactive Optimization</article-title>
          .
          <source>In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation (GECCO '14)</source>
          . ACM, New York, NY, USA,
          <fpage>461</fpage>
          -
          <lpage>468</lpage>
          . https://doi.org/10.1145/2576768.2598213
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Ortiz</surname>
          </string-name>
          , Magdalena Balazinska, Johannes Gehrke, and
          <string-name>
            <given-names>S. Sathiya</given-names>
            <surname>Keerthi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning State Representations for Query Optimization with Deep Reinforcement Learning</article-title>
          .
          <source>In Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning (DEEM'18)</source>
          . ACM, New York, NY, USA, Article
          <volume>4</volume>
          , 4 pages. https://doi.org/10.1145/3209889.3209890
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yongjoo</given-names>
            <surname>Park</surname>
          </string-name>
          , Ahmad Shahab Tajik,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Cafarella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Barzan</given-names>
            <surname>Mozafari</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Database Learning: Toward a Database That Becomes Smarter Every Time</article-title>
          .
          <source>In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17)</source>
          . ACM, New York, NY, USA,
          <fpage>587</fpage>
          -
          <lpage>602</lpage>
          . https://doi.org/10. 1145/3035918.3064013
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Chenxiao</surname>
            <given-names>Wang</given-names>
          </string-name>
          , Zach Arani,
          <source>Le Gruenwald, and Laurent d'Orazio</source>
          .
          <year>2018</year>
          . Adaptive Time,
          <source>Monetary Cost Aware Query Optimization on Cloud Database Systems</source>
          .
          <volume>3374</volume>
          -
          <fpage>3382</fpage>
          . https://doi.org/10.1109/BigData.
          <year>2018</year>
          .8622401
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Chenxiao</surname>
            <given-names>Wang</given-names>
          </string-name>
          , Jason Arenson, Florian Helf,
          <source>Le Gruenwald, and Laurent d'Orazio</source>
          .
          <year>2017</year>
          .
          <article-title>Improving user interaction in mobile-cloud database query processing</article-title>
          .
          <source>In Workshop on Scalable Cloud Data Management</source>
          . Boston, United States. https://hal.archives-ouvertes.fr/hal-01640072
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Florian</surname>
            <given-names>Wolf</given-names>
          </string-name>
          , Norman May,
          <string-name>
            <given-names>Paul R.</given-names>
            <surname>Willems</surname>
          </string-name>
          , and
          <string-name>
            <surname>Kai-Uwe Sattler</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>On the Calculation of Optimality Ranges for Relational Query Execution Plans</article-title>
          .
          <source>In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18)</source>
          . ACM, New York, NY, USA,
          <fpage>663</fpage>
          -
          <lpage>675</lpage>
          . https://doi.org/10.1145/ 3183713.3183742
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Wentao</surname>
            <given-names>Wu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Jefrey F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Harneet</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <string-name>
            <surname>Sampling-Based Query</surname>
          </string-name>
          Re-Optimization.
          <source>In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16)</source>
          . ACM, New York, NY, USA,
          <fpage>1721</fpage>
          -
          <lpage>1736</lpage>
          . https://doi.org/10.1145/2882903.2882914
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zahir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Qadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mouline</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Access plan recommendation: A clustering based approach using queries similarity</article-title>
          .
          <source>In 2014 Second World Conference on Complex Systems (WCCS)</source>
          .
          <volume>55</volume>
          -
          <fpage>60</fpage>
          . https://doi.org/10.1109/ICoCS.
          <year>2014</year>
          .7060936
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>