AlphaJoin: Join Order Selection à la AlphaGo

Ji Zhang†‡, supervised by Ke Zhou† and Sebastian Schelter‡
† Huazhong University of Science and Technology, ‡ University of Amsterdam
jizhang@hust.edu.cn

(Proceedings of the VLDB 2020 PhD Workshop, August 31st, 2020, Tokyo, Japan. Copyright (C) 2020 for this paper by its authors. Copying permitted for private and academic purposes.)

ABSTRACT

Query optimization remains a difficult problem, and existing database management systems (DBMSs) often miss good execution plans. Identifying an efficient join order is key to achieving good performance in database systems. A primary challenge in join order selection is enumerating a set of candidate orderings and identifying the most effective one. Searching larger candidate spaces increases the potential of finding well-working plans, but also increases the cost of query optimization. Inspired by the success of AlphaGo for the game of Go, we propose in this Ph.D. work an optimization approach referred to as AlphaJoin, which applies AlphaGo's core technique, Monte Carlo Tree Search (MCTS), to the join order selection problem. Preliminary results indicate that our approach consistently outperforms a state-of-the-art method as well as PostgreSQL's optimizer on its own execution engine. Our approach is open-sourced and publicly available on GitHub (https://github.com/HustAIsGroup/AlphaJoin).

1. INTRODUCTION

Database tuning is vital for optimizing the performance of a database management system (DBMS) [1, 2, 3, 4, 5]. Query optimization is one of the most well-studied issues in this field, and identifying an efficient join order is key to achieving good performance. A primary challenge in join order selection is to minimize the number of execution plans that have to be enumerated, as well as the runtime of the finally chosen plan [6]. Traditional database systems employ a variety of heuristic methods (dynamic programming, greedy approaches, genetic algorithms, and simulated annealing) as their join order selection policy. However, these traditional query optimizers rely on internal static information and hence do not learn from historical experience: they select a query plan, execute it, and then forget the selection, so they never benefit from feedback.

Given the recent success of machine learning (ML) on various computer science problems, it is natural to apply ML to join order selection in the optimizer [6, 7, 8]. Marcus et al. [6] proposed a proof-of-concept join order enumerator named ReJOIN that is entirely driven by deep reinforcement learning and learns from previously executed plans. Their preliminary results indicate that ReJOIN outperforms PostgreSQL's join enumeration process in terms of effectiveness and efficiency. Unfortunately, both ReJOIN and the traditional heuristic methods follow a cost-based approach to join order selection: they search a subspace of all possible join orderings and select the "cheapest" order according to a cost model built on statistical information. These approaches still require a human-designed cost model, which may become inaccurate when the data in the database changes dynamically [8]. In other words, although PostgreSQL's execution engine chooses the join order with the lowest estimated cost, the actual execution time of that order may not be the lowest; the cost model of PostgreSQL does not necessarily reflect the true execution time of a query plan [15].

Marcus et al. [8] presented a learned optimizer called NEO that generates highly efficient query execution plans using deep neural networks trained on actual execution times instead of a cost model, and that achieves similar or improved performance compared to state-of-the-art commercial optimizers on their respective query execution engines. However, all of these methods (traditional database execution engines, ReJOIN, and NEO) rely on simple search strategies that explore the plan space unevenly (often randomly): some plans, or even entire parts of the plan space, are never tried, so the search can fall into a local optimum. Inspired by the impressive search capability of Monte Carlo Tree Search (MCTS) [9], which is at the core of the AlphaGo [10] system for playing the game of Go, we explore the benefits of MCTS for join order selection.
We call our approach AlphaJoin. MCTS is a search method commonly used in games to predict the sequence of moves most likely to lead to a winning outcome. The main idea of AlphaJoin is to simulate many possible join orders within a single tree structure, which can be searched efficiently and evenly, and to apply MCTS to select the join order with the highest estimated performance. We hope that AlphaJoin inspires other database researchers to experiment with combining query optimizers in new ways.

In this Ph.D. work, we make the following contributions:

• To the best of our knowledge, AlphaJoin is the first approach to use MCTS to learn and generate highly efficient join orders in a query optimizer. We design a neural network (the Order Value Network, OVN) to predict the query execution time of a given plan, and leverage this network within MCTS to score candidate query plans. (We refer to this as AlphaJoin 1.0.)

• Based on AlphaJoin 1.0, we design another neural network called the Adaptive Decision Network (ADN) that chooses between AlphaJoin 1.0 and the PostgreSQL optimizer for a given query, which further improves the optimization performance. (We refer to this approach as AlphaJoin 2.0.)

• Our experimental results demonstrate that AlphaJoin generates efficient join orders with improved performance compared to the state-of-the-art optimization tool NEO and to PostgreSQL's query optimizer on its own execution engine.

2. APPROACH

In this section, we first describe two encoding methods (SQL-encoding and Plan-encoding) and then provide an overview of our proposed approach AlphaJoin. Note that AlphaJoin 2.0 is an extended version of AlphaJoin 1.0 that further improves the optimization performance.

Figure 1: The encoding methods in AlphaJoin: SQL-encoding and Plan-encoding.

Figure 2: Overview of our proposed AlphaJoin, which includes two encoding methods (SQL-encoding and Plan-encoding), two trained neural networks (OVN and ADN), and an optimizer which applies MCTS.

2.1 Encodings

AlphaJoin uses two encodings: SQL-encoding, which encodes information regarding the SQL query but is independent of the query plan, and Plan-encoding, which represents the execution plan.

SQL-encoding encodes the table and attribute information contained in the SQL query. Similar to previous work [11], the representation of each query consists of two components. The first component encodes the join graph of the query in an adjacency matrix, where a "1" corresponds to a join predicate connecting two tables; e.g., in Figure 1, the "1" in the first row, second column corresponds to the join predicate connecting A and C. The second component is a simple one-hot encoding of the attributes involved in the query's predicates.

Plan-encoding. In addition to the SQL-encoding, we also require a representation of a partial or complete query execution plan. There must be a consistent one-to-one match between each encoding and the corresponding join order; in other words, the plan encoding must be both encodable and decodable. However, the execution plan encoding in ReJOIN [6] uses Huffman coding, which is hard to represent as a dense tree and cannot distinguish between the left and right child of a join, i.e., between the driving and the driven table. Inspired by the encoding of the Go board state in AlphaGo, we designed a new plan-encoding method that also consists of two components. The only difference to SQL-encoding is that the encoding matrix represents the join order instead of just the join graph, using different order numbers instead of only "1": the larger a number in the matrix, the higher the priority of the join operation between the two corresponding tables. For example, in the red dotted frame of Figure 1, the "4" in the third row and fifth column corresponds to the join predicate connecting C and E, which is executed first. Then, the "3" in the fourth row and third column corresponds to the join predicate connecting the result of (C E) and D. The "2" in the first row and second column corresponds to the join predicate connecting A and B. Lastly, the "1" in the third row and first column corresponds to the join predicate connecting the result of (D (C E)) and the result of (A B). Note that this process of execution plan encoding is similar to the encoding of moves in AlphaGo.
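To make the worked example above concrete, the following minimal sketch builds the plan-encoding matrix for the plan (D (C E)) joined with (A B) over the five tables A–E. The cell positions are taken directly from the Figure 1 example; the text does not fully specify how the representative row and column are chosen for joins over intermediate results, so the hard-coded cells below should be read as an illustration rather than as the exact encoding rule.

    import numpy as np

    tables = ["A", "B", "C", "D", "E"]
    idx = {t: i for i, t in enumerate(tables)}

    # 5x5 join-order matrix; a larger number means the join runs earlier.
    plan = np.zeros((len(tables), len(tables)), dtype=int)
    plan[idx["C"], idx["E"]] = 4  # 1st join: C and E
    plan[idx["D"], idx["C"]] = 3  # 2nd join: (C E) and D
    plan[idx["A"], idx["B"]] = 2  # 3rd join: A and B
    plan[idx["C"], idx["A"]] = 1  # 4th join: (D (C E)) and (A B)

    print(plan)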
2.2 AlphaJoin 1.0

Next, we introduce AlphaJoin 1.0, which consists of two components: the Order Value Network and MCTS. Afterwards, we show some preliminary results for AlphaJoin 1.0.

2.2.1 Order Value Network

The Order Value Network (OVN) is a deep neural network that predicts the best-possible execution time degree for a partial execution plan. The architecture of the OVN is shown in Figure 2. It consists of an input layer I, three hidden layers H (with 2048, 512, and 128 units) with ReLU activation, and an output layer O. As the goal of this network is to estimate which query plans are fast or slow, its input is the plan-encoding of a query plan and its output is a multi-class classification result, where the K = 4 possible labels indicate the execution time degree of the encoded plan (the lower the degree, the lower the execution time). Note that we tried other setups, but achieved the best results with K = 4. We employ a softmax output to produce a proper probability distribution over the execution time degrees, and additionally apply dropout regularization [12] to prevent overfitting. We train the network to minimize the cross entropy [13] between the historical query plans and their corresponding predicted execution time degrees, the standard loss for multi-class classification problems. It would be interesting to study other types of neural networks to improve the prediction quality; we explored convolutional networks, but found no significant performance improvements. Our model achieves about 60.9% accuracy on the Join Order Benchmark (JOB). Although this is not on par with the results typical for predictive tasks, MCTS performs a large number of simulations to select an appropriate join order, and these simulations compensate for the limited accuracy of the network. Note that the value network of AlphaGo only achieves about 50% accuracy but still exhibits good performance.
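As an illustration, the following is a minimal sketch of the OVN in PyTorch, assuming a fixed-size flattened plan-encoding as input. The input dimension is a placeholder (it depends on the number of tables and attributes in the schema), and the dropout rate is our assumption, as it is not given in the text; the layer sizes, ReLU activations, and cross-entropy loss follow the description above. Note that PyTorch's CrossEntropyLoss applies softmax internally, so an explicit softmax is only needed when reading out probabilities.

    import torch
    import torch.nn as nn

    PLAN_ENC_DIM = 17 * 17  # placeholder: depends on the schema size
    K = 4                   # number of execution time degrees

    class OrderValueNetwork(nn.Module):
        def __init__(self, in_dim=PLAN_ENC_DIM, num_classes=K, p_drop=0.5):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(in_dim, 2048), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(512, 128), nn.ReLU(),
                nn.Linear(128, num_classes),  # logits; softmax at inference
            )

        def forward(self, x):
            return self.layers(x)

    ovn = OrderValueNetwork()
    loss_fn = nn.CrossEntropyLoss()           # cross entropy over the K degrees
    logits = ovn(torch.randn(8, PLAN_ENC_DIM))
    loss = loss_fn(logits, torch.randint(0, K, (8,)))
    probs = torch.softmax(logits, dim=-1)     # distribution over degrees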
2.2.2 MCTS for AlphaJoin

We first introduce the UCT algorithm and the reward function used in MCTS, and then discuss MCTS for join order selection.

UCT Algorithm. The "Upper Confidence Bounds applied to Trees" (UCT) [14] algorithm is a game tree search algorithm that decides which tree node should be selected next. It adopts the well-known exploration & exploitation scheme, which not only exploits the learned experience of the model, but also explores tree nodes that were previously never tried, in order to reduce the possibility of falling into a local optimum. The UCT score is computed as

    UCT(v_i, v) = Q(v_i)/N(v_i) + C * sqrt(log N(v) / N(v_i))

where v_i is the current node, v is its parent node, Q(v_i) is the number of times the current node led to an advantage, N(v_i) and N(v) are the total numbers of times the current node and its parent were visited, and C is a parameter that adjusts the sensitivity of exploration. Taking AlphaGo as an example, the first term of the formula captures exploitation and estimates the probability of a "win" after selecting this node. The second term captures exploration: nodes other than v_i become more likely to be explored as N(v_i) grows. Note that the path from each child node to the root node of the tree represents a complete join order.
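A minimal sketch of the UCT score in Python follows. The exploration constant C is not specified in the text, so the common default sqrt(2) is assumed here, and unvisited nodes are given an infinite score so they are tried at least once (a standard convention, not something the paper states).

    import math

    def uct(q_vi, n_vi, n_v, c=math.sqrt(2)):
        """UCT score of child v_i under parent v.

        q_vi: advantageous outcomes at v_i (Q(v_i));
        n_vi: visits of v_i (N(v_i)); n_v: visits of the parent (N(v)).
        """
        if n_vi == 0:
            return math.inf  # convention: try unvisited children first
        exploitation = q_vi / n_vi
        exploration = c * math.sqrt(math.log(n_v) / n_vi)
        return exploitation + exploration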
Reward Function. Analogous to AlphaGo, we need to define what a "win" means in AlphaJoin once MCTS has searched a complete join order. In the query optimization problem, our goal is to generate a query plan with the lowest execution time; therefore, the lower the execution time, the higher the probability of a "win". We designed a simple reward function

    R_j = (K - k_j) / K

for the reward R_j of a join order j evaluated by MCTS, where K is the predefined number of execution time degrees in our OVN (Section 2.2.1) and k_j (ranging from 0 to K) is the predicted execution time degree of the tried join order j. The reward reaches its largest value 1 if the execution time degree of the plan is predicted as the lowest degree 0.

MCTS for Join Order Selection. MCTS is a decision-making algorithm that applies the Monte Carlo method to tree search. It expands the tree through simulations while searching the space, and feeds the final result of each decision ("win" or not) back to the nodes of the tree. After a large number of simulations, the statistics of each node represent the ratio of correct decisions to the total number of simulations at this node, which indicates the node's value. In order to reduce the search space from the root node to the leaf nodes, we define the number of simulations required for each node as S_n = N_C * F_s, where N_C is the number of child nodes under the current node and F_s is a search factor that controls the overall simulation time. The larger F_s is, the more search time MCTS spends (we set it to 15 and analyze the impact of different values on the optimization performance in Section 2.2.3). The total number of simulations for each SQL query is the sum of S_n over all joins, i.e., sum_{j=1}^{J} S_n, where J is the number of joins in the query. This process continues until all possible join orders have been searched or a predefined maximum number of simulations is reached. Each simulation of MCTS can be broken down into four distinct steps: selection, expansion, simulation, and backpropagation. Each of these steps for join order selection is shown in Figure 3 and explained in detail below:

1. Selection: We apply the UCT algorithm to select a join order from the root node down to a leaf node. Once a child node that is also a leaf node is encountered during this traversal, MCTS proceeds to the expansion step.

2. Expansion: A new child node (join order) is added to the tree below the node that was reached during the selection process.

3. Simulation: A simulation is performed by randomly choosing moves until all tables are joined.

4. Backpropagation: Backpropagation walks from the new node back to the root node. During this process, the total number of simulations stored in each node on the path is incremented (add "1"). If the new node's simulation results in a lower execution time, these nodes are also incremented by the reward (add "R").

Figure 3: The correspondence between the four distinct steps in MCTS and join order selection.
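The skeleton below sketches one MCTS simulation over join orders following the four steps above, reusing the uct() helper from the earlier sketch and the reward R_j from Section 2.2.2. The hooks legal_joins (move generation), rollout (random plan completion), and predict_degree (the OVN classifier) are assumed interfaces, not the authors' implementation; the reward is also added unconditionally here, whereas the paper adds it only when the simulation improves on earlier results.

    import random

    class Node:
        def __init__(self, join=None, parent=None):
            self.join = join      # join applied at this node (None at the root)
            self.parent = parent
            self.children = []
            self.N = 0            # visit count
            self.Q = 0.0          # accumulated reward

    K = 4  # number of execution time degrees (Section 2.2.1)

    def reward(k_j, num_degrees=K):
        """R_j = (K - k_j) / K from Section 2.2.2."""
        return (num_degrees - k_j) / num_degrees

    def simulate_once(root, legal_joins, rollout, predict_degree):
        """One MCTS simulation; the three callables are assumed hooks."""
        # 1. Selection: descend via UCT until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch.Q, ch.N, node.N))
        # 2. Expansion: add the untried joins below the selected node.
        for join in legal_joins(node):
            node.children.append(Node(join, parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Simulation: random rollout until all tables are joined.
        plan_encoding = rollout(node)
        # 4. Backpropagation: push the OVN-based reward up to the root.
        r = reward(predict_degree(plan_encoding))
        while node is not None:
            node.N += 1
            node.Q += r
            node = node.parent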
2.2.3 Preliminary Results for AlphaJoin 1.0

We first explore the impact of the search factor F_s on the join order selection of AlphaJoin, and then compare the performance of AlphaJoin 1.0, PostgreSQL's join enumeration process, and NEO on the Join Order Benchmark, a set of queries used in previous assessments of query optimizers [8, 15]. Figure 4(a) shows the impact of different search factors F_s on the optimized execution time and the search time. As F_s increases from 5 to 25, the execution time of the plans found by AlphaJoin 1.0 continuously decreases, but the search time of our method increases, because a larger F_s causes more simulations per node in MCTS. Therefore, an appropriate F_s trades off optimized execution time against search time; we set F_s to 15.

Figure 4(b) shows the overall optimized performance of AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO. AlphaJoin 1.0 outperforms the other candidates even when we include the search time. This experiment demonstrates that our MCTS-based method efficiently selects appropriate join orders.

Figure 4: (a) The impact of different search factors F_s on the optimized execution time and search time. (b) The overall optimized performance of AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO. (c) The total execution time of queries which achieve better performance via AlphaJoin 1.0 (left) and PostgreSQL's query optimizer (right). (d) The overall optimized performance of AlphaJoin 2.0, AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO.

2.3 AlphaJoin 2.0

To further improve on AlphaJoin 1.0, it is instructive to compare the execution time of each SQL query under PostgreSQL's query optimizer and under AlphaJoin 1.0. Not all join orders recommended by AlphaJoin 1.0 are better than the ones generated by PostgreSQL's query optimizer: according to our statistics, in about 48% of the cases (almost half), the query optimized by PostgreSQL's optimizer on its own execution engine achieves better performance than AlphaJoin 1.0 (for NEO, this is the case for 41% of the queries). We attribute this to the uncertainty of the UCT algorithm in our method. Figure 4(c) shows the total execution time of those queries for which AlphaJoin 1.0 performs better (left) and of those for which PostgreSQL's query optimizer performs better (right). An interesting finding is that although the two methods are similar in the number of preferred queries, AlphaJoin 1.0 achieves greatly improved performance compared to PostgreSQL's join enumeration process. In other words, AlphaJoin 1.0 performs better on slow queries (queries with long execution times), because such queries leave more room for optimization; for fast queries, our method is less suitable. In order to further improve the performance of AlphaJoin 1.0, we therefore introduce AlphaJoin 2.0. Note that we could also explore other ways to decrease the search time of MCTS, e.g., by parallelizing MCTS [16].

To alleviate the situation analyzed above, we train another neural network, referred to as the Adaptive Decision Network (ADN, shown in Figure 2), to choose between AlphaJoin 1.0 and the PostgreSQL optimizer for a given query. The ADN is learned from a labeled dataset of historical execution times of the PostgreSQL optimizer and of AlphaJoin 1.0. Its structure is similar to that of the OVN; the only differences are the inputs and outputs: the input is the SQL-encoding of the query, and the output is a binary classification result indicating which optimizer should be used. We refer to this architecture as AlphaJoin 2.0.
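Since the ADN reuses the OVN architecture with a different input and a two-way output, a sketch under that reading is shown below. It reuses the OrderValueNetwork class from the earlier sketch; the SQL-encoding dimension is a placeholder, and the label semantics (class 0 routes to AlphaJoin 1.0) are our assumption.

    # Assumes torch and OrderValueNetwork from the OVN sketch above.
    SQL_ENC_DIM = 17 * 17 + 64  # placeholder: join graph + attribute one-hots

    adn = OrderValueNetwork(in_dim=SQL_ENC_DIM, num_classes=2)

    def choose_optimizer(sql_encoding):
        """Route a query: class 0 -> AlphaJoin 1.0, class 1 -> PostgreSQL
        (this label assignment is our assumption)."""
        with torch.no_grad():
            logits = adn(sql_encoding.unsqueeze(0))
        return "alphajoin" if logits.argmax(dim=-1).item() == 0 else "postgresql"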
3. PRELIMINARY RESULTS

We use the same benchmark and experimental setup to compare the performance of PostgreSQL's optimizer, NEO, AlphaJoin 1.0, and AlphaJoin 2.0. The preliminary experiments in Figure 4(d) indicate that AlphaJoin 2.0 generates better join orders with lower execution time than PostgreSQL's optimizer (decreased by 60.1%), the state-of-the-art method NEO [8] (decreased by 38.7%), and our previously proposed version AlphaJoin 1.0 (decreased by 31.3%). Moreover, we also decrease the ratio of queries for which PostgreSQL's optimizer performs better from 48% to 9%.

4. CONCLUSION AND NEXT STEPS

In this Ph.D. work, we presented AlphaJoin, the first MCTS-based query optimizer that generates highly efficient join orders. AlphaJoin iteratively improves its performance through a combination of two neural networks and MCTS. Preliminary results show that AlphaJoin consistently outperforms the state-of-the-art method NEO and PostgreSQL's optimizer.

As a next step, we plan to investigate methods for improving the prediction accuracy of the two networks OVN and ADN, in order to further improve the search efficiency of MCTS. Moreover, our method still performs worse than the PostgreSQL optimizer on a set of fast queries, which we need to investigate to further improve the overall performance and propose "AlphaJoin 3.0".

5. REFERENCES

[1] Benoit Dageville et al. Automatic SQL tuning in Oracle 10g. In VLDB, pages 1098–1109, 2004.
[2] Surajit Chaudhuri et al. Self-tuning database systems: A decade of progress. In VLDB, pages 3–14, 2007.
[3] Dana Van Aken et al. Automatic DBMS tuning through large-scale machine learning. In SIGMOD, 2017.
[4] Ji Zhang et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In SIGMOD '19, pages 415–432, 2019.
[5] Guoliang Li et al. QTune: A query-aware database tuning system with deep reinforcement learning. Proc. VLDB Endow., 12(12):2118–2130, 2019.
[6] Ryan Marcus et al. Deep reinforcement learning for join order enumeration. In aiDM '18. ACM, 2018.
[7] Ryan Marcus. Towards a hands-free query optimizer through deep learning. In CIDR, pages 1–8, 2019.
[8] Ryan Marcus et al. Neo: A learned query optimizer. Proc. VLDB Endow., 12(11):1705–1718, 2019.
[9] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In Computers and Games, volume 4630, pages 72–83. Springer, 2006.
[10] David Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
[11] Jennifer Ortiz et al. Learning state representations for query optimization with deep reinforcement learning. In DEEM '18. ACM, 2018.
[12] Nitish Srivastava et al. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1):1929–1958, 2014.
[13] Ian Goodfellow et al. Deep Learning. MIT Press, 2016.
[14] Sylvain Gelly et al. Exploration exploitation in Go: UCT for Monte-Carlo Go. In NIPS Workshop OTEE, 2006.
[15] Viktor Leis et al. How good are query optimizers, really? Proc. VLDB Endow., 9(3):204–215, 2015.
[16] Anji Liu. Watch the unobserved: A simple approach to parallelizing Monte Carlo tree search, 2018.