AlphaJoin: Join Order Selection à la AlphaGo

Ji Zhang†‡, supervised by Ke Zhou† and Sebastian Schelter‡
† Huazhong University of Science and Technology, ‡ University of Amsterdam
jizhang@hust.edu.cn

(Proceedings of the VLDB 2020 PhD Workshop, August 31st, 2020, Tokyo, Japan. Copyright (C) 2020 for this paper by its authors. Copying permitted for private and academic purposes.)

ABSTRACT

Query optimization remains a difficult problem, and existing database management systems (DBMSs) often miss good execution plans. Identifying an efficient join order is key to achieving good performance in database systems. A primary challenge in join order selection is enumerating a set of candidate orderings and identifying the most effective one. Searching larger candidate spaces increases the potential of finding well-working plans, but also increases the cost of query optimization. Inspired by the success of AlphaGo for the game of Go, we propose in this Ph.D. work an optimization approach referred to as AlphaJoin, which applies AlphaGo's core technique, Monte Carlo Tree Search (MCTS), to the join order selection problem. Preliminary results indicate that our approach consistently outperforms a state-of-the-art method as well as PostgreSQL's optimizer on its own execution engine. Our approach is open-sourced and publicly available on GitHub (https://github.com/HustAIsGroup/AlphaJoin).

1. INTRODUCTION

Database tuning is vital for optimizing the performance of a database management system (DBMS) [1, 2, 3, 4, 5]. Query optimization is one of the most well-studied issues in this field, and identifying an efficient join order is key to achieving good performance. A primary challenge in join order selection is to minimize the number of execution plans that have to be enumerated, as well as the runtime of the finally chosen plan [6]. Traditional database systems employ a variety of heuristic methods (dynamic programming, greedy approaches, genetic algorithms, and simulated annealing) as their join order selection policy. However, these traditional query optimizers rely on internal static information and hence do not learn from historical experience: they select a query plan, execute it, and then forget the selection, so they never benefit from feedback.

Given the recent success of machine learning (ML) on various computer science problems, it is natural to apply ML to join order selection in the optimizer [6, 7, 8]. Marcus et al. [6] proposed a proof-of-concept join order enumerator named ReJOIN that is entirely driven by deep reinforcement learning and learns from previously executed plans. Their preliminary results indicate that ReJOIN outperforms PostgreSQL's join enumeration process in terms of effectiveness and efficiency. Unfortunately, both ReJOIN and the traditional heuristic methods follow a cost-based approach to join order selection: they search a subspace of all possible join orderings and select the "cheapest" order according to a cost model built on statistical information. These approaches still require a human-designed cost model, which may become inaccurate when the data in the database changes dynamically [8]. In other words, although PostgreSQL's execution engine chooses the join order with the lowest estimated cost, the actual execution time of that order may not be the lowest; the cost model of PostgreSQL does not necessarily reflect the true execution time of a query plan [15].

Marcus et al. [8] presented a learned optimizer called NEO that generates highly efficient query execution plans using deep neural networks trained on actual execution times instead of a cost model, and that achieves similar or improved performance compared to state-of-the-art commercial optimizers on their respective query execution engines. However, all of these methods (traditional database execution engines, ReJOIN, and NEO) rely on simple search strategies that explore the plan space unevenly (often randomly): some plans, or even entire parts of the plan space, are never tried, so the search can fall into a local optimum. Inspired by the impressive search capability of Monte Carlo Tree Search (MCTS) [9], which is at the core of the AlphaGo [10] system for playing the game of Go, we explore the benefits of MCTS for join order selection.
We call our approach AlphaJoin. MCTS is a search method commonly used in games to predict the sequence of moves most likely to lead to a winning outcome. The main idea of AlphaJoin is to simulate many possible join orders within a single tree structure, which can be searched efficiently and evenly, and to apply MCTS to select the join order with the highest estimated performance. We hope that AlphaJoin inspires other database researchers to experiment with combining query optimizers in new ways.

In this Ph.D. work, we make the following contributions:

• To the best of our knowledge, AlphaJoin is the first approach to use MCTS to learn and generate highly efficient join orders in a query optimizer. We design a neural network (the Order Value Network, OVN) to predict the query execution time of a given plan, and leverage this network within MCTS to score candidate query plans. (We refer to this as AlphaJoin 1.0.)

• Based on AlphaJoin 1.0, we design another neural network called the Adaptive Decision Network (ADN) that chooses between AlphaJoin 1.0 and the PostgreSQL optimizer for a given query, which further improves the optimization performance. (We refer to this approach as AlphaJoin 2.0.)

• Our experimental results demonstrate that AlphaJoin generates efficient join orders with improved performance compared to the state-of-the-art optimization tool NEO and to PostgreSQL's query optimizer on its own execution engine.

2. APPROACH

In this section, we first describe two encoding methods (SQL-encoding and Plan-encoding) and then provide an overview of our proposed approach AlphaJoin. Note that AlphaJoin 2.0 is an extended version of AlphaJoin 1.0 that further improves the optimization performance.

Figure 1: The encoding methods in AlphaJoin: SQL-encoding and Plan-encoding.

Figure 2: Overview of our proposed AlphaJoin, which includes two encoding methods (SQL-encoding and Plan-encoding), two trained neural networks (OVN and ADN), and an optimizer which applies MCTS.

2.1 Encodings

AlphaJoin uses two encodings: SQL-encoding, which encodes information regarding the SQL query but is independent of the query plan, and Plan-encoding, which represents the execution plan.

SQL-encoding encodes the table and attribute information contained in the SQL query. Similar to previous work [11], the representation of each query consists of two components. The first component encodes the join graph of the query in an adjacency matrix, where a "1" corresponds to a join predicate connecting two tables; e.g., in Figure 1, the "1" in the first row, second column corresponds to the join predicate connecting A and C. The second component is a simple one-hot encoding of the attributes involved in the query's predicates.

Plan-encoding. In addition to the SQL-encoding, we also require a representation of a partial or complete query execution plan. There must be a consistent one-to-one match between each encoding and the corresponding join order; in other words, the plan encoding must be both encodable and decodable. However, the execution plan encoding in ReJOIN [6] uses Huffman coding, which is hard to represent as a dense tree and cannot distinguish between the left and right child of a join, i.e., between the driving and the driven table. Inspired by the encoding of the Go board state in AlphaGo, we designed a new plan-encoding method that also consists of two components. The only difference to SQL-encoding is that the encoding matrix represents the join order instead of just the join graph, using different order numbers instead of only "1": the larger a number in the matrix, the higher the priority of the join operation between the two corresponding tables. For example, in the red dotted frame of Figure 1, the "4" in the third row and fifth column corresponds to the join predicate connecting C and E, which is executed first. Then, the "3" in the fourth row and third column corresponds to the join predicate connecting the result of (C E) and D. The "2" in the first row and second column corresponds to the join predicate connecting A and B. Lastly, the "1" in the third row and first column corresponds to the join predicate connecting the result of (D (C E)) and the result of (A B). Note that this process of execution plan encoding is similar to the encoding of moves in AlphaGo.
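To make the worked example above concrete, the following minimal sketch builds the plan-encoding matrix for the plan (D (C E)) joined with (A B) over the five tables A–E. The cell positions are taken directly from the Figure 1 example; the text does not fully specify how the representative row and column are chosen for joins over intermediate results, so the hard-coded cells below should be read as an illustration rather than as the exact encoding rule.

    import numpy as np

    tables = ["A", "B", "C", "D", "E"]
    idx = {t: i for i, t in enumerate(tables)}

    # 5x5 join-order matrix; a larger number means the join runs earlier.
    plan = np.zeros((len(tables), len(tables)), dtype=int)
    plan[idx["C"], idx["E"]] = 4  # 1st join: C and E
    plan[idx["D"], idx["C"]] = 3  # 2nd join: (C E) and D
    plan[idx["A"], idx["B"]] = 2  # 3rd join: A and B
    plan[idx["C"], idx["A"]] = 1  # 4th join: (D (C E)) and (A B)

    print(plan)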
2.2 AlphaJoin 1.0

Next, we introduce AlphaJoin 1.0, which consists of two components: the Order Value Network and MCTS. Afterwards, we show some preliminary results for AlphaJoin 1.0.

2.2.1 Order Value Network

The Order Value Network (OVN) is a deep neural network that predicts the best-possible execution time degree for a partial execution plan. The architecture of the OVN is shown in Figure 2. It consists of an input layer I, three hidden layers H (with 2048, 512, and 128 units) with ReLU activation, and an output layer O. As the goal of this network is to estimate which query plans are fast or slow, its input is the plan-encoding of a query plan and its output is a multi-class classification result, where the K = 4 possible labels indicate the execution time degree of the encoded plan (the lower the degree, the lower the execution time). Note that we tried other setups, but achieved the best results with K = 4. We employ a softmax output to produce a proper probability distribution over the execution time degrees, and additionally apply dropout regularization [12] to prevent overfitting. We train the network to minimize the cross entropy [13] between the historical query plans and their corresponding predicted execution time degrees, the standard loss for multi-class classification problems. It would be interesting to study other types of neural networks to improve the prediction quality; we explored convolutional networks, but found no significant performance improvements. Our model achieves about 60.9% accuracy on the Join Order Benchmark (JOB). Although this is not on par with the results typical for predictive tasks, MCTS performs a large number of simulations to select an appropriate join order, and these simulations compensate for the limited accuracy of the network. Note that the value network of AlphaGo only achieves about 50% accuracy but still exhibits good performance.
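As an illustration, the following is a minimal sketch of the OVN in PyTorch, assuming a fixed-size flattened plan-encoding as input. The input dimension is a placeholder (it depends on the number of tables and attributes in the schema), and the dropout rate is our assumption, as it is not given in the text; the layer sizes, ReLU activations, and cross-entropy loss follow the description above. Note that PyTorch's CrossEntropyLoss applies softmax internally, so an explicit softmax is only needed when reading out probabilities.

    import torch
    import torch.nn as nn

    PLAN_ENC_DIM = 17 * 17  # placeholder: depends on the schema size
    K = 4                   # number of execution time degrees

    class OrderValueNetwork(nn.Module):
        def __init__(self, in_dim=PLAN_ENC_DIM, num_classes=K, p_drop=0.5):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(in_dim, 2048), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(512, 128), nn.ReLU(),
                nn.Linear(128, num_classes),  # logits; softmax at inference
            )

        def forward(self, x):
            return self.layers(x)

    ovn = OrderValueNetwork()
    loss_fn = nn.CrossEntropyLoss()           # cross entropy over the K degrees
    logits = ovn(torch.randn(8, PLAN_ENC_DIM))
    loss = loss_fn(logits, torch.randint(0, K, (8,)))
    probs = torch.softmax(logits, dim=-1)     # distribution over degrees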
2.2.2 MCTS for AlphaJoin

We first introduce the UCT algorithm and the reward function used in MCTS, and then discuss MCTS for join order selection.

UCT Algorithm. The "Upper Confidence Bounds applied to Trees" (UCT) [14] algorithm is a game tree search algorithm that decides which tree node should be selected next. It adopts the well-known exploration & exploitation scheme, which not only exploits the learned experience of the model, but also explores tree nodes that were previously never tried, in order to reduce the possibility of falling into a local optimum. The UCT score is computed as

    UCT(v_i, v) = Q(v_i)/N(v_i) + C * sqrt(log N(v) / N(v_i))

where v_i is the current node, v is its parent node, Q(v_i) is the number of times the current node led to an advantage, N(v_i) and N(v) are the total numbers of times the current node and its parent were visited, and C is a parameter that adjusts the sensitivity of exploration. Taking AlphaGo as an example, the first term of the formula captures exploitation and estimates the probability of a "win" after selecting this node. The second term captures exploration: nodes other than v_i become more likely to be explored as N(v_i) grows. Note that the path from each child node to the root node of the tree represents a complete join order.
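A minimal sketch of the UCT score in Python follows. The exploration constant C is not specified in the text, so the common default sqrt(2) is assumed here, and unvisited nodes are given an infinite score so they are tried at least once (a standard convention, not something the paper states).

    import math

    def uct(q_vi, n_vi, n_v, c=math.sqrt(2)):
        """UCT score of child v_i under parent v.

        q_vi: advantageous outcomes at v_i (Q(v_i));
        n_vi: visits of v_i (N(v_i)); n_v: visits of the parent (N(v)).
        """
        if n_vi == 0:
            return math.inf  # convention: try unvisited children first
        exploitation = q_vi / n_vi
        exploration = c * math.sqrt(math.log(n_v) / n_vi)
        return exploitation + exploration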
Reward Function. Analogous to AlphaGo, we need to define what a "win" means in AlphaJoin once MCTS has searched a complete join order. In the query optimization problem, our goal is to generate a query plan with the lowest execution time; therefore, the lower the execution time, the higher the probability of a "win". We designed a simple reward function

    R_j = (K - k_j) / K

for the reward R_j of a join order j evaluated by MCTS, where K is the predefined number of execution time degrees in our OVN (Section 2.2.1) and k_j (ranging from 0 to K) is the predicted execution time degree of the tried join order j. The reward reaches its largest value 1 if the execution time degree of the plan is predicted as the lowest degree 0.

MCTS for Join Order Selection. MCTS is a decision-making algorithm that applies the Monte Carlo method to tree search. It expands the tree through simulations while searching the space, and feeds the final result of each decision ("win" or not) back to the nodes of the tree. After a large number of simulations, the statistics of each node represent the ratio of correct decisions to the total number of simulations at this node, which indicates the node's value. In order to reduce the search space from the root node to the leaf nodes, we define the number of simulations required for each node as S_n = N_C * F_s, where N_C is the number of child nodes under the current node and F_s is a search factor that controls the overall simulation time. The larger F_s is, the more search time MCTS spends (we set it to 15 and analyze the impact of different values on the optimization performance in Section 2.2.3). The total number of simulations for each SQL query is the sum of S_n over all joins, i.e., sum_{j=1}^{J} S_n, where J is the number of joins in the query. This process continues until all possible join orders have been searched or a predefined maximum number of simulations is reached. Each simulation of MCTS can be broken down into four distinct steps: selection, expansion, simulation, and backpropagation. Each of these steps for join order selection is shown in Figure 3 and explained in detail below:

1. Selection: We apply the UCT algorithm to select a join order from the root node down to a leaf node. Once a child node that is also a leaf node is encountered during this traversal, MCTS proceeds to the expansion step.

2. Expansion: A new child node (join order) is added to the tree below the node that was reached during the selection process.

3. Simulation: A simulation is performed by randomly choosing moves until all tables are joined.

4. Backpropagation: Backpropagation walks from the new node back to the root node. During this process, the total number of simulations stored in each node on the path is incremented (add "1"). If the new node's simulation results in a lower execution time, these nodes are also incremented by the reward (add "R").

Figure 3: The correspondence between the four distinct steps in MCTS and join order selection.
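The skeleton below sketches one MCTS simulation over join orders following the four steps above, reusing the uct() helper from the earlier sketch and the reward R_j from Section 2.2.2. The hooks legal_joins (move generation), rollout (random plan completion), and predict_degree (the OVN classifier) are assumed interfaces, not the authors' implementation; the reward is also added unconditionally here, whereas the paper adds it only when the simulation improves on earlier results.

    import random

    class Node:
        def __init__(self, join=None, parent=None):
            self.join = join      # join applied at this node (None at the root)
            self.parent = parent
            self.children = []
            self.N = 0            # visit count
            self.Q = 0.0          # accumulated reward

    K = 4  # number of execution time degrees (Section 2.2.1)

    def reward(k_j, num_degrees=K):
        """R_j = (K - k_j) / K from Section 2.2.2."""
        return (num_degrees - k_j) / num_degrees

    def simulate_once(root, legal_joins, rollout, predict_degree):
        """One MCTS simulation; the three callables are assumed hooks."""
        # 1. Selection: descend via UCT until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch.Q, ch.N, node.N))
        # 2. Expansion: add the untried joins below the selected node.
        for join in legal_joins(node):
            node.children.append(Node(join, parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Simulation: random rollout until all tables are joined.
        plan_encoding = rollout(node)
        # 4. Backpropagation: push the OVN-based reward up to the root.
        r = reward(predict_degree(plan_encoding))
        while node is not None:
            node.N += 1
            node.Q += r
            node = node.parent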
2.2.3 Preliminary Results for AlphaJoin 1.0

We first explore the impact of the search factor F_s on the join order selection of AlphaJoin, and then compare the performance of AlphaJoin 1.0, PostgreSQL's join enumeration process, and NEO on the Join Order Benchmark, a set of queries used in previous assessments of query optimizers [8, 15]. Figure 4(a) shows the impact of different search factors F_s on the optimized execution time and the search time. As F_s increases from 5 to 25, the execution time of the plans found by AlphaJoin 1.0 continuously decreases, but the search time of our method increases, because a larger F_s causes more simulations per node in MCTS. Therefore, an appropriate F_s trades off optimized execution time against search time; we set F_s to 15.

Figure 4(b) shows the overall optimized performance of AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO. AlphaJoin 1.0 outperforms the other candidates even when we include the search time. This experiment demonstrates that our MCTS-based method efficiently selects appropriate join orders.

Figure 4: (a) The impact of different search factors F_s on the optimized execution time and search time. (b) The overall optimized performance of AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO. (c) The total execution time of queries which achieve better performance via AlphaJoin 1.0 (left) and PostgreSQL's query optimizer (right). (d) The overall optimized performance of AlphaJoin 2.0, AlphaJoin 1.0, PostgreSQL's query optimizer, and NEO.

2.3 AlphaJoin 2.0

To further improve on AlphaJoin 1.0, it is instructive to compare the execution time of each SQL query under PostgreSQL's query optimizer and under AlphaJoin 1.0. Not all join orders recommended by AlphaJoin 1.0 are better than the ones generated by PostgreSQL's query optimizer: according to our statistics, in about 48% of the cases (almost half), the query optimized by PostgreSQL's optimizer on its own execution engine achieves better performance than AlphaJoin 1.0 (for NEO, this is the case for 41% of the queries). We attribute this to the uncertainty of the UCT algorithm in our method. Figure 4(c) shows the total execution time of those queries for which AlphaJoin 1.0 performs better (left) and of those for which PostgreSQL's query optimizer performs better (right). An interesting finding is that although the two methods are similar in the number of preferred queries, AlphaJoin 1.0 achieves greatly improved performance compared to PostgreSQL's join enumeration process. In other words, AlphaJoin 1.0 performs better on slow queries (queries with long execution times), because such queries leave more room for optimization; for fast queries, our method is less suitable. In order to further improve the performance of AlphaJoin 1.0, we therefore introduce AlphaJoin 2.0. Note that we could also explore other ways to decrease the search time of MCTS, e.g., by parallelizing MCTS [16].

To alleviate the situation analyzed above, we train another neural network, referred to as the Adaptive Decision Network (ADN, shown in Figure 2), to choose between AlphaJoin 1.0 and the PostgreSQL optimizer for a given query. The ADN is learned from a labeled dataset of historical execution times of the PostgreSQL optimizer and of AlphaJoin 1.0. Its structure is similar to that of the OVN; the only differences are the inputs and outputs: the input is the SQL-encoding of the query, and the output is a binary classification result indicating which optimizer should be used. We refer to this architecture as AlphaJoin 2.0.
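Since the ADN reuses the OVN architecture with a different input and a two-way output, a sketch under that reading is shown below. It reuses the OrderValueNetwork class from the earlier sketch; the SQL-encoding dimension is a placeholder, and the label semantics (class 0 routes to AlphaJoin 1.0) are our assumption.

    # Assumes torch and OrderValueNetwork from the OVN sketch above.
    SQL_ENC_DIM = 17 * 17 + 64  # placeholder: join graph + attribute one-hots

    adn = OrderValueNetwork(in_dim=SQL_ENC_DIM, num_classes=2)

    def choose_optimizer(sql_encoding):
        """Route a query: class 0 -> AlphaJoin 1.0, class 1 -> PostgreSQL
        (this label assignment is our assumption)."""
        with torch.no_grad():
            logits = adn(sql_encoding.unsqueeze(0))
        return "alphajoin" if logits.argmax(dim=-1).item() == 0 else "postgresql"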
3. PRELIMINARY RESULTS

We use the same benchmark and experimental setup to compare the performance of PostgreSQL's optimizer, NEO, AlphaJoin 1.0, and AlphaJoin 2.0. The preliminary experiments in Figure 4(d) indicate that AlphaJoin 2.0 generates better join orders with lower execution time than PostgreSQL's optimizer (decreased by 60.1%), the state-of-the-art method NEO [8] (decreased by 38.7%), and our previously proposed version AlphaJoin 1.0 (decreased by 31.3%). Moreover, we also decrease the ratio of queries for which PostgreSQL's optimizer performs better from 48% to 9%.

4. CONCLUSION AND NEXT STEPS

In this Ph.D. work, we presented AlphaJoin, the first MCTS-based query optimizer that generates highly efficient join orders. AlphaJoin iteratively improves its performance through a combination of two neural networks and MCTS. Preliminary results show that AlphaJoin consistently outperforms the state-of-the-art method NEO and PostgreSQL's optimizer.

As a next step, we plan to investigate methods for improving the prediction accuracy of the two networks OVN and ADN, in order to further improve the search efficiency of MCTS. Moreover, our method still performs worse than the PostgreSQL optimizer on a set of fast queries, which we need to investigate to further improve the overall performance and propose "AlphaJoin 3.0".

5. REFERENCES

[1] Benoit Dageville et al. Automatic SQL tuning in Oracle 10g. In VLDB, pages 1098–1109, 2004.
[2] Surajit Chaudhuri et al. Self-tuning database systems: A decade of progress. In VLDB, pages 3–14, 2007.
[3] Dana Van Aken et al. Automatic DBMS tuning through large-scale machine learning. In SIGMOD, 2017.
[4] Ji Zhang et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In SIGMOD '19, pages 415–432, 2019.
[5] Guoliang Li et al. QTune: A query-aware database tuning system with deep reinforcement learning. Proc. VLDB Endow., 12(12):2118–2130, 2019.
[6] Ryan Marcus et al. Deep reinforcement learning for join order enumeration. In aiDM '18. ACM, 2018.
[7] Ryan Marcus. Towards a hands-free query optimizer through deep learning. In CIDR, pages 1–8, 2019.
[8] Ryan Marcus et al. Neo: A learned query optimizer. Proc. VLDB Endow., 12(11):1705–1718, 2019.
[9] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In Computers and Games, volume 4630, pages 72–83. Springer, 2006.
[10] David Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
[11] Jennifer Ortiz et al. Learning state representations for query optimization with deep reinforcement learning. In DEEM '18. ACM, 2018.
[12] Nitish Srivastava et al. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1):1929–1958, 2014.
[13] Ian Goodfellow et al. Deep Learning. MIT Press, 2016.
[14] Sylvain Gelly et al. Exploration exploitation in Go: UCT for Monte-Carlo Go. In NIPS Workshop OTEE, 2006.
[15] Viktor Leis et al. How good are query optimizers, really? Proc. VLDB Endow., 9(3):204–215, 2015.
[16] Anji Liu. Watch the unobserved: A simple approach to parallelizing Monte Carlo tree search, 2018.