Local Search for AI Planning?

Oto Mraz[0000−0002−7728−5660]

Department of Computer Science, ETH Zurich, Switzerland
otmraz@student.ethz.ch

Abstract. Recent breakthroughs in the field of AI planning, such as the Identidem and Marvin planners, support the creation of more advanced and realistic representations of real-world domains. It is well known that an adequate local search strategy can help to solve increasingly complicated Planning Domain Definition Language (PDDL) problems. Contemporary planners, however, strive to find a balance between traditional greedy search and a certain degree of randomness. The aim of this work is thus to introduce a new planner that combines applicable local search techniques in a novel way to enhance the performance of the existing JavaFF planner. The proposed planner is based on the principle of local beam search and combines different successor selection methods, macros and restarts. Experimental results show that the new planner can solve considerably more problems than its predecessor JavaFF, often in a shorter time. Our planner could find practical use in domains such as urban traffic modelling or autonomous robot control.

Keywords: AI Planning · Local Beam Search · Macros · Restarts.

1 Introduction

The field of AI planning has experienced dramatic changes in the past two decades. During this time, newly developed planners have greatly increased the scope of applicable AI planning models. Among others, Coles [3] has shown how an efficient local-search technique can find the goal faster. Further research stresses the importance of a customizable planner resilient to changing domain topologies. This paper presents a new planner built on the basis of JavaFF (the Java version of Hoffmann's FF planner [4]). Our planner enhances JavaFF with local beam search, sophisticated successor selection, efficient restarts and macro actions.
The paper continues with an overview of the related work in this field and discusses the algorithms used in theory. We then describe in detail how these local search techniques have been incorporated into the novel planner and demonstrate its results in practice. The paper concludes with a summary of the successes achieved and several outlines for future work.

? This research was done in the Informatics Department at King's College London, UK. Special thanks belong to Dr. Andrew Coles for the support and guidance he provided while supervising my thesis.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Related Work

One of the biggest problems AI planners face is local minima that they are unable to escape. Coles' planner Identidem [3] has successfully tackled this issue using restarts inside a depth-bounded and restart-bounded search. Identidem stores the best state seen so far, called bestClosed, and can continue to explore worse states only for a set amount of time. If the search is fruitless, it restarts from bestClosed and explores a different path, hoping to progress further in the search.

An alternative approach is to keep more than one candidate state open. This is the motivation behind local beam search (LBS). In LBS we keep the best k states open rather than just one, as in greedy search. For the purposes of AI planning, a beam will represent a state. We proceed by generating the successors of all k beams and choosing the k best ones among them [7]. The benefit of LBS lies in the ability to share useful information between the beams and to focus computational effort on the most promising areas of the search. Furthermore, we limit the demand for memory during search, since the space complexity remains linear. Meanwhile, we maintain auspicious alternatives in case of failure [10].
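The depth-bounded, restart-bounded discipline described above can be sketched as follows. This is a minimal illustration, not Identidem's actual implementation: `successors(s)`, `h(s)` and `is_goal(s)` are hypothetical stand-ins for a successor generator, the heuristic and the goal test.

```python
import random

def identidem_style_restarts(initial, successors, h, is_goal,
                             depth_bound=5, restart_bound=10, seed=0):
    # Illustrative sketch only; all parameter names are hypothetical.
    rng = random.Random(seed)
    best_closed = initial              # best (lowest-h) state seen so far
    for _ in range(restart_bound):     # bounded number of restarts
        state = best_closed            # each probe restarts from bestClosed
        for _ in range(depth_bound):   # explore worse states for a while
            if is_goal(state):
                return state
            children = successors(state)
            if not children:
                break                  # dead end: abandon this probe
            state = rng.choice(children)
            if h(state) < h(best_closed):
                best_closed = state    # remember any improvement
    return best_closed                 # bounds exhausted: report best state
```

Randomly descending for a bounded number of steps, then snapping back to the best state seen, is what lets this scheme cross local minima that a purely greedy descent cannot escape.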
When comparing with greedy search algorithms, Wilt [8] has shown that beam search outperforms its competitors particularly when working with massive search spaces. On the other hand, all search algorithms also have drawbacks. In the case of LBS, all its beams often tend to slide into the same path, which can turn LBS into a slower version of greedy search.

2.1 Successor Selection

In conjunction with a suitable search algorithm, the planner also requires an effective successor selection procedure. The successors will be chosen based on their Relaxed Planning Graph heuristic values (h-values), a measure of how far from the goal we are. This work explores five distinct successor selection algorithms that operate with the h-values.

The simplest one, called the BestSuccessorSelector, always selects the successor with the lowest h-value. Unfortunately, this approach can lead to being stuck in a local minimum that the planner cannot escape. It is thus necessary to introduce some non-determinism into the search process and occasionally allow less advantageous states to be explored. One way to achieve this stochastic behaviour is by adopting certain techniques from genetic algorithms. A commonly used method is roulette-wheel selection. In a roulette-wheel style selector, an individual is selected randomly. A state s is chosen from the set of all successors {s_1, ..., s_n} with the probability

p(s) = F(s) / Σ_{i=1..n} F(s_i),    (1)

where the fitness F(s) of the state s is 1/h(s), h(s) being the h-value of s [9].

Evolutionary algorithms tell us that roulette-wheel selection may not work well if the states' fitnesses are large and the differences between them are negligible. Hence we aim to amplify these differences by transforming the fitness function's arguments. Rank-based selection confronts this issue by sorting the states according to their h-values and assigning them ranks from the set {1, ..., n}; their fitness is then calculated based on their ranks. Yet another option is linear scaling, where the h-values of the successors are scaled down closer to 0 to give the best states a bigger advantage of being selected from the roulette wheel [1]. The final selection method investigated was tournament selection, which abandons the idea of fitness altogether. Instead, we randomly select a subset of candidate successors and let them compete in pairs in a knockout-style tournament [1]. In each tournament match the better individual wins with a probability p_t, 0 ≤ p_t ≤ 1.

2.2 Macro Actions

Another notable improvement of AI planners was described by Botea [2], who used Solution Abstraction Enhanced Planner (SOL-EP) macro actions in the MacroFF planner. A macro action is essentially a sequence of two or more consecutively applied actions that can help the planner reach the goal more quickly. A macro action allows us to skip intermediate states in the search space and helps to overcome plateaus, since we can explore deeper states.

SOL-EP macros work as follows. First, the planner solves an easy problem in a given domain. From the found solution, candidate macros are extracted and pruned to keep only the logically helpful ones. Finally, the planner solves the problem once again with the help of a macro; this tests how much the macro improved the planning process. The most helpful macro(s) are then used to solve more complicated instances in that domain. Although MacroFF successfully enhanced the FF planner with SOL-EP macros, there is still more potential in combining them with a well-guided local search, which this work explores.

3 Design of the Planner

As mentioned above, the JavaFF planner is the basis for this work. The new planner replaces JavaFF's Enforced Hill Climbing (EHC) with LBS. We have a set open, which contains the states we are currently searching through, and a closed set containing states visited in the past.
The main LBS algorithm is implemented as follows:

1. Begin with the initial state and check if it is the goal state.
2. Add the successors of the initial state to the open set.
3. Iteratively search through the state space until we find a solution or reach a dead end:
   (a) Generate the successors of the states in the open set.
   (b) If a goal state is found, return it.
   (c) Discard any previously visited states.
   (d) If the open set is empty (meaning we have reached a dead end), terminate the search as unsuccessful.
   (e) If the open set contains more states than k, the number of beams, use a successor selector to pick a subset of size k from them. This subset forms the new generation of the open set.
   (f) Add any states not chosen to the closed set.

In order to make the main LBS algorithm more robust, we decided to enhance it with restarts if the planner reaches a dead end. As in Identidem, we have a variable bestClosed that stores the state with the best h-value seen so far. If there are no more successor states, we simply start again from bestClosed and search with more breadth. Note that LBS already searches a wider portion of the state space; hence, in order to make the restarts truly effective, we perform a breadth-first search of depth 2 from bestClosed. This approach gives the planner a decent opportunity to grasp a more promising path in the search space. After that, the search continues once again with k beams only. Similarly to Identidem, there is also a restart bound in order to stop fruitless restarts. Lastly, we only continue restarting if the last restart helped the planner reach a more favourable state.

3.1 Successor Selection

Various types of selectors can be supported; below we describe those actually implemented in the planner. The main reason behind choosing LBS as the backbone of our new planner lies in its unique property of selecting multiple successors.
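The enumerated steps above can be sketched as follows. This is a simplified illustration rather than the planner's Java implementation: `successors` and `h` are hypothetical stand-ins, states are assumed hashable, and picking the k lowest-h states stands in for the configurable successor selector of step (e).

```python
def local_beam_search(initial, successors, h, is_goal, k=3):
    # Sketch of steps 1-3 of the LBS loop; states are assumed hashable.
    open_set = [initial]                    # steps 1-2: start from the root
    closed = set()
    while open_set:                         # (d) empty open set = dead end
        for state in open_set:              # (b) return any goal state
            if is_goal(state):
                return state
        candidates = set()
        for state in open_set:              # (a) expand every beam
            closed.add(state)
            candidates.update(c for c in successors(state)
                              if c not in closed)       # (c) skip visited
        chosen = sorted(candidates, key=h)[:k]          # (e) keep k best
        closed.update(candidates.difference(chosen))    # (f) retire the rest
        open_set = chosen
    return None                             # unsuccessful search
```

In the full planner, a `None` result would trigger the restart machinery described above (a breadth-first probe of depth 2 from bestClosed) rather than outright failure.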
LBS's ability to select multiple successors allows us to design a new method incorporating two successor selectors (denoted primary and secondary) into a single planner, thus combining their benefits. The primary selector chooses the first part of the successors; the secondary one chooses the remaining ones. In addition, the user can adjust the successorSelectorBoundary (SSB), a real value in the range [0, 1] that dictates the proportion of successors chosen by each selector. For beam b_i, 1 ≤ i ≤ k, the primary selector is used when

(i − 1)/k < successorSelectorBoundary.    (2)

Otherwise, b_i is obtained from the secondary selector. This design choice yields a flexible application that allows us to maintain some form of elitism in the successor selection process. The user can, for instance, ensure that the current best state always proceeds into the next search iteration by opting for the BestSuccessorSelector, which picks the state with the lowest h-value. In case of a tie, one of the tied states is chosen at random.

In addition to the traditional RouletteSuccessorSelector discussed in Section 2.1, our planner also provides a RankBasedSuccessorSelector, which calculates the fitness F(s) of a state s according to the following formula:

F(s) = 1/2^{R(s)},    (3)

where R(s) is the rank of the state s. This formula yields a geometric distribution for the selection probabilities, whose sum tends to 1. Moreover, it provides an elegant implementation of the rank-based selection algorithm (the remaining portion of the roulette wheel is simply given to the best state as an extra advantage).

The ScalingSuccessorSelector, on the other hand, operates directly with the h-values. It finds the worst h-value among the current set of successors, h_max, and computes the fitness F(s) of each state s as follows:

F(s) = h_max − h(s) + 1.    (4)

Lastly, the implemented TournamentSuccessorSelector uses tournaments of size 2.
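The boundary rule of Eq. (2) and the rank-based fitness of Eq. (3) can be sketched as follows. This is a minimal illustration under assumed interfaces: `primary` and `secondary` are any functions that pick one state from a pool, which is not the planner's actual Java API.

```python
def split_selection(pool, k, primary, secondary, ssb=0.3):
    # Beam b_i comes from the primary selector while (i-1)/k < ssb,
    # otherwise from the secondary selector (Eq. 2).
    remaining = list(pool)
    chosen = []
    for i in range(1, k + 1):
        selector = primary if (i - 1) / k < ssb else secondary
        state = selector(remaining)
        chosen.append(state)
        remaining.remove(state)    # each successor is selected at most once
    return chosen

def rank_based_fitness(ranked_states):
    # Eq. (3): F(s) = 1 / 2**R(s) for ranks R(s) = 1, 2, ...;
    # the fitnesses sum to 1 - 1/2**n, which tends to 1.
    return {s: 1 / 2 ** r for r, s in enumerate(ranked_states, start=1)}
```

For example, `split_selection(states, 3, min, max)` with ssb = 0.3 draws the first beam from the primary policy and the remaining two from the secondary one; here `min` and `max` are mere stand-ins for real selectors such as the BestSuccessorSelector or the TournamentSuccessorSelector.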
The TournamentSuccessorSelector picks 2 states randomly and returns the better of them with probability 0.8 (this value worked best in our preliminary experiments).

3.2 Macros

While the LBS algorithm combined with restarts yields a reasonably robust application capable of dealing with complex domain topologies, another key goal was to make the planner fast. To achieve this, we decided to enhance it by means of MacroFF's SOL-EP macros. The newly proposed planner thus comes with a MacroGenerator, which is responsible for macro extraction and pruning. From a solution to a training problem, the MacroGenerator extracts a list of all pairs of consecutive actions (macros consisting of more than 2 simple actions were not considered). We then prune this list to discard macros that:

– consist of a single repeated action,
– are duplicated, i.e., we already have a pair of these actions (albeit possibly using different objects),
– do not satisfy the chaining rule: their actions do not have any parameters in common and, therefore, are unrelated to each other.

The main difference between MacroFF and our macro implementation is how candidate macro actions are evaluated. Instead of considering the number of nodes expanded (which may not always reveal the most efficient macros), we adopt an alternative proposed by Newton [6], who compares the time taken to find a solution with and without the macro; this provides a more accurate measure of how useful the macro is.

Finally, as a crucial improvement over its predecessor JavaFF, our planner allows the user to adjust its behaviour to the given domain topology. Our preliminary research has shown that every successor selector, macro generation setting or beam count has its benefits and drawbacks. Therefore, the planner was designed to use default values (configurations that performed best during testing) and to parse optional parameters if the user decides to plan otherwise.
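The extraction and pruning rules above can be sketched as follows. This is a simplified illustration: the (name, parameter-tuple) action format is an assumption about how a plan might be represented, and "single repeated action" is interpreted here as two identical consecutive steps.

```python
def extract_macros(plan):
    # `plan` is a list of actions, each a (name, parameter_tuple) pair.
    macros, seen = [], set()
    for first, second in zip(plan, plan[1:]):   # consecutive action pairs
        if first == second:
            continue                    # a single repeated action
        if (first[0], second[0]) in seen:
            continue                    # duplicate pair (objects ignored)
        if not set(first[1]) & set(second[1]):
            continue                    # chaining rule: share a parameter
        seen.add((first[0], second[0]))
        macros.append((first, second))
    return macros
```

The surviving candidates would then be ranked by Newton's criterion, i.e. by timing the planner on the training problem with and without each macro.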
Our preliminary research indicates that there does not seem to be an optimal value of the beam count k for all domains. Some domains are solved fastest by means of greedy search (k = 1), while others require a more thorough exploration of the search space. Of course, a higher value of k does not stop the planner from solving trivial problems, but it could drastically slow down the entire process, and the planner would no longer be competitive against simple algorithms such as best-first search. Hence, we opted for an iterative approach of increasing k, starting at k = 1. Each time a search fails, we start again with one more beam. The user may choose a maximum beam count; if a solution is still not found after reaching it, the planner runs best-first search for completeness.

4 Experimental Results and Optimisation

After implementation, the planner was tested on a set of benchmark problems from the International Planning Competition (IPC) [5] using a current lab machine. The planner was run on each problem instance of a given domain ten times with random seeds 1, . . . , 10 to make the experiments reproducible. The experiments began with searching for optimal default values for the planner's parameters (tested on the driverlog and rovers domains). Of course, we cannot feasibly test all possible planner configurations; hence we opted for a sequential approach of optimizing one planner component after another. The optimal value found for a given parameter is then used in all remaining experiments. After optimisation, the new planner was tested against the original JavaFF, which uses Enforced Hill Climbing (EHC), on a variety of domains to assess how much the planner's performance improved (additional graphs can be found at sites.google.com/site/otomrazgee/home/lsaip).

The pipeline for planner optimization proceeds as follows:

1. First, the performance of each of the successor selectors will be evaluated.
2. Then the most promising selectors will be combined to find a pair that works best together. Also, the successor selector boundary parameter will be optimised.
3. Next, macro actions will be added to the planner.
4. After that, an optimal number of beams will be found.
5. Lastly, restart options will be added, gradually increasing their amount.
6. Finally, the application will be tested against the original JavaFF planner to evaluate the improvement achieved.

When searching for an ideal combination of successor selectors, the planner was run using each selector individually on the base implementation of local beam search, and the planning times were recorded. The maximum number of beams was set to three; no macros or restarts were added. Perhaps the most interesting results were observed on the rovers domain, depicted in Figure 1.

Fig. 1. Comparison of the successor selectors on the rovers domain.

The TournamentSuccessorSelector clearly outperformed all its competitors on this domain (note the graph's logarithmic scale). The most likely reason behind its success is its rapid decision process. While the other selectors undertake the time-consuming process of calculating the h-values for every state, the tournament selector only needs to compute the h-values for k pairs of states from which it chooses. The remaining four selectors do not always come in the same order for various problems; however, the rank-based and best successor selectors seem to perform a little better in general. Very similar results were also seen on the driverlog domain. Consequently, we combined the rank-based and best successor selectors with the tournament selector. Furthermore, two values, 0.3 and 0.6, were tested for the successorSelectorBoundary. Indeed, a combination of selectors yielded a faster planner than any single selector used by itself.
For example, the rovers instance 20 took at least 150 s with a single selector; using a combination of selectors, this time could be cut down to 128 s. Out of the tested selector combinations, using the Best and Tournament selectors with a successorSelectorBoundary of 0.3 turned out to be triumphant. These observations lead to the conclusion that elitism (provided by the use of the BestSuccessorSelector) is extremely important for obtaining a fast planner.

After optimising the core components of the local beam search algorithm, the focus turned to macros. The planner was tested using 0–4 macros to find a balance between saving time by taking a shorter path to the goal and the overheads associated with an increased branching factor. We must not forget that macros add additional successor states to the open set; if we decide to calculate the h-values of all of them, this will slow down the planner considerably.

In general, our research shows that macro actions seem to cut down the planning times for most problems. They clearly succeed in skipping over intermediate states and thus decrease the number of steps needed to reach the goal. Moreover, the macros also helped to overcome local minima and plateaus in the search space by leading the planner in the right direction. Especially in the driverlog domain, which is known to have many local minima, macros proved to be very beneficial to the search process. We did, however, also encounter cases where the planner was slowed down, e.g., instance 13 of the driverlog domain. It may be the case that the macro chosen here was helpful in the training problem instances, but in instance 13 itself it increased the branching factor so much that it became counterproductive and slowed the planner down. For this reason, we decided to use only 1 macro by default.

The next step was to optimize k, the maximum number of beams used by the planner.
The planner was once again asked to solve the first 20 problems of the driverlog and rovers domains, gradually increasing k from 1 to 5 (the planner now used a combination of the best and tournament selection methods, and one macro was generated). Just as our preliminary research hinted, it was practically impossible to find an ideal, domain-independent value for k. Nevertheless, a compromise value of 3 seems to generalise well.

Fig. 2. Comparison of the performance for varying the maximum beam count on the rovers domain.

Figure 2 illustrates the situation in the rovers domain. Clearly, the planner was extremely sensitive to changes in this value. With a single beam, the planner experienced severe difficulties, particularly in instance 11. For most of the random seeds, the planner had to revert to best-first search, which was a lot more time consuming. To be sure of solving this problem with LBS, at least three beams are required. The most likely reason for this behaviour is the well-known dead ends in the rovers domain: if the single beam enters such a dead end, there is no alternative for the search to recover. On the other hand, in the driverlog domain there were a couple of problems that could not be solved using LBS at all, and the fastest solution was to revert to best-first search as soon as possible. In this case, adding more beams was actually counterproductive and increased the time needed to find a plan.

Finally, a default value of five restarts was chosen. Interestingly, tests regarding restarts found little correlation between the maximum number of restarts and the time taken to find a solution. Yet, when investigating the planner's behaviour more closely, we found that the planner generally needs fewer beams when aided by restarts. Therefore, we can say that restarts do contribute to making the planner more resilient to changing domain conditions.
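The iterative scheme for increasing k described in Section 3.2, which underlies all of the beam-count experiments above, can be sketched as follows. This is an illustrative outline only: `lbs` stands for any LBS implementation that returns None on failure, and `best_first_search` for the completeness fallback; both names are hypothetical.

```python
def plan_with_increasing_beams(problem, lbs, best_first_search, max_k=3):
    # Retry LBS with one more beam after each failed search; fall back
    # to best-first search for completeness once max_k is exhausted.
    for k in range(1, max_k + 1):
        plan = lbs(problem, k)
        if plan is not None:
            return plan
    return best_first_search(problem)
```

Starting at k = 1 keeps trivial problems as cheap as greedy search, while harder instances earn the wider beams they need.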
4.1 Comparison with the Original JavaFF Planner

After the optimisation, the planner was tested against its predecessor JavaFF in practice. Figure 3 compares the performance of the new planner based on LBS and traditional JavaFF using EHC on the rovers domain. With the exception of the first four problems, LBS (red bars) evidently performs a lot better than EHC. The hardest problem, instance 20, took 148 s to complete with LBS, compared to more than double that, namely 330 s, with EHC.

Fig. 3. Comparison of Enforced Hill Climbing (EHC) and Local Beam Search (LBS) on the rovers domain.

A different picture can be seen in the driverlog domain, shown in Figure 4. Here, too, the LBS algorithm can be considered more successful. Although EHC was slightly faster for some problems, the key improvement comes in problem instances 14 and 15, which LBS could solve but EHC could not. These results confirm that the parameter optimisations were fruitful in generating a robust planner.

Fig. 4. Comparison of EHC and LBS on the driverlog domain. The missing bars indicate a failure to find a solution.

In order to examine how universal the planner is, we also investigated the planner's behaviour on the pipes non-tankage and freecell IPC domains. In the pipes domain, once again, certain problems are solved faster with LBS, others with EHC. Crucially, however, LBS proved to be much more robust than EHC by solving four more problems out of 23. Lastly, the freecell domain is known to have many unrecognizable dead ends and local minima that are hard to escape. Indeed, both LBS and EHC could only solve problems 1–10. Moreover, unlike EHC, LBS was unable to solve problem 10 using the default planner configuration. Furthermore, LBS in general took more time than EHC to find a solution in problems 1–9. We hence began to investigate what caused such poor performance. As discussed above, no planner configuration is ideal for all domains.
Depending on the topology of a given domain, we have encountered cases where the increased branching factor associated with macros can prove detrimental to our attempt to find a solution. Hence, our first attempt was to plan without macros. Not only was the planner now successful in finding a solution without the macros, it also managed to improve on the time of EHC: EHC took 37 minutes on average to complete the problem, while LBS reduced this time to 32 minutes.

The experiments in the freecell domain underline the need for the customizable planner that this work presents. This approach of having a reservoir of alternative search techniques could also easily be applied to many other contemporary planners. For instance, a planner successful in overcoming local minima might not necessarily be good at dealing with dead ends. While such a planner would be apt to quickly solve problems in the driverlog domain, it could struggle in the freecell domain. Hence, the planner customization that this work proposes could serve as a solution and make planners more resilient to changing environments.

5 Conclusions and Future Work

To conclude, this project has produced a novel PDDL planner based on local beam search. This work has highlighted the unique properties of LBS that make it suitable for many different planning domains. In contrast to greedy search, local beam search keeps a reservoir of alternative candidate paths to a solution and thus constitutes a much more robust planner. The new planner has also enabled us to combine the benefits of different successor selection procedures. Additionally, the planner employs restarts and macro actions, which help it combat local minima and dead ends in the search space. The conducted experiments provided useful default parameters for our planner; nevertheless, they have also emphasized the importance of creating a customizable application.
More importantly, the methods used in this work for combining various search techniques could serve as inspiration for many other planners, too. Not only can this approach make planners suitable for a wider range of domains, it can also make them more robust. For instance, we have observed that adding restarts decreases the number of beams needed to find a solution.

Even with all the wonderful breakthroughs in recent years, the area of AI planning still remains widely underdeveloped, with lots of potential to offer. A promising direction to explore would be to combine LBS with another search algorithm. If LBS fails to find a solution, its work could be exploited by another search technique that would finish the search. Once again, we could combine the speed of one search method with the robustness of another.

Furthermore, in AI planning we are interested not only in finding a valid solution but rather in finding the best solution. We have demonstrated the qualities of LBS, but we must remember that LBS is not complete on its own; therefore, it might not always provide us with the optimal solution. Zhou [10] outlines how completeness can be achieved by transforming LBS into beam-stack search. Beam-stack search makes use of systematic backtracking, with which we can continue to refine the existing solution until the optimal one is found.

References

1. Baeck, T., Fogel, D., Michalewicz, Z.: Evolutionary Computation 1: Basic Algorithms and Operators, vol. 1. CRC Press (2018)
2. Botea, A.: Improving AI Planning and Search with Automatic Abstraction. PhD thesis, University of Alberta (2006)
3. Coles, A., Fox, M., Smith, A.: A new local-search algorithm for forward-chaining planning. In: Proc. of ICAPS'07. pp. 89–96. AAAI (2007)
4. Hoffmann, J.: Where 'ignoring delete lists' works: Local search topology in planning benchmarks. Journal of Artificial Intelligence Research 24, 685–758 (2005)
5. Long, D., Fox, M.: The 3rd international planning competition: Results and analysis. Journal of Artificial Intelligence Research 20, 1–59 (2003)
6. Newton, M.A.H., Levine, J., Fox, M., Long, D.: Learning macro-actions for arbitrary planners and domains. In: Proc. of ICAPS'07. pp. 256–263. AAAI (2007)
7. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education Limited (2016)
8. Wilt, C.M., Thayer, J.T., Ruml, W.: A comparison of greedy search algorithms. In: Third Annual Symposium on Combinatorial Search. pp. 129–136 (2010)
9. Wong, C., Guo, Z.X., Leung, S.: Optimizing Decision Making in the Apparel Supply Chain Using Artificial Intelligence (AI): From Production to Retail. Elsevier (2013)
10. Zhou, R., Hansen, E.A.: Beam-stack search: Integrating backtracking with beam search. In: Proc. of ICAPS'05. pp. 90–98. AAAI (2005)