Experimental evaluation of pheromone models in ACOPlan

M. Baioletti, A. Milani, V. Poggioni, and F. Rossi
Department of Mathematics and Computer Science, University of Perugia
Via Vanvitelli 1, 06123 Perugia, ITALY
{baioletti,milani,poggioni,rossi}@dmi.unipg.it

Abstract

ACOPlan is a planner based on the ant colony optimization framework. When the ACO framework is used to solve planning optimization problems, one of the main issues to address is the choice of pheromone models that are both informative and easy to compute. In this paper we present and discuss an experimental evaluation of the pheromone models implemented in ACOPlan. The experiments have been run on the problems used in the last planning competition. The results suggest that fuzzy pheromone models represent a suitable tradeoff between performance and cost.

1 Introduction

A planning problem is usually defined as a search problem: finding a solution plan which satisfies the problem constraints. An important refinement of this definition is to consider planning as an optimization problem with respect to a given metric. Propositional plans are usually optimized with respect to the plan length or to the number of time steps (if parallel actions are allowed). For instance, HSP [6], using an admissible heuristic function, is able to provide the former kind of optimization, while many algorithms, including GraphPlan [5], BlackBox [10] and others, can optimize in the latter sense by using an iterative deepening search method.

However, the planning frameworks in which optimization seems most natural are planning with action costs and planning with numerical fluents. In the first, each action is associated with a real number representing its execution cost, while in the second actions can handle and modify numerical variables, and an objective function can easily be expressed in terms of some target variables (for instance fuel, as in the Logistics domain).

The main idea of this paper is to use the well known Ant Colony Optimization meta-heuristic (henceforth ACO) to solve planning problems with the aim of optimizing the quality of the solution plans. ACO is a meta-heuristic inspired by the behavior of natural ant colonies which has been successfully applied to many combinatorial optimization problems. Clearly, ACO being a stochastic algorithm, there is no guarantee that optimal solutions are ever found, but we hope that, as shown in many other applications of ACO, this method can produce excellent or optimal solutions.

To test whether ACO can be effectively used on planning optimization problems we have implemented, as a first step, an ACO-based propositional planner, ACOPlan, which tries to optimize plans in terms of plan length. This system has been presented in [1, 2, 3, 4]. After the good results shown in these works, we decided to extend ACOPlan to more complex planning frameworks, like planning with action costs and numerical planning. In this work we present the extension to planning with action costs and an experimental evaluation of this system using the most recent planning benchmarks for planning with action costs [8].

The paper is structured as follows.
A short introduction to Ant Colony Optimization is provided, then the ACOPlan algorithm is presented in a recent version applied to the framework of planning with action costs. Four pheromone models are also introduced and motivated. The experimental evaluation of the proposed models on the above cited benchmark is then presented and discussed. The paper concludes with a discussion of lessons learned, future work and extensions.

1.1 Ant Colony Optimization

Ant Colony Optimization (ACO) is a meta-heuristic used to solve combinatorial optimization problems, introduced in the early 90s by Dorigo et al. [7], whose inspiration comes from the foraging behavior of natural ant colonies. ACO uses a colony of artificial ants, which move in the search space and build solutions by composing discrete components. The construction of a solution is incremental: each ant randomly chooses a component to add to the partial solution built so far, according to the problem constraints. The random choice is biased by the pheromone value τ related to each component and by a heuristic function η. Both terms evaluate the appropriateness of each component. The probability that an ant will choose the component c is

    p(c) = ( [τ(c)]^α [η(c)]^β ) / ( Σ_x [τ(x)]^α [η(x)]^β )        (1)

where the sum over x ranges on all the components which can be chosen, and α and β are tuning parameters for the pheromone and heuristic contributions.

The pheromone values represent a kind of memory shared by the ant colony and are subject to update and evaporation. In particular, the pheromone can be updated at each construction step or for complete solutions (either all of them or only the best ones) in the current iteration, possibly considering also the best solution of the previous iterations. The update phase is usually performed by adding to the pheromone values associated to the components of a solution s an increment ∆(s) which depends on a solution quality function F(s) (for a minimization problem with objective function f, F(s) is a decreasing function of f). Pheromone evaporation has the purpose of avoiding premature convergence towards non-optimal solutions and is simulated by multiplying the pheromone values by a factor 1 − ρ, where 0 < ρ < 1 is the evaporation rate. The simulation of the ant colony is iterated until a satisfactory solution is found, an unsuccessful condition is met, or a given number of iterations is reached.

ACO has been used to solve several combinatorial optimization problems, reaching in many cases state-of-the-art performance, as shown in [7]. More recently ACO has also been applied to automated planning [1, 2, 3, 4].
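To make the construction rule and the update phase concrete, the following Python sketch implements the randomized weighted choice of Eq. (1) and a simple evaporation/deposit step. It is only an illustrative sketch: the names (tau, eta, components_of, the default parameter values) are our assumptions and do not correspond to the actual ACOPlan implementation.

import random

def choose_component(candidates, tau, eta, alpha=2.0, beta=5.0):
    # Randomized weighted extraction of one component, as in Eq. (1).
    # candidates: components that can be chosen at this step
    # tau, eta:   dicts mapping component -> pheromone / heuristic value
    weights = [(tau[c] ** alpha) * (eta[c] ** beta) for c in candidates]
    total = sum(weights)
    if total == 0.0:
        return random.choice(candidates)       # degenerate case: uniform choice
    r, acc = random.uniform(0.0, total), 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]                      # numerical safety net

def evaporate_and_deposit(tau, solutions, components_of, quality, rho=0.15):
    # Evaporation: multiply every pheromone value by (1 - rho).
    for c in tau:
        tau[c] *= (1.0 - rho)
    # Deposit: reward the components of each selected solution by Delta(s) = F(s).
    for s in solutions:
        delta = quality(s)                     # F(s): larger for better solutions
        for c in components_of(s):
            tau[c] = tau.get(c, 0.0) + delta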
2 The ACOPlan planner

Automated planning is a well known AI problem which has been deeply studied in recent years, and many different kinds of problems defined over several planning models have been proposed. Our aim is to solve planning problems by optimizing the cost of the solution plans, where the cost of a plan is defined as the sum of the execution costs of its actions.

In the standard reference "classical" model for planning [12], world states are described in terms of a finite set F of propositional variables under the closed world assumption (CWA), i.e. a state s is represented by the subset of F containing all the variables which are true in s. A planning problem is defined by a triple (I, G, A), in which I is the initial state, G denotes the goal, and A is a finite set of actions. An action a ∈ A is described by a triple (pre(a), add(a), del(a)), where pre(a), add(a), del(a) ⊆ F. pre(a) is the set of preconditions: a is executable in a state s if and only if pre(a) ⊆ s. add(a) and del(a) are, respectively, the sets of positive and negative effects: if an action a is executable in a state s, the state resulting after its execution is

    Res(s, a) = (s ∪ add(a)) \ del(a).        (2)

Otherwise Res(s, a) is undefined. A linear sequence of actions (a1, a2, ..., an) is a plan for a problem (I, G, A) if a1 is executable in I, a2 is executable in s1 = Res(I, a1), a3 is executable in s2 = Res(s1, a2), and so on. A plan π = (a1, a2, ..., an) is a (classical) solution for (I, G, A) if G ⊆ sn. Several extensions of the classical model have been proposed, by adding notions such as time, cost, resource consumption, etc., and by increasing the complexity of the domain and of the constraints.

In a planning model with action execution costs, the problem consists in finding an optimal plan which achieves the goals in G. A real constant, the action execution cost c(a), is associated to each action a ∈ A, and the plan execution cost is defined accordingly by

    c(π) = Σ_{i=1..n} c(ai).

A plan π = (a1, a2, ..., an) is an optimal solution plan if there is no other solution plan π′ such that c(π) > c(π′).

The idea is to use an ACO approach to solve this kind of optimization problem, with "ants" which perform a forward search in the state space, building a plan starting from the initial state s0 and executing actions step by step. Each ant extracts the next action to execute in the current state from the set of candidate executable actions. The choice is made by a randomized weighted extraction which takes into account the pheromone value τ associated to each solution component and the heuristic value η(a) of each candidate action a, with a formula similar to (1). Once an action a has been selected, the current state is updated by means of the effects of a. By always choosing executable actions it is possible to guarantee that the entire plan is executable in the initial state. The construction phase stops when (1) a solution is found (the goals are true in the current state), (2) a dead end is encountered (no action is executable), or (3) an upper bound Lmax on the length of the action sequence is reached. Better solution plans will strengthen the pheromone they deposit.
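The following Python sketch shows how the forward construction of a single ant could be written under the set-based state representation introduced above (states and effect sets as frozensets). It only illustrates the construction scheme described in the text; the Action attributes, choose_action and l_max are our names, not the actual ACOPlan code.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset       # preconditions
    add: frozenset       # positive effects
    delete: frozenset    # negative effects ("del" is a Python keyword)
    cost: float = 1.0

def executable(state, action):
    # An action is executable in a state iff all its preconditions hold.
    return action.pre <= state

def res(state, action):
    # Res(s, a) = (s ∪ add(a)) \ del(a), as in Eq. (2).
    return (state | action.add) - action.delete

def construct_plan(init_state, goals, actions, choose_action, l_max):
    # One ant builds a plan by forward search from the initial state, stopping
    # when the goals hold, at a dead end, or when the length bound is reached.
    state, plan = frozenset(init_state), []
    for _ in range(l_max):
        if goals <= state:                     # solution found
            break
        candidates = [a for a in actions if executable(state, a)]
        if not candidates:                     # dead end
            break
        a = choose_action(state, candidates)   # pheromone- and heuristic-biased choice
        plan.append(a)
        state = res(state, a)
    return plan, state

def plan_cost(plan):
    # c(pi): sum of the execution costs of the actions in the plan.
    return sum(a.cost for a in plan)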
2.1 The algorithm

The general algorithm of ACOPlan [1, 2, 3, 4] was originally proposed for optimizing the plan length. The optimization process continues for a given number N of iterations; in each of them, na ants build plans with a maximum number of actions limited to Lmax. The pseudo code of the resulting algorithm is shown in Algorithm 1, where c is the initial value for the pheromone.

Algorithm 1  The ACOPlan algorithm
 1: πbest ← ∅
 2: InitPheromone(c)
 3: for g ← 1 to N do
 4:   for m ← 1 to na do
 5:     πm ← ∅
 6:     state ← initial state of the problem
 7:     for i ← 1 to Lmax and state is not final do
 8:       Ai ← feasible actions in state
 9:       ak ← ChooseAction(Ai)
10:       extend πm with ak
11:       update state
12:     end for
13:   end for
14:   Update(πbest)
15:   Sort(π1, ..., πna)
16:   UpdatePheromone(πbest, π1, ..., πσ−1, ρ)
17: end for

2.2 The Pheromone Models

The performance of the general algorithm described above strongly depends on the choice of the next action, i.e. on the pheromone model which is used. A pheromone model consists of a set of solution components loaded with pheromone, where a component is a structure characterizing the context in which an ant has chosen a specific action. The more successful a component is, the more pheromone it will receive. The pheromone amounts on the alternative components represent the distribution of the τ values in (1). A good component should be informative enough to distinguish the context of the most successful choices from the worst ones, in other words to represent the local strategies of successful ants; moreover, components should be easy to represent and to compute. The following four pheromone models have been considered and experimented with the ACOPlan algorithm.

State–State (SS) In this case a component is defined by a pair (si, sj) where si is the current state and sj is a state reachable by executing one of the actions executable in si. Here only state information is shared among ants: state transitions which appear in more successful solutions will have their pheromone levels increased. In the other three models, instead, information about actions is also included in the components.

State–Action (SA) In the State–Action model, τ depends on the current state s and on the action a we would execute. This is by far one of the most expensive pheromone models, because the number of possible components is exponential with respect to the problem size. On the other hand, the pheromone values have a straightforward interpretation as a policy (although in terms of a preference policy): τ(a, s) represents how desirable it is (or better, how useful it has been) to execute a in the state s.

Fuzzy level–Action (FLA) The basic idea is to associate the action which is evaluated with the plan step, i.e. the level, at which it is executed. Given the limited number of levels, such a model has a more tractable representation than the state–action one. A drawback is that strictly associating the preference policy for an action to a time step t makes it unrelated to the values for close time steps; on the other hand, it is likely that an action desirable at time 2, i.e. at an early planning stage, will also be desirable at times 1 and 3, i.e. at stages close to that one. To solve this problem we introduce the Fuzzy level–Action model, the fuzzified version of the level–action combination: the pheromone used for action a to be executed at time t can be seen as the weighted average of the pheromone values τ(a, t′), for t′ = t − W, ..., t + W, computed with the level–action model. The weights and the time window W can be tuned experimentally (an illustrative sketch of this weighted average is given below, after the model descriptions).

Action–Action (AA) In the Action–Action model a notion of local history is introduced: the pheromone depends both on the action under evaluation and on the last executed action. This is the simplest way in which the action choice can be directly influenced by the previous choices. Considering only the last choice allows a manageable representation, and it results in a sort of local first-order Markov property.
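As an illustration of the Fuzzy level–Action model, the following sketch (our own, not the ACOPlan implementation) computes the pheromone used for an action a at step t as a weighted average of plain level–action values over the window t − W, ..., t + W. The specific weighting, decaying with the distance from t, is an assumption, since the paper only states that the weights and W are tuned experimentally.

def fla_pheromone(tau_la, action, t, window=2):
    # Fuzzy level-Action pheromone for (action, t): weighted average of the
    # plain level-action values tau_la[(action, level)] over levels t-W .. t+W.
    # The weighting 1 / (1 + |level - t|) is an assumption.
    num, den = 0.0, 0.0
    for level in range(t - window, t + window + 1):
        if level < 0:
            continue
        weight = 1.0 / (1.0 + abs(level - t))
        num += weight * tau_la.get((action, level), 0.0)
        den += weight
    return num / den if den > 0.0 else 0.0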
2.3 The Action Cost heuristic

The Action Cost heuristic, i.e. the estimated cost to reach the goals starting from a state s, is computed on the delete relaxation P+ of the problem by extracting a relaxed plan π+ in FF style [9] from the relaxed planning graph. The planning graph [5] is a leveled graph with two kinds of nodes: facts and actions. The initial fact level F0 is the set of facts that are true in the initial state s0, while the levels Ak and Fk+1 at a generic step k are computed, respectively, by considering the actions executable using the facts at level Fk and the positive effects of these actions. The heuristic value is defined as the cost of π+:

    c(π+) = Σ_{a ∈ π+} c(a)        (3)

where a is an action belonging to π+. The extraction process of π+ is similar to the FF one, except that the difficulty of an action, taken into account when choosing the next actions to add to π+ at level k (see [9] for details), is the action cost at level k − 1, which is recursively estimated as follows.

Action costs.

    ck(a) = c(a) + Σ_{f ∈ pre(a)} ck(f)        (4)

Fluent costs.

    ck(f) = min{ c(a) + Σ_{b ∈ pre(a)} ck−1(b) | a ∈ Ak−1, f ∈ eff(a) }        (5)

where ck(f) is the cost we have to spend to reach f at level Fk, c(a) is the cost assigned to the action a, and Ak−1 is the set of feasible actions at the fact level Fk−1. Moreover, we have c0(f) = 0 for all f ∈ F0. Notice that the cost of a fluent (and of an action too) may differ depending on the level in the planning graph. This heuristic is related to the ones presented in [11].
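A direct transcription of this recursion is sketched below in Python, computing level-indexed action and fluent costs over an already built relaxed planning graph. The data structures, the names and the carry-over of a fact's cost from the previous level (the usual no-op of the planning graph) are our assumptions.

def level_costs(fact_levels, action_levels, cost):
    # Level-indexed action and fluent costs, following Eqs. (4)-(5).
    #   fact_levels:   list of fact sets F_0, F_1, ..., F_m
    #   action_levels: list of action sets A_0, ..., A_{m-1}; each action has .pre and .eff
    #   cost:          dict mapping each action to its cost c(a)
    INF = float("inf")
    c_f = {(0, f): 0.0 for f in fact_levels[0]}            # c_0(f) = 0 for every f in F_0
    c_a = {}
    for k in range(len(action_levels)):
        # Eq. (4): c_k(a) = c(a) + sum of c_k(f) over the preconditions of a, for a in A_k.
        for a in action_levels[k]:
            c_a[(k, a)] = cost[a] + sum(c_f.get((k, f), INF) for f in a.pre)
        # Eq. (5): c_{k+1}(f) is the cheapest achiever of f among the actions of A_k;
        # note that c(a) + sum of c_k(b) over pre(a) is exactly c_k(a) computed above.
        for f in fact_levels[k + 1]:
            achievers = [c_a[(k, a)] for a in action_levels[k] if f in a.eff]
            carried = c_f.get((k, f), INF)                  # assumed no-op: keep the old cost
            c_f[(k + 1, f)] = min(achievers + [carried])
    return c_a, c_f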
3 Experimental Evaluation

This section presents and discusses the results obtained by the different pheromone models for ACOPlan described above. The optimization problem is to minimize the overall plan execution cost. The State-State, State-Action, Action-Action and Fuzzy-Level-Action models have been tested on the benchmark domains of the planning competition IPC-6 [8]. A preliminary parameter tuning phase was carried out with systematic tests in order to establish the general ACOPlan settings common to all four pheromone models; the actual experiments were then conducted with the following settings: 10 ants, 5000 iterations, α = 2, β = 5, ρ = 0.15, c = 1, k = 0.5. Moreover, since ACOPlan is a non-deterministic planner, the results collected for each single problem instance are the average values obtained over 50 runs. The experiments were run on a Linux machine with a dual core processor and 2 GB of RAM.

In the following, the experimental results for ACOPlan over the IPC-6 domains Elevators, Openstacks, Parcprinter, Pegsol, Transport and Woodworking [8] are presented. Figure 1 shows, for each domain, the percentage of problems solved within 900 seconds. A first point which can be noted is that the best performing pheromone model is the Action-Action model, while the State-Action model is the worst performing one. This confirms the results obtained by the first version of ACOPlan (optimizing the plan length), presented in [13], where the Action-Action model was introduced. On the other hand a very encouraging new result has been obtained: ACOPlan with the Fuzzy-Level-Action pheromone model performs very close to the Action-Action model, considering also that Fuzzy-Level-Action is much more tractable.

Due to the high variability of the absolute values of the solution costs, in order to compare the different solutions for all the problems in each domain, we have scaled and normalized all the quality values in the interval [0, 1]. First of all we have taken the cost of the best solution found by any model; then, for each problem and for each program version, we have computed the ratio between the best value and the average value obtained. The results are shown in the group of figures Fig. 2 - Fig. 7.

                 AA     FLA    SA     SS
    Elevators    89%    83%    81%    83%
    Openstacks   100%   100%   100%   100%
    Parcprinter  80%    80%    72%    74%
    Pegsol       100%   100%   100%   100%
    Transport    82%    84%    82%    82%
    Woodworking  100%   100%   100%   100%

Figure 1: Percentages of problems solved in 900 seconds for each IPC-6 domain.

We can note that all the pheromone models perform equally well on the first and easiest problem instances in each domain; this is due to the fact that the problem instances in each benchmark domain are ordered by increasing size, which roughly corresponds to increasing optimization and problem solving difficulty.

A general comment on the overall plan quality results is that the prevalence of Action-Action ACOPlan and the bad performance of State-Action observed on the number of solved problems (see Figure 1) are generally confirmed for the quality of the solutions found. In other words, the problem solving ability and the optimization ability of all the pheromone models seem to follow a similar pattern. On the other hand, even the worst performing pheromone models are within 80 percent of the best quality found in 3 domains out of 6. The reason for the correspondence between quality and solving ability in ACOPlan is that a more successful strategy, in a randomized algorithm, tends to find many diverse solutions, while a successful deterministic strategy tends to find fast solutions which are very close to each other, i.e. with close quality values.

Figure 2: Best quality, Elevators domain.
Figure 3: Best quality, Openstacks domain.

Domains which are considered "difficult", such as Elevators, Transport and Woodworking, show a prevailing bad performance for the component models State-State and State-Action. Intuition would instead suggest that State-State and State-Action characterize best the context in which a choice is made by the ants, since a pair of states is the most detailed description of a local context, so the better performance of more simplified contexts such as the Action-Action pair and Fuzzy-Level-Action seems counterintuitive. A careful analysis of the pheromone distribution for SS and SA has been done on the worst performing problem instances, such as elevator9, parcprinter19 and transportation15. The analysis has revealed that the difficulty seems to lie exactly in a too high level of detail, i.e. in the proliferation of components. In other words, if each different state originates one (or more) different components, similar successful states, which could differ for just one fluent value, do not accumulate enough pheromone, which prevents them from being preferred in the ants' choices. We have observed that in the situations where SS and SA perform badly, the pheromone is spread over too many components, which are not likely to appear again in different runs of the ants, even if the plans found in a run differ only slightly from the previous ones. On the other hand a less detailed description of the context, as in the Action-Action model, allows the accumulation of pheromone on components which appear more easily in different successful and similar runs.

Consider, for example, two successful plans π1 = {A0, ..., Ai, Ai+1, ..., An} and π2 = {A0, ..., Ai+1, Ai, ..., An} which differ only in two consecutive swapped actions Ai and Ai+1.
In the Action-Action model the two plans will differ locally only in the three components (Ai−1, Ai), (Ai, Ai+1) and (Ai+1, Ai+2), while in the State-State (or State-Action) model they will potentially differ on all the components from the state produced by action Ai−1 onward, until the end of the plans. It is apparent that the earlier a different state is produced, the greater is the impact on the component differences between similar plans.

Figure 4: Best quality, Parcprinter domain.
Figure 5: Best quality, Pegsol domain.

Considering the global plan quality shown in Figure 8, Fuzzy-Level-Action again confirms its proximity to Action-Action: FLA always performs second, and it is the best performing model in the only domain (Pegsol) where AA is not. The global quality results for Openstacks also point out that it is in general an easy problem domain, not significant for quality performance comparisons.

Figure 6: Best quality, Transport domain.
Figure 7: Best quality, Woodworking domain.

The performance of Fuzzy-Level-Action (FLA) can be explained by reasons similar to the ones supporting the better behavior of Action-Action (AA) with respect to state based components. In FLA, two similar plans, in the sense of the previous π1 and π2, receive pheromone on similar components: the model is robust with respect to local differences such as the swapping of consecutive actions or the insertion of an action in a plan. Moreover, the FLA model is more flexible than AA. In AA the pheromone for plan π1 is given, for instance, to (Ai, Ai+1), whereas the pheromone for plan π2 is put on the component (Ai+1, Ai). In FLA, instead, swapping actions Ai and Ai+1 does not prevent the two components (Ai, i) and (Ai, i+1) from receiving pheromone from both plans, although they will receive different fuzzy-weighted amounts. Fuzzy-Level-Action is thus a worthwhile compromise to obtain performance close to Action-Action without the associated cost.

In general FLA tends to learn in which time step (or nearby) it is useful to apply an action, while AA tends to learn which action is useful to apply after a given one. It is worth noticing that in specific domains like Pegsol the better performance of all the pheromone models with respect to AA is basically due to the fact that the domain is easy to solve and to optimize for an ACOPlan solver, i.e. all the models solve the Pegsol problems finding solutions within 95 percent of the optimum. The ability to distinguish the context in detail, for instance applying an action at a certain precise step, is then crucial in finding and refining the optimal value, i.e. in improving the local search.

Figure 8: For each domain, each column represents a synthesis of the average quality found by each model. AA (Action-Action), FLA (Fuzzy-Level-Action), SA (State-Action), SS (State-State).

4 Conclusions

The experiments with ACOPlan in the framework of planning with action costs suggest that action based pheromone models outperform state based ones, while the proposed novel pheromone model based on Fuzzy-Level-Action components has proved to be a promising and effective tradeoff between performance (both solving and optimization capabilities) and space cost. Further experiments are needed to confirm the behavior of the FLA model in more general optimization contexts such as multiple costs and cost metrics involving resource consumption.
More general lessons have been learned from a deeper understanding of the role of component models in the problem solving and optimization capability of ACOPlan; they can be a useful guideline for the further development of ACOPlan and, to some extent, in the broader context of ACO algorithms:

• in order to produce effective search strategies which distinguish a useful situation, the components should capture the context in which an action is applicable, or a decision is taken (maximum context detail);

• the components should not proliferate too much, for performance reasons and, more importantly, in order to let the pheromone deposit and concentrate on frequent components; in other words the components should not be too descriptive, to avoid spreading the pheromone over too many components which will seldom be found again in the search space (minimum context detail);

• the model components should include elements of the solution plans (actions and precedence constraints, time steps, plan history, etc.) instead of properties of entities which can greatly differ in similar solution plans (e.g. a complete state description);

• the model components should verify the property that two similar solutions produce similar components, according to a given notion of similarity, i.e. a minimal perturbation of a solution which produces a solution with the same optimality value should deposit a similar amount of pheromone on similar components.

This latter property has been shown to hold for AA and, more strongly, for FLA, allowing the pheromone to be deposited on most of the components of perturbed solutions.

Particularly promising will be the investigation of the extension of the technique of component fuzzification introduced with FLA. According to this principle, given a metric distance D among components, a pheromone model can be fuzzified by providing that each component C which is similar to a component Csol appearing in a solution receives a pheromone amount in inverse proportion to the distance D(C, Csol). The technique is general and can also be applied to state based pheromone models, although these still present space complexity problems.

Another line of future research will be the investigation of ACOPlan with multiple pheromone models, where the transition probabilities are computed using a normalized weighted combination of different pheromone types.
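Returning to the component fuzzification idea sketched above, a possible Python rendering of the distance-based deposit rule is the following; the names and the specific 1/(1 + D) weighting are our assumptions, not part of ACOPlan.

def fuzzy_deposit(tau, solution_components, all_components, distance, delta):
    # For each component C_sol of the solution, deposit pheromone on every
    # component C in inverse proportion to the distance D(C, C_sol).
    # The weighting delta / (1 + D) is only one possible choice.
    for c_sol in solution_components:
        for c in all_components:
            d = distance(c, c_sol)
            tau[c] = tau.get(c, 0.0) + delta / (1.0 + d)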
References

[1] M. Baioletti, A. Milani, V. Poggioni, and F. Rossi. An ACO approach to planning. In Proc. of the 9th European Conference on Evolutionary Computation in Combinatorial Optimisation, EVOCOP 2009, 2009.

[2] M. Baioletti, A. Milani, V. Poggioni, and F. Rossi. Ant search strategies for planning optimization. In Proc. of the International Conference on Planning and Scheduling, ICAPS 2009, 2009.

[3] M. Baioletti, A. Milani, V. Poggioni, and F. Rossi. Optimal planning with ACO. In Proc. of AI*IA 2009, LNCS 5883, pages 212–221, 2009.

[4] M. Baioletti, A. Milani, V. Poggioni, and F. Rossi. PlACO: Planning with Ants. In Proc. of the 22nd International FLAIRS Conference. AAAI Press, 2009.

[5] A. Blum and M. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90:281–300, 1997.

[6] B. Bonet and H. Geffner. Planning as heuristic search. Artificial Intelligence, 129(1-2), 2001.

[7] M. Dorigo and T. Stuetzle. Ant Colony Optimization. MIT Press, Cambridge, MA, USA, 2004.

[8] M. Helmert, M. Do, and I. Refanidis. International Planning Competition IPC-2008, The Deterministic Part. http://ipc.icaps-conference.org/, 2008.

[9] J. Hoffmann and B. Nebel. The FF Planning System: Fast Plan Generation Through Heuristic Search. Journal of Artificial Intelligence Research, 14:253–302, 2001.

[10] H. Kautz and B. Selman. Unifying SAT-based and graph-based planning. In Proc. of IJCAI-99, 1999.

[11] E. Keyder and H. Geffner. Heuristics for planning with action costs revisited. In Proceedings of ECAI 2008, pages 588–592, 2008.

[12] D. Nau, M. Ghallab, and P. Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann, 2004.

[13] F. Rossi. An ACO approach to Planning. PhD thesis, Mathematics and Computer Science Dept., University of Perugia, Italy, 2009.