=Paper=
{{Paper
|id=Vol-1648/paper11
|storemode=property
|title=Strong-Cyclic Planning when Fairness is Not a Valid Assumption
|pdfUrl=https://ceur-ws.org/Vol-1648/paper11.pdf
|volume=Vol-1648
|authors=Alberto Camacho,Sheila A. McIlraith
|dblpUrl=https://dblp.org/rec/conf/ijcai/CamachoM16
}}
==Strong-Cyclic Planning when Fairness is Not a Valid Assumption==
Alberto Camacho, Sheila A. McIlraith
Department of Computer Science, University of Toronto, Canada
acamacho@cs.toronto.edu, sheila@cs.toronto.edu

Abstract

We address the class of fully-observable non-deterministic (FOND) planning problems where non-deterministic actions are not guaranteed to display fair behaviour. Typical solutions to FOND planning problems either guarantee goal reachability within a bounded number of transitions (so-called strong acyclic solutions), or are predicated on a fairness assumption that presumes each action effect occurs infinitely often when the action is applied infinitely many times in the same state (so-called strong-cyclic solutions). We introduce the FOND+ planning model, which extends the FOND model with a more informed description of the action transitions. Solutions to FOND+ problems guarantee goal reachability, even when the classical fairness assumption is not valid. We characterize different topologies, or classes, of solutions, showing that typical strong acyclic and fair strong-cyclic planning are particular cases. Moreover, we characterize a class of FOND+ solutions that also solve 1-primary normative fault-tolerant problems, and are robust to any finite number of faults during execution. We present algorithms to solve each class of problems. Finally, we present the results of one of our algorithms in the search for so-called normative solutions for a selection of blocksworld problems where the classical fairness assumption is not valid.

1 Introduction

Most real-world dynamical systems are best modeled using non-deterministic action effects, reflecting our inability to characterize, with certainty, the outcome of actions, and in some cases to additionally model the effects of exogenous events. A number of different frameworks have been proposed to model and plan with non-deterministic transition systems. For the purposes of this paper, we limit our discussion to fully-observable systems. Markov Decision Processes (MDPs) [Puterman, 1994], for example, model the effect of an action in each state as a probability distribution over the successor states. In contrast, many probabilistic planning systems (e.g. [Kolobov et al., 2009]) exploit a compact symbolic description of the actions and their effects that associates a non-zero probability with alternative action outcomes. While less expressive than MDPs, such systems are often easier to solve and show better scalability.

In cases where the stochasticity of a system is not well understood, or deemed unnecessary, Fully Observable Non-Deterministic (FOND) planning [Cimatti et al., 2003] provides a framework, similar to probabilistic planning, but with the exception that the effects of actions are non-deterministic, without commitment to the probability of alternative action effects. Solutions to FOND planning problems are policies that map states into actions. Cimatti et al. [2003] identified three different classes of solutions to FOND problems: weak, strong, and strong-cyclic. Weak solutions are plans that achieve the goal, but without guarantees. Strong solutions guarantee goal achievement in all executions. This condition is too restrictive, as FOND problems often do not have strong solutions. The class of strong-cyclic solutions is an alternative, and guarantees goal achievement provided that all executions are fair. When the fairness assumption is not valid, policy executions may not achieve the goal.

In 2004, Jensen et al. introduced the notion of fault-tolerant planning, motivated by two observations. First, they interpreted non-determinism in many real-world problems as the consequence of infrequent errors (so-called faults) during execution. Second, they claimed that, in general, no action is ever guaranteed to achieve non-faulty behaviour. The resultant framework for fault-tolerant planning includes complete knowledge about which state transitions are normative and which are otherwise faulty. Solutions attempt to reach a prescribed goal under the assumption that no more than κ faults will occur during execution.

In this paper we recognize that there exist non-deterministic transition systems where the fairness assumption does not hold, and the challenge we address is to find solutions that guarantee goal achievement when strong FOND solutions do not exist. To this end, we introduce the FOND+ planning framework as an extension to the FOND framework that includes a more informed description of transitions. The FOND+ framework makes it possible to find policies that guarantee goal achievement when the classical fairness assumption is not valid. Alternatively, the FOND+ framework can be informed with a description of the normative behaviour of the system. In this case, solutions to certain FOND+ problems are also solutions to 1-primary normative fault-tolerant planning problems. We define different classes of solutions to FOND+ problems, and introduce algorithms to find solutions for each class. We identify a continuum of FOND+ solutions that includes strong, strong-cyclic, and fault-tolerant planning solutions in extreme cases. This result is perhaps most related to the work of Trevizan et al. [2007], which defines a framework to deal simultaneously with risk and uncertainty in an attempt to unify probabilistic and non-deterministic planning. The results presented in this paper open the door to further advancements in the unification of FOND, fault-tolerant, and probabilistic planning frameworks.

2 Background

A non-deterministic planning domain is a tuple D = ⟨F, S, A, T⟩, where F is a set of propositions; S ⊆ 2^F is a finite set of states, typically represented by the set of propositions that hold; A is a finite set of actions; and T : S × A → 2^S is a transition function.
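A domain of this form can be made concrete with a small data structure. The following is a minimal sketch, not code from the paper; the blocksworld-style proposition and action names are illustrative assumptions.

```python
# Sketch of a non-deterministic planning domain D = <F, S, A, T>.
# States are frozensets of propositions; each action has preconditions
# and a list of alternative effects (add-list, delete-list).

# Each effect is a pair (add, delete); a non-deterministic action lists several.
ACTIONS = {
    "pickup(A)": {
        "pre": frozenset({"on(A,B)", "handempty"}),
        "effects": [
            (frozenset({"holding(A)"}), frozenset({"on(A,B)", "handempty"})),  # intended outcome
            (frozenset({"ontable(A)"}), frozenset({"on(A,B)"})),               # block dropped on the table
        ],
    },
}

def applicable(state, action):
    """a is applicable in s iff Pre_a is a subset of s."""
    return ACTIONS[action]["pre"] <= state

s0 = frozenset({"on(A,B)", "handempty"})
```

With this representation, `applicable(s0, "pickup(A)")` holds because the preconditions are contained in the initial state.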
An action a ∈ A is defined in terms of its preconditions Pre_a and effects Eff_a. Pre_a ⊆ F is the set of propositions that need to hold true in a state s for a to be applicable in s, i.e., Pre_a ⊆ s. Eff_a = ⟨Eff_a^1, ..., Eff_a^n⟩ is a finite set of possible effects of a. Each effect e ∈ Eff_a is a tuple of the form e = ⟨Add_e, Del_e⟩, where Add_e and Del_e are sets of propositions to be added and deleted, respectively, when an action transition is applied. The progression of s with respect to action a and effect e is the updated state Prog(s, a, e) = (s \ Del_e) ∪ Add_e. Finally, the result of applying a in s is one of the states in T(s, a) = ∪_{e ∈ Eff_a} Prog(s, a, e). A state s′ ∈ T(s, a) is said to be an outcome of a in s, and the triplet (s, a, s′) is called the transition. An execution of domain D in state s0 is a state-action sequence s0, a0, s1, a1, ... such that a_i is applicable in s_i for each i = 0, 1, ..., and s_{i+1} is one of the outcomes of a_i in s_i, chosen non-deterministically by the system. When s′ = Prog(s, a, e), the regression of s′ with respect to action a and effect e is the partial state Regr(s′, a, e) = (s′ \ Add_e) ∪ Pre_a. Conceptually, Regr(s′, a, e) is the minimal subset of propositions that a state s needs to entail so that s′ is an outcome of a in s. Actions can be applied in partial states, whose progression and regression are defined as above (cf. [Muise et al., 2012]).

2.1 Fully Observable Non-Deterministic Planning

A fully-observable non-deterministic (FOND) planning problem is a tuple P = ⟨D, s0, SG⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s0 ∈ S is the initial state, and SG ⊆ S is a set of goal states. Solutions to FOND planning problems are policies that map states into actions. We say that π is well-defined when π(s) is defined in every state reachable by π. Executions of a well-defined policy π (also called plan executions) are executions s0, a0, s1, a1, ... where, for each i = 0, 1, ..., a_i = π(s_i). When the execution finishes in a goal state s_n ∈ SG, the sequence of actions P = a0, a1, ..., a_{n−1} is called a plan.

Definition 1 (adapted from [Cimatti et al., 2003]). We say that an execution σ is unfair when a state-action tuple s, a appears infinitely often in σ, but the transition (s, a, s′) occurs a finite number of times for an outcome s′ ∈ T(s, a). Executions that are not unfair are said to be fair.

Definition 2 (adapted from [Cimatti et al., 2003]). A solution to a FOND problem is strong cyclic when all executions result in either finite sequences of states that terminate in a goal state, or are infinite and unfair.

Cimatti et al. [2003] distinguish three classes of solutions to a FOND planning problem: weak, strong, and strong cyclic. Weak solutions are plans that lead the agent from the initial state to a goal state, but with no guarantees due to the non-determinism of the actions. Strong solutions are plans that guarantee goal achievement. Necessarily, strong solutions are acyclic, and achieve the goal in a bounded number of steps. Finally, strong cyclic solutions are plans that are guaranteed to achieve the goal, predicated on the fairness assumption given by Definition 1. The fairness assumption presumes that, for each state s and each action a applicable in s, every outcome of a occurs infinitely many times if a is applied in s infinitely often. Strong solutions are also strong-cyclic, but the opposite is not always true.

2.2 Fault-Tolerant Planning

A fault-tolerant planning problem is a tuple P = ⟨D, s0, SG, F, κ⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s0 ∈ S is the initial state, SG ⊆ S is a set of goal states, F is an exception model as described in Definition 3, and κ is a constant [Jensen et al., 2004; Domshlak, 2013]. Informally, a fault-tolerant planning task P requires planning for goal achievement under the assumption that no more than κ exceptions will occur during plan execution.

Definition 3 (adapted from [Domshlak, 2013]). For a non-deterministic planning domain D, an exception model is a function F : ∪_{a ∈ A} Eff_a → N, computable in time polynomial in the size of P. If, for each action a ∈ A, |{e | e ∈ Eff_a, F(e) = 0}| ≤ α, then F is called α-primary. Likewise, if, for each action a ∈ A, |{e | e ∈ Eff_a, F(e) = 0}| > 0, then F is called normative.

In a fault-tolerant planning problem, actions have two types of effects. Primary effects model the normative behaviour of the system, while faulty effects model one or more failures (faults) of the system. Primary effects, e, have F(e) = 0, whereas F(e) > 0 when e is faulty. Intuitively, the exception model F maps each effect to the number of exceptions, or faults, associated with it.

Solutions to a fault-tolerant planning problem P = ⟨D, s0, SG, F, κ⟩ are κ-plans, or policies that achieve the goal under the assumption that no more than κ faults will occur during execution. Formally, a plan P is κ-admissible when its execution generates the state-effect sequence (s0, e0, ..., s_i, e_i, ...) with Σ_i F(e_i) ≤ κ. A policy is a κ-plan when all plan executions are κ-admissible and reach the goal. Solutions to fault-tolerant planning problems do not assume fairness as done in strong-cyclic FOND planning, and are robust to the occurrence of up to κ faults. Note that the parameter κ that sets the maximum number of exceptions in a solution is part of the problem description, and needs to be known, computed, or otherwise guessed in advance.

Jensen et al. [2004] introduced the model for fault-tolerant planning, and addressed 1-primary normative models with the rationale that these satisfactorily address non-determinism originating from many real-world physical system models. Fault-tolerant planning has found applications in the robotics community (e.g. [Lussier et al., 2007]). More recently, fault-tolerant planning has been studied by Domshlak [2013] and Delgrande and Levesque [2013]. Recent work by Pineda et al. [2013] addressed the problem of fault-tolerant planning under uncertainty, tackling the problem from the perspective of a stochastic shortest-path (SSP) problem.

[Figure 1 (policy diagrams not reproduced): Different policies for the blocks-world domain. The pick-up action is non-deterministic, and may pick up a certain block or drop it on the table. The put-on-block and put-block-on-table actions are deterministic. (a) Strong-Cyclic Solution. (b) Normative Solution.]

3 Planning with Unfair Non-Determinism

We seek to generate solutions to goal-oriented planning problems with non-deterministic action effects where the fairness assumption does not necessarily hold. Different notions of fairness over executions have been discussed in the literature. In particular, D'Ippolito et al. [2011] defined strong independent fairness, a notion that presumes it is known, for each action, which outcomes are good and which ones are failures (similar to the distinction made in fault-tolerant planning), and requires failures and assumptions (normally, temporally extended constraints) on the environment behaviour to be independent.
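The progression, regression, and outcome operators defined in Section 2 translate directly into set operations. The following is a minimal sketch under the representation above; the pickup effects are illustrative assumptions, not the paper's encoding.

```python
# Prog(s,a,e) = (s \ Del_e) | Add_e
# Regr(s',a,e) = (s' \ Add_e) | Pre_a
# T(s,a) = union of Prog(s,a,e) over all effects e of a

def prog(state, effect):
    """Progression of state through one effect (add, delete)."""
    add, delete = effect
    return (state - delete) | add

def regr(succ, pre, effect):
    """Regression of a successor state through one effect of an action with preconditions pre."""
    add, _delete = effect
    return (succ - add) | pre

def outcomes(state, pre, effects):
    """T(s, a): every state reachable by one of a's effects."""
    assert pre <= state, "action must be applicable in s"
    return {frozenset(prog(state, e)) for e in effects}

# Illustrative pickup action with a nominal effect and a drop effect.
pre = frozenset({"on(A,B)", "handempty"})
e_pick = (frozenset({"holding(A)"}), frozenset({"on(A,B)", "handempty"}))
e_drop = (frozenset({"ontable(A)"}), frozenset({"on(A,B)"}))

s = frozenset({"on(A,B)", "handempty"})
succs = outcomes(s, pre, [e_pick, e_drop])  # two non-deterministic outcomes
```

Note that regressing a progressed state recovers (at least) the preconditions, matching the intuition that Regr(s′, a, e) is the minimal partial state from which s′ is an outcome of a.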
Sardina and D'Ippolito [2015] defined different logical characterizations of fairness based on the notions introduced by D'Ippolito et al. [2011]. Kupferman and Vardi [1996] proved that verification of fair transition systems is PSPACE-complete.

Our premise is that in many transition systems, and in concrete non-deterministic planning domains, it is known which state-action pairs s, a are guaranteed to produce, eventually, certain outcomes s′ if a were applied in s infinitely often. While not true in general, this is a valid assumption for many real-world transition systems. For example, it is known that pressing the elevator's button on the wall repeatedly causes the elevator's doors to open. On the other hand, it is known that there is no guarantee that a fixed number will be awarded in a lottery, even if the lottery is played infinitely many times under identical conditions. When fairness does not hold, solutions need to be robust to non-determinism, and goal achievement in plan executions should not rely on a particular transition for which there is no guarantee of occurrence. A similar argument can be made in non-deterministic domains that have known normative behaviour. In the scope of this paper, we focus our attention on 1-primary normative fault-tolerant planning problems. In these problems, solutions need to be robust to failures, and the goal needs to be achievable whenever the system manifests its normative behaviour.

The following example motivates the need for policy generation mechanisms that guarantee goal achievement in those non-deterministic domains where the fairness assumption is not valid. Further, it illustrates a parallelism between solutions to goal-oriented planning problems where fairness governs the non-determinism of the actions, and those where the non-deterministic actions have primary and faulty effects.

Illustrative Example

Consider the blocks-world domain, where block A is initially on top of block B, and the goal is to put A on the table. The agent can pick up blocks and put them on top of other blocks, or on the table. The pickup-block action is non-deterministic: the gripper may drop the block onto the table, but it is guaranteed to eventually pick up the block if tried repeatedly. The put-on-block (resp. put-block-on-table) action is deterministic and puts the block held by the gripper on top of another block (resp. on the table). Strong-cyclic FOND solutions that assume fairness may rely on the eventual occurrence of the faulty effect of pickup-block in order to reach the goal. Figure 1a shows the strong-cyclic solution that picks up block A and puts it on top of B repeatedly until block A is dropped on the table. If the effect of the pickup-block action that drops the block is not guaranteed to occur during execution, this solution does not guarantee goal achievement. In contrast, the solution illustrated in Figure 1b is strong, and goal achievement is guaranteed.

A similar pattern appears when we consider that picking up the block is the normative behaviour of the pickup-block action. The solution described in Figure 1a loops until the pick-up action is faulty, which is certainly not a good quality solution, because the normative behaviour of the system can lead to the goal regardless of whether the faulty effect is guaranteed to occur or not. Note that the solution described in Figure 1b is of good quality because executions lead to the goal when the behaviour of the system is normative, and the solution is robust to non-determinism.

3.1 The FOND+ Model

A Fully Observable Non-Deterministic planning problem with Labeled UnfairnesS (for short, FOND+ problem) is a tuple P = ⟨D, s0, SG, L⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s0 ∈ S is the initial state, SG ⊆ S is a set of goal states, and L : S × A × S → {F, U} is a labeling function that maps each triplet (s, a, s′) ∈ S × A × S into one of the symbols F or U. The semantics of L supports different interpretations. Before elaborating, we first define what constitutes an L-fair execution. For a non-deterministic planning domain D = ⟨F, S, A, T⟩ and labeling function L : S × A × S → {F, U}, we say that an execution in state s0 is L-unfair when there exists a state-action tuple (s, a) such that (i) (s, a) appears infinitely often, and (ii) there exists a transition (s, a, s′) such that L(s, a, s′) = F and (s, a, s′) occurs a finite number of times. Executions that are not L-unfair are said to be L-fair. Note that fairness, as defined by Cimatti et al. [2003], is a particular case of L-fairness that occurs when L assigns F to all transitions. Solutions to a FOND+ problem are policies that guarantee goal achievement, predicated on the assumption that all executions of D in s0 are L-fair.

[Figure 2 (diagram not reproduced): Distribution of different classes of solutions to FOND+ planning problems: strictly fair, strictly unfair, normative, and mixed.]

Definition 4. A FOND+ problem is a tuple P = ⟨D, s0, SG, L⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s0 ∈ S is the initial state, SG ⊆ S is a set of goal states, and L : S × A × S → {F, U} is a labeling function.

Definition 5. Solutions to a FOND+ problem P = ⟨D, s0, SG, L⟩ are policies that guarantee goal achievement, predicated on the assumption that all executions of D in s0 are L-fair.
Strictly fair solutions to a FOND+ problem P = In the scope of this paper, we focus our attention on two hD, s0 , SG , Li are strong-cyclic solutions to the FOND prob- different interpretations of the labeling function. The first in- lem P 0 = hD, s0 , SG i that results from ignoring the labeling terpretation assumes it is known, for each state-action pair function in P. The opposite is not always true, and strong- (s, a), which transitions (s, a, s0 ) are not guaranteed to oc- cyclic solutions to P’ may or may not be strictly fair solutions cur infinitely often in executions where the pair (s, a) ap- to P – only when all transitions t produced by plan execu- pears an infinite number of times. In those transitions, we as- tions have L(t) = F. In general, a strong cyclic solution can sign L(s, a, s0 ) = U. Otherwise, we assign L(s, a, s0 ) = F. belong to any class of FOND+ solutions. Like strong-cyclic With this interpretation, all plan executions are L-fair and solutions, strictly fair solutions can contain cycles. FOND+ solutions guarantee goal achievement during ex- Definition 6. A solution π to a FOND+ problem is strictly ecution. In other words, FOND+ solutions are robust to fair when all transitions t produced by L-fair plan executions non-deterministic transitions during plan execution, and goal have L(t) = F. achievement does not rely on any specific transition for which there is no guarantees of eventual occurrence. The The class of strictly unfair solutions is analogous to the FOND+ model overcomes difficulties of the FOND model class of strictly fair solutions, this time requiring transitions when strong solutions do not exist, as strong-cyclic solutions t produced by L-fair plan executions have L(t) = U. Strictly do not guarantee goal achievement, in general, when Cimatti unfair solutions to a FOND+ problem P = hD, s0 , SG , Li et al.’s fairness assumption is not valid. 
are strong solutions to the FOND problem P 0 = hD, s0 , SG i The second interpretation we examine describes the nor- that results from ignoring the labeling function in P. The op- mative behaviour of the system, and assigns L(s, a, s0 ) = F posite is not always true either, and strong solutions to P’ are (resp. L(s, a, s0 ) = U) when s0 follows from a primary (resp. strictly unfair solutions to P only when all transitions t pro- faulty) effect of a in s. With this interpretation, an execution duced by plan executions have L(t) = U. In general, a strong is L-fair when it is finite, or when it is infinite and for each solution can belong to any class of FOND+ solutions. Like state-action pair (s, a) that appears infinitely often, the system strong solutions, strictly unfair solutions are acyclic. responds with each primary effect of a in s infinitely often. As Definition 7. A solution π to a FOND+ problem is strictly we will see later, this interpretation is particularly interesting unfair when all transitions t produced by L-fair plan execu- in certain FOND+ planning problems, and makes it possible tions have L(t) = U. to find policies that guarantee goal achievement under the as- The class of mixed solutions completes the space of so- sumption that the number of transitions produced by faulty lutions to a FOND+ planning problem. Mixed solutions are effects during plan execution is finite. those that are neither strictly fair nor strictly unfair. Plan ex- In the blocksworld example introduced above, and with ecutions s0 , a0 , s1 , a1 . . . sn are such that sn is a goal state, both interpretations, all transitions t have L(t) = F except and may contain cycles. More precisely, there may exist the transitions produced by the effect of the pick-up action k < m < n such that sk = sm . As all infinite plan exe- that drops the block on the table, which have L(t) = U. With cutions need to be L-unfair (cf. 
Definition 5), at least one of this labeling, the policy depicted in Figure 1b is a FOND+ the outcomes of si by ai , for a certain k ≤ i ≤ m, must yield solution, whereas the policy depicted in Figure 1b is not. a transition (si , ai , s0i+1 ) with L(si , ai , s0i+1 ) = F. 3.2 Classes of Solutions Definition 8. A solution π to a FOND+ problem is mixed We characterize different classes of solutions to FOND+ when it is neither strictly fair nor strictly unfair. problems according to the fairness of plan executions. Finally, we characterize the class of normative solutions, Namely, we distinguish between strictly fair, strictly un- which are particularly interesting when the labeling function fair, and mixed solutions. Furthermore, we characterize the describes the normative behaviour of the system – i.e., when transitions (s, a, s0 ) with label L(s, a, s0 ) = F are exactly model and its solutions. More precisely, normative FOND+ those produced by a primary effect of a in s. A solution π solutions are robust to any number of faulty transitions dur- is normative when, in each state s, reachable by π: (i) there ing execution. This is an advantage with respect to solutions exists a plan execution in s that reaches the goal and such to fault-tolerant planning, which are robust to a limited num- that all transitions t have L(t) = F, and (ii) exactly one out- ber of κ faults during execution. Besides, κ is a parameter of come of s by π(s) produces a transition t with L(t) = F. the model that has to be fixed prior search, even if a solution The class of normative solutions intersects with the class of robust to any number of faults exists. strictly fair solutions in those solutions that are deterministic plans (cf. Figure 2). 4 Search Algorithms for FOND+ + Definition 9. 
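Definitions 6–8 partition solutions by the labels of the transitions their L-fair plan executions produce, so given that (finite) set of labels the class can be read off directly. A minimal sketch of this classification, not from the paper:

```python
# Classify a solution by the labels of the transitions produced by its
# L-fair plan executions: all F -> strictly fair, all U -> strictly unfair,
# otherwise mixed (Definitions 6-8). A solution producing no transitions at
# all is degenerate; this sketch reports it as strictly fair.

def solution_class(labels):
    labels = set(labels)
    if labels <= {"F"}:
        return "strictly fair"
    if labels <= {"U"}:
        return "strictly unfair"
    return "mixed"
```

For the blocksworld example, the policy of Figure 1b produces only F-labeled transitions on its nominal path plus the U-labeled drop, which is why it falls outside the strictly fair class yet remains a FOND+ solution.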
Definition 9. A FOND+ solution π is normative when, in each state s reachable by π: (i) there exists a plan execution in s that reaches the goal and such that all transitions t have L(t) = F, and (ii) exactly one outcome of s by π(s) produces a transition t with L(t) = F.

With the interpretation of the labeling function that describes the normative behaviour of the system, policy actions of normative solutions are required to have, by definition, one primary effect. Solutions need to be robust to faulty transitions and guarantee goal achievement, at any point during execution, when the system follows its normative behaviour. In order to facilitate comparison with the fault-tolerant planning model, we assume the value L(s, a, s′) depends only on the effect that produces the outcome s′, but not on the state s itself. Formally, we assume L(s, a, Prog(s, a, e)) = L(s′, a, Prog(s′, a, e)) for all actions a ∈ A, all effects e of a, and all pairs of states s, s′ ∈ S in which a is applicable. When this property holds, the labeling function can be described compactly as a function of the action effects.

With the restrictions described above, normative solutions to a FOND+ problem P = ⟨D, s0, SG, L⟩ are also 1-primary normative solutions¹ to fault-tolerant planning problems P′ = ⟨D, s0, SG, F, κ⟩ with an exception model F that assigns faults F(e) > 0 to effects e that produce transitions (s, a, Prog(s, a, e)) such that L(s, a, Prog(s, a, e)) = U. Note that normative FOND+ solutions are robust to the occurrence of any possible number of faults during execution, as opposed to standard fault-tolerant solutions, whose executions are guaranteed to be robust up to a bounded number of faults κ specified by the model.

Theorem 1. Let P = ⟨D, s0, SG, L⟩ be a FOND+ problem, and let P′ = ⟨D, s0, SG, F, κ⟩ be a 1-primary normative fault-tolerant planning problem with exception model F that assigns faults F(e) > 0 to effects e that produce transitions (s, a, Prog(s, a, e)) such that L(s, a, Prog(s, a, e)) = U. Then, FOND+ normative solutions to P are 1-primary normative solutions to P′ for any κ < ∞.

¹ For convenience, we informally say that a solution π is α-primary (resp. normative) when |{e | e ∈ Eff_a, F(e) = 0}| ≤ α (resp. |{e | e ∈ Eff_a, F(e) = 0}| > 0) in all effects from plan executions of π. Technically, the α-primary normative model introduced by Domshlak [2013] requires F to be α-primary and normative globally, which is a harder constraint.

The practicality of 1-primary normative fault-tolerant planning to model and solve many real-world physical problems has been discussed by Jensen et al. [2004] and Domshlak [2013]. Normative FOND+ solutions overcome certain limitations of the 1-primary normative fault-tolerant planning model and its solutions. More precisely, normative FOND+ solutions are robust to any number of faulty transitions during execution. This is an advantage with respect to solutions to fault-tolerant planning, which are robust to a limited number κ of faults during execution. Moreover, κ is a parameter of the model that has to be fixed prior to search, even if a solution robust to any number of faults exists.

4 Search Algorithms for FOND+

This section presents algorithms to search for strictly fair, strictly unfair, and normative solutions to FOND+ planning problems. This is followed by an algorithm that is, in principle, less efficient, but can search for general solutions to FOND+ problems. Notwithstanding, the algorithms presented below are intended to provide a baseline for future improvements, and their main purpose is to demonstrate the existence of methods to find solutions to FOND+ problems. We decided to follow a branch-and-bound search scheme, motivated by its efficiency in the search for strong-cyclic solutions to FOND problems [Muise et al., 2012]. The similarity between the FOND+ and FOND planning models suggests that a similar technique may be effective in the search for FOND+ solutions.

4.1 Search for Strictly Fair Solutions

By definition, all transitions t produced by plan executions of strictly fair solutions are such that L(t) = F. It is straightforward to see that strictly fair solutions to a FOND+ problem P = ⟨D, s0, SG, L⟩ are all and only the strong-cyclic solutions to the FOND problem P′ = ⟨D′, s0, SG⟩. Intuitively, D′ is like D, but the actions applicable in a given state s are restricted to those a's that only yield transitions (s, a, s′) labeled with L(s, a, s′) = F. This constraint can be implemented by means of extra dummy fluents in D′. Alternatively, when L accepts a compact description such that its value depends only on the effects of the actions that yield each transition, we can safely prune an action a from D′ when not all its effects yield transitions t with L(t) = F.

For a FOND+ problem P, the algorithm consists of two steps:

1. P is relaxed into a FOND problem P′ as described above.
2. A sound and complete strong-cyclic FOND planner (e.g. PRP [Muise et al., 2012]) is used to search for a strong-cyclic solution to P′, which is returned as a strictly fair solution to P.

Since strictly fair solutions to P are the strong-cyclic solutions to the FOND problem P′, soundness and completeness of the algorithm derive from soundness and completeness of the strong-cyclic FOND planner.
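Under the compact labeling, step 1 of the strictly-fair algorithm amounts to pruning every action with at least one effect labeled U. The sketch below illustrates this relaxation; the action table and label map are illustrative assumptions, not the paper's data structures.

```python
# Sketch of step 1: relax a FOND+ problem into a FOND problem D' by
# pruning any action with at least one effect labeled U. This is valid
# when L depends only on the action effect (compact labeling).

def relax_strictly_fair(actions, effect_label):
    """Keep an action only if all of its effects are labeled F."""
    return {
        name: spec
        for name, spec in actions.items()
        if all(effect_label[(name, e)] == "F" for e in spec["effects"])
    }

actions = {
    "pickup(A)": {"effects": ["pick", "drop"]},
    "put-on-block(A,B)": {"effects": ["stack"]},
}
labels = {
    ("pickup(A)", "pick"): "F",
    ("pickup(A)", "drop"): "U",
    ("put-on-block(A,B)", "stack"): "F",
}
relaxed = relax_strictly_fair(actions, labels)
```

After the relaxation, `pickup(A)` is gone (its drop effect is labeled U) while the deterministic `put-on-block(A,B)` survives, so any strong-cyclic plan found for the relaxed problem uses only F-labeled transitions. The analogous relaxation for strictly unfair solutions would keep only actions whose effects are all labeled U.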
Alternatively, when L accepts a compact de- 5: Seen ← {} scription such that its value depends only on the effects of 6: Deferred ← {} the actions that yield each transition, we can safely prune 7: while Open 6= ∅ ∨ Deferred 6= ∅ do an action a when not all its effects yield transitions t with 8: s =U NSTACK () L(t) = U. 9: if s 6|= s? ∧ s 6∈ Seen then For a FOND+ problem P, the algorithm consists of two 10: Seen.add(s) steps: 11: if π(s) is undefined then 1. P is relaxed into a FOND problem P 0 as described 12: G EN FAIR P LAN(V, s, s? , A, π) above. 13: end if 2. A sound and complete strong FOND planer – e.g. 14: if π(s) is defined then [Jaramillo et al., 2014] – is used to search for a strictly 15: hp, ai = π(s) unfair solution to P 0 , which is returned as a strictly un- 16: S TACK S UCCESSORS(s, a) fair solution to P. 17: else break 18: end if Since strictly unfair solutions to P are the strong solutions 19: end if to the FOND problem P 0 , soundness and completeness of the 20: end while algorithm derives from soundness and completeness of the 21: P ROCESS D EADENDS () strong FOND planner. 22: end while 23: return π 4.3 Search for Normative Solutions 24: end procedure Normative solutions are those where, for every reachable Add Successors to Corresponding Stack state s, there exists a plan for s for which all transitions t have L(t) = F. Algorithm 1 exploits this property. Intuitively, the 25: function S TACK S UCCESSORS(s, a) algorithm explores plans that exclusively produce transitions 26: for e ∈ Effa do t such that L(t) = F, which are extended into robust policies 27: s0 = P rog(s, a, e) that also consider occurrence of other transitions. 28: if L(s, a, s0 ) = F then Without loss of generality, we assume the problem has the 29: Open.add(s0 ) property that all actions produce, in each state s in which a 30: else is applicable, exactly one transition t such that L(t) = F. 
If 31: Deferred.add(s0 ) this property is not true, and the value of L(s, a, s0 ) in any 32: end if state s depends only on the effect of a that generates outcome 33: end for s0 , actions that do not have exactly one effect that produces a 34: end function transition t such that L(t) = F can be safely pruned in a pre- Get Next Node to Explore processing stage. Alternatively, a constraint during search can 35: procedure U NSTACK be added, so that a state s can be expanded by action a only 36: if OpenF 6= ∅ then when exactly one outcome produces a transition t such that 37: s = Open.pop() L(t) = F. If this check is not performed, Algorithm 1 still 38: else finds solutions (perhaps non-normative mixed solutions) that 39: s = Deferred.pop() guarantee goal achievement when the interpretation of the la- 40: end if beling function describes fairness, but goal achievement dur- 41: return s ing execution is not guaranteed anymore when the labeling 42: end procedure function describes the normative behaviour of the system. Procedure P OLICY S EARCH in Algorithm 1 maintains three L(s, π(s), s0 ) = F, and stacks s0 into Deferred otherwise. separate stacks: Open, Seen, and Deferred. The following The construction of the policy is performed by, iteratively, search procedure is performed until policy π does not change unstacking a new state from Open, or Deferred if Open is with respect to the previous policy found. First, Open is ini- empty, and finding a plan with G EN FAIR P LAN search. This tialised with the initial state s0 and Seen, Deferred, and π iterative process is run until the stacks Open and Deferred are are initialised to empty sets. For each state s in Open being empty, or G EN FAIR P LAN search for plans fails. processed, for which π is undefined, procedure G EN FAIR - P LAN searches a plan P for s that exclusively produces tran- Procedures G EN FAIR P LAN and P ROCESS D EADENDS are sitions t such that L(t) = F. 
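The loop just described can be sketched in Python over an explicit toy transition system. This is our own illustrative rendering, not the PRP-based implementation: all names are ours, the policy maps full states rather than regressed partial states, and the outer re-planning loop, FSAPs, and dead-end processing are omitted.

```python
from collections import deque

def gen_fair_plan(s, goal, effects, label, policy):
    # BFS restricted to transitions labeled F; stops at the goal or at a
    # state for which a policy action has already been computed.
    frontier = deque([(s, [])])
    visited = {s}
    while frontier:
        cur, plan = frontier.popleft()
        if cur == goal or (cur != s and cur in policy):
            return plan                      # list of (state, action) pairs
        for a, outcomes in effects(cur).items():
            for nxt in outcomes:
                if label(cur, a, nxt) == 'F' and nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [(cur, a)]))
    return None                              # s is a dead end

def policy_search(s0, goal, effects, label):
    policy, open_, deferred, seen = {}, [s0], [], set()
    while open_ or deferred:
        s = open_.pop() if open_ else deferred.pop()  # UNSTACK: prefer Open
        if s == goal or s in seen:
            continue
        seen.add(s)
        if s not in policy:
            plan = gen_fair_plan(s, goal, effects, label, policy)
            if plan is None:
                return None   # the full algorithm would record FSAPs here
            policy.update(plan)
        a = policy[s]
        for nxt in effects(s)[a]:                     # STACKSUCCESSORS
            (open_ if label(s, a, nxt) == 'F' else deferred).append(nxt)
    return policy
```

On a toy domain where each action has one F-labeled (primary) and one U-labeled (faulty) outcome, `policy_search` returns an action for every reachable non-goal state, with faulty successors handled last so that their plans tend to be repairs back into already-handled states.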
Procedures GENFAIRPLAN and PROCESSDEADENDS are inspired by procedures GENPLANPAIRS and PROCESSDEADENDS in the FOND planner PRP [Muise et al., 2012]. GENFAIRPLAN does two things. First, it searches for a plan P for s such that all transitions t have L(t) = F. For efficiency, such a plan may end in a goal state or in a state for which a policy action has already been computed. Second, like PRP's GENPLANPAIRS, it extends π with new state-action pairs. More precisely, for each pair ⟨s, a⟩ in the plan, GENFAIRPLAN computes, via regression, the relevant part p of s so that the goal is still reachable (e.g. [Waldinger, 1977; Reiter, 2001]). Finally, it extends π with the (partial) state-action pair ⟨p, a⟩. PRP maintains a list of forbidden state-action pairs (FSAPs) that recognisably lead to a state where no strong-cyclic policy exists. FSAPs allow the search space to be pruned effectively, and reduce search times considerably in practice. We apply the same ideas in our algorithm. Whenever GENFAIRPLAN fails to find a plan for state s (i.e., s is a dead end), procedure PROCESSDEADENDS computes the reasons why s is a dead end and performs regression to create a set of forbidden (partial) state-action pairs that need to be avoided in future searches.

Algorithm 1 finds normative solutions to a FOND+ problem, as proved in Theorem 2.

Theorem 2. Algorithm 1 is sound and complete in the search for normative solutions to a FOND+ problem.

Proof sketch. Line 16 of Algorithm 1 stacks all states reachable by π, so every state s reachable by π is considered in line 8 of Algorithm 1. If no fair plan for s has been computed yet, procedure GENFAIRPLAN finds a fair plan for it. This proves soundness of the algorithm. Procedure POLICYSEARCH explores the whole state space, except those forbidden state-action pairs that are pruned from the search by PROCESSDEADENDS. Therefore, the algorithm is complete.

Algorithm 1 maintains soundness and completeness when procedure STACKSUCCESSORS stacks all successors into the Open stack – i.e. the use of the Deferred stack is not necessary. However, there is an advantage to employing the Deferred stack. The rationale appears with the interpretation that transitions with L(t) = F model the normative behaviour of the system. Intuitively, we want faulty transitions to be repaired so that the system moves back to a normative state. When the handling of states originated by faulty transitions (L(t) = U) is deferred, it is more likely that plans found by procedure GENFAIRPLAN lead to a state that is already handled by the policy rather than yielding a new plan all the way to the goal. In other words, it is more likely that those plans are repairs rather than alternative plans. This is a desired property of solutions that justifies, in principle, the use of the Deferred stack.

Algorithm 2 Search of FOND+ Solutions
1:  procedure POLICYSEARCH(κ, V, s0, s⋆, A)
2:    while π changes do
3:      π ← ∅
4:      Open ← s0
5:      Seen ← {}
6:      while Open ≠ ∅ do
7:        s = Open.pop()
8:        if s.ucount > κ then break
9:        end if
10:       if s ⊭ s⋆ ∧ s ∉ Seen then
11:         Seen.add(s)
12:         if π(s) is undefined then
13:           GENBOUNDPLAN(κ, V, s, s⋆, A, π)
14:         end if
15:         if π(s) is defined then
16:           ⟨p, a⟩ = π(s)
17:           STACKSUCCESSORS(s, a)
18:         else break
19:         end if
20:       end if
21:     end while
22:     PROCESSDEADENDS()
23:   end while
24:   return π
25: end procedure

Add Successors to Corresponding Stack
26: function STACKSUCCESSORS(s, a)
27:   for e ∈ Eff_a do
28:     s′ = Prog(s, a, e)
29:     if L(s, a, s′) = F then
30:       s′.ucount ← s.ucount
31:     else
32:       s′.ucount ← s.ucount + 1
33:     end if
34:     Open.add(s′)
35:   end for
36: end function
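The counter bookkeeping in Algorithm 2's STACKSUCCESSORS can be rendered as follows. This is an illustrative sketch with names of our choosing (the `AugState` node type is ours); for brevity, the bound test of line 8 is folded into successor generation.

```python
from typing import NamedTuple

class AugState(NamedTuple):
    # Augmented search node: planning state plus the number of
    # U-labeled transitions taken to reach it.
    state: str
    ucount: int

def stack_successors(node, action, effects, label, kappa, open_):
    # Push every outcome of `action`, propagating the fault counter:
    # F-labeled transitions keep it, U-labeled transitions increment it.
    for nxt in effects(node.state)[action]:
        u = node.ucount if label(node.state, action, nxt) == 'F' else node.ucount + 1
        if u <= kappa:   # prune nodes over the bound (cf. line 8)
            open_.append(AugState(nxt, u))
```

With κ = 0 only F-labeled (primary) successors survive, which informally mirrors the search for solutions whose executions produce no U-labeled transitions at all.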
4.4 Search of FOND+ Solutions

We presented algorithms to obtain strictly fair, strictly unfair, and normative solutions to FOND+ problems. In this section, we present a search algorithm to obtain general solutions to FOND+ problems.

For a FOND+ problem P, our algorithm is an iterated call to Algorithm 2 with parameter κ = 0, 1, . . . until a solution is found. Similar to normative fault-tolerant planning, each iteration fixes the maximum number κ of transitions with L(t) = U allowed to occur in any plan execution. However, solutions do not presume that no more than κ faulty effects will occur. The algorithm makes use of an augmented representation of states that includes, for each planning state s explored, a counter s.ucount that records the number of transitions with L(t) = U produced to visit state s. The initial state has s.ucount = 0. For a transition (s, a, s′), the counter s′.ucount equals s.ucount when L(s, a, s′) = F, and s.ucount + 1 when L(s, a, s′) = U.

We abuse notation and say that s entails s′ when s ⊨ s′ and s.ucount ≤ s′.ucount. For a state s in the Open stack being processed, procedure GENBOUNDPLAN searches for plans P = s, a0, s1, a1, . . . , an, sn that either: (i) exclusively produce transitions t with L(t) = F, and sn is a goal state; (ii) exclusively produce transitions t with L(t) = F, and sn entails a partial state for which the policy action is defined; or (iii) exclusively produce transitions t with L(t) = F except (sn−1, an, sn), and all transitions produced by the progression of sn−1 by an have L(t) = U. Intuitively, conditions (i), (ii), and (iii) ensure that all L-fair plan executions achieve the goal.

In GENBOUNDPLAN, plans are regressed as in procedure GENFAIRPLAN of Algorithm 1. Regressed states keep the counter s.ucount in each state s of the plan, which is not altered by plan regression. Procedure PROCESSDEADENDS – analogous to that of Algorithm 1 – keeps track of the state-action pairs that recognisably lead to a dead end or to a state for which procedure GENBOUNDPLAN fails to find a plan, where the representation of states s includes the counter s.ucount. The list of forbidden state-action pairs is initialised in each call to POLICYSEARCH. We leave further optimizations of the algorithm for future work.

Strictly speaking, solutions found by Algorithm 2 are not policies. The reason is that the action performed in an augmented state s, in general, may depend on the value of the counter s.ucount. Certainly, the action returned by a policy should depend only on the planning state, not on the counter. We have two options to mitigate this. One is to construct the policy that selects, for each planning state s, the action among all (augmented partial) state-action pairs ⟨p, a⟩ such that s ⊨ p and p.ucount is maximal. The second option is to represent the solution found by Algorithm 2 as an FSC π that keeps track, implicitly, of the occurrence of transitions with L(t) = U along execution.

problem    Strong-Cyclic          Normative
           run-time    size       run-time    size
p2         0           3          0           3
p3         0.002       5          0.016       5
p4         0.020       11         0.048       11
p5         0.070       27         0.178       27
p6         0.110       39         0.296       39
p7         0.114       32         0.270       32
p8         0.150       26         0.356       26
p9         0.278       46         0.664       46
p10        0.336       49         0.782       49
p11        0.522       120        1.936       97
p12        0.626       97         1.840       119.5
p13        0.682       57         1.810       57
p14        3.794       1117       37.10       1123
p15        1.500       278        7.814       278

Table 1: Average run-time (sec.) and policy size of FOND and FOND+ solutions to blocksworld-new problems. Strong-cyclic solutions presume fairness, whereas computation of normative solutions is informed by the labeling function.

Theorem 3. Algorithm 2 is sound and complete in the search of FOND+ solutions whose plan executions produce a number of transitions t with L(t) = U that is bounded by κ.
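The first mitigation option above (selecting, for each planning state, the action of a matching augmented pair with maximal counter) could be sketched as follows. This is our illustration: partial states are modelled as sets of fluents, so that "s entails p" becomes set containment, and ties between pairs with equal counters are broken arbitrarily.

```python
def extract_action(s, pairs):
    # `pairs` holds augmented partial state-action entries (p, ucount, a).
    # Among the entries whose partial state p is entailed by the planning
    # state s (p is a subset of s's fluents), pick one with maximal ucount.
    matching = [(u, a) for p, u, a in pairs if p <= s]
    return max(matching)[1] if matching else None
```

Dropping the counter at execution time in this way yields a genuine policy over planning states, at the cost of losing the implicit fault count that a finite-state controller would retain.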
Corollary 1. The iterated call to Algorithm 2 with parameters κ = 0, 1, . . . finds a solution to a FOND+ problem P, if one exists. If a solution is found in the κ-th iteration, the number of transitions t with L(t) = U in plan executions is bounded by κ.

5 Experiments

We conducted a series of tests with the objective of demonstrating the need for FOND+ solutions, which differ from strong-cyclic FOND solutions and guarantee goal achievement when the classical fairness assumption is not valid. We selected a non-deterministic planning domain, the blocksworld-new, and a set of FOND problems. For each FOND problem, we constructed a FOND+ problem by labeling transitions. Finally, we compared FOND+ normative solutions with strong-cyclic FOND solutions to each of the problems. We used PRP as the strong-cyclic planner; Algorithm 1 is implemented on top of PRP. In all cases (also in Algorithm 1), we configured PRP search with dead-end detection disabled and the strong-cyclic-detection optimization disabled. Experiments were run on an Intel Xeon E5-2430 CPU @2.2GHz Linux server, limiting processes to 2GB of memory usage.

The blocksworld-new problems used in our tests have been extracted from Muise et al. [2012]. They are based on the blocksworld benchmark used in past International Probabilistic Planning Competitions (IPPCs), and include some fixes and larger problem sizes with respect to the original instances. In the blocksworld-new domain, the gripper can pick up a block (or two blocks at once) and put it on top of another block, or on the table. The goal is to set the blocks in a certain order. The action that puts blocks on the table is deterministic. However, the pick-up and put-on-block actions have non-deterministic effects. More precisely, the primary effect of pick-up is to pick up the chosen block (or the tower of two blocks). Likewise, the primary effect of put-on-block is to put a block (or a tower of two blocks) that is held by the gripper on top of another block. The faulty effect for both actions drops the block (or the tower of two blocks) on the table. Alternatively, we can interpret that the gripper is guaranteed to succeed in picking up the block eventually, and so on.

Table 1 summarizes the results of our tests. For each problem, we report the run-time and policy size, averaged over 10 runs per problem. Normative policies sometimes have equal size to strong-cyclic policies. We conjecture that in these problems strong-cyclic policies are also robust to unfairness (i.e. are also normative solutions to the FOND+ problems). On the other hand, in some FOND+ problems – like p11, p12, and p14 – the size of the normative policy varies with respect to the size of the strong-cyclic policy of the corresponding FOND problem. In those cases, normative policies can be of greater or lesser size relative to strong-cyclic policies. As normative FOND+ solutions are also solutions to the FOND problem, this illustrates that policies found by PRP are not necessarily of minimal size. Run-times of Algorithm 1 are, in general, two times greater than those of PRP in those problems where the sizes of the policies generated are the same. Comparing the run-times of our algorithm with those of PRP is not fair, as the algorithms find solutions to different problems. However, general run-times suggest that the branch-and-bound technique used in Algorithm 2 is, in principle, efficient in the search for normative solutions. As the implementation of Algorithm 2 is preliminary and not optimal, we believe there is room for further improvements in its performance.

6 Summary and Future Work

There exist a diversity of models for planning in the face of non-deterministic action outcomes, including probabilistic planning, MDPs, and fault-tolerant planning. We focus our attention on FOND planning.
Solutions to FOND planning problems are required to be robust to non-deterministic action effects, and to guarantee goal achievement during execution. The class of strong solutions provides such guarantees, but the existence of such solutions is restricted in general. On the other hand, the soundness of strong-cyclic solutions is predicated on fairness, and therefore their execution does not guarantee goal achievement, in general, when the fairness assumption is not valid.

To mitigate this, we introduced the model for Fully Observable Non-Deterministic Planning with Labeled Unfairness (FOND+), where fairness of action transitions can be described explicitly by means of the labeling function L, and solutions guarantee goal achievement provided that executions are L-fair. We leave complexity analysis of our model to future work.

We distinguished four classes of solutions to FOND+ planning problems: namely, strictly fair, strictly unfair, mixed, and normative. Strictly fair (resp. strictly unfair) solutions are strong-cyclic (resp. strong) solutions to FOND relaxations of the problem. This results in a continuum between strong planning and strong-cyclic planning. Additionally, we found a connection between normative solutions and fault-tolerant planning. In particular, when primary action transitions have L(s, a, s′) = F, normative solutions are solutions to 1-primary normative models of fault-tolerant planning. Notably, our model does not impose a bound on the number of faults during policy execution, and solutions are robust to any finite number of faults during execution. This overcomes the limitation of fault-tolerant planning, where the number of faults κ that a plan can tolerate must be designated prior to policy generation and execution. We leave further exploration of the connection between the FOND+ model and fault-tolerant planning to future work.

Finally, we presented algorithms to find different classes of solutions to FOND+ planning problems, and an algorithm to find general solutions to FOND+. Our algorithms aim to provide a baseline for future enhancements, although the performance of Algorithm 1 suggests that a branch-and-bound search scheme that explores deterministic plans in a certain order is effective. We leave the design and evaluation of better algorithms to future work.

References

[Cimatti et al., 2003] Alessandro Cimatti, Marco Pistore, Marco Roveri, and Paolo Traverso. Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence, 147:35–84, 2003.

[Delgrande and Levesque, 2013] James P. Delgrande and Hector J. Levesque. A formal account of nondeterministic and failed actions. In Proc. of the 23rd Int'l Joint Conference on Artificial Intelligence (IJCAI), 2013.

[D'Ippolito et al., 2011] Nicolás D'Ippolito, Victor Braberman, Nir Piterman, and Sebastián Uchitel. Synthesis of live behaviour models for fallible domains. In 33rd Int'l Conf. on Software Engineering (ICSE), pages 211–220. IEEE, 2011.

[Domshlak, 2013] Carmel Domshlak. Fault tolerant planning: Complexity and compilation. In Proc. of the 23rd Int'l Conference on Automated Planning and Scheduling (ICAPS), 2013.

[Jaramillo et al., 2014] Andres Calderon Jaramillo, Jicheng Fu, Vincent Ng, Farokh B. Bastani, and I-Ling Yen. Fast strong planning for FOND problems with multi-root directed acyclic graphs. International Journal on Artificial Intelligence Tools, 23(06):1460028, 2014.

[Jensen et al., 2004] Rune M. Jensen, Manuela M. Veloso, and Randal E. Bryant. Fault tolerant planning: Toward probabilistic uncertainty models in symbolic non-deterministic planning. In Proc. of the 14th Int'l Conference on Automated Planning and Scheduling (ICAPS), pages 335–344, 2004.

[Kolobov et al., 2009] Andrey Kolobov, Mausam, and Daniel S. Weld. ReTrASE: Integrating paradigms for approximate probabilistic planning. In Proc. of the 21st Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 1746–1753, 2009.

[Kupferman and Vardi, 1996] Orna Kupferman and Moshe Y. Vardi. Verification of fair transition systems. In Computer Aided Verification, pages 372–382. Springer, 1996.

[Lussier et al., 2007] Benjamin Lussier, Matthieu Gallien, Jérémie Guiochet, Félix Ingrand, Marc-Olivier Killijian, and David Powell. Fault tolerant planning for critical robots. In 7th Annual IEEE/IFIP Int'l Conf. on Dependable Systems and Networks (DSN), pages 144–153. IEEE, 2007.

[Muise et al., 2012] Christian Muise, Sheila A. McIlraith, and J. Christopher Beck. Improved non-deterministic planning by exploiting state relevance. In Proc. of the 22nd Int'l Conference on Automated Planning and Scheduling (ICAPS), pages 172–180, 2012.

[Pineda et al., 2013] Luis Enrique Pineda, Yi Lu, Shlomo Zilberstein, and Claudia V. Goldman. Fault-tolerant planning under uncertainty. In Proc. of the 23rd Int'l Joint Conference on Artificial Intelligence (IJCAI), 2013.

[Puterman, 1994] Martin Puterman. Markov Decision Processes: Discrete Dynamic Programming. Wiley, New York, 1994.

[Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge, MA, 2001.

[Sardina and D'Ippolito, 2015] Sebastian Sardina and Nicolás D'Ippolito. Towards fully observable non-deterministic planning as assumption-based automatic synthesis. In Proc. of the 24th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 3200–3206. AAAI Press, 2015.

[Trevizan et al., 2007] Felipe W. Trevizan, Fabio Gagliardi Cozman, and Leliane Nunes de Barros. Planning under risk and Knightian uncertainty. In Proc. of the 20th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 2023–2028, 2007.

[Waldinger, 1977] Richard Waldinger. Achieving several goals simultaneously. In Machine Intelligence 8, pages 94–136. Ellis Horwood, Edinburgh, Scotland, 1977.