=Paper= {{Paper |id=Vol-1648/paper11 |storemode=property |title=Strong-Cyclic Planning when Fairness is Not a Valid Assumption |pdfUrl=https://ceur-ws.org/Vol-1648/paper11.pdf |volume=Vol-1648 |authors=Alberto Camacho,Sheila A. McIlraith |dblpUrl=https://dblp.org/rec/conf/ijcai/CamachoM16 }} ==Strong-Cyclic Planning when Fairness is Not a Valid Assumption== https://ceur-ws.org/Vol-1648/paper11.pdf

Strong-Cyclic Planning when Fairness is Not a Valid Assumption

Alberto Camacho Sheila A. McIlraith
Department of Computer Science Department of Computer Science
University of Toronto, Canada University of Toronto, Canada
acamacho@cs.toronto.edu sheila@cs.toronto.edu

Abstract description of the actions and their effects that associates a
non-zero probability with alternative action outcomes. While
We address the class of fully-observable non- less expressive than MDPs, such systems are often easier to
deterministic (FOND) planning problems where solve and show better scalability.
non-deterministic actions are not guaranteed to dis- In cases where the stochasticity of a system is not well
play fair behaviour. Typical solutions to FOND understood, or deemed unnecessary, Fully Observable Non-
planning problems either guarantee goal reachabil- Deterministic (FOND) planning [Cimatti et al., 2003] pro-
ity with a bounded number of transitions (so-called vides a framework, similar to probabilistic planning, but with
strong acyclic solutions), or are predicated on a fair- the exception that the effects of actions are non-deterministic
ness assumption that presumes each action effect without commitment to the probability of alternative action
occurs infinitely often when the action is applied effects. Solutions to FOND planning problems are policies
infinite times in the same state (so-called strong- that map states into actions. Cimatti et al. [2003] identi-
cyclic solutions). We introduce the FOND+ plan- fied three different classes of solutions to FOND problems:
ning model, that extends the FOND model with weak, strong, and strong-cyclic. Weak solutions are plans that
a more informed description of the action tran- achieve the goal, but without guarantees. Strong solutions
sitions. Solutions to FOND+ problems guarantee guarantee goal achievement in all executions. This condition
goal reachability, even when the classical fairness is too restrictive, as FOND problems often do not have strong
assumption is not valid. We characterize different solutions. The class of strong-cyclic solutions is an alterna-
topologies, or classes, of solutions showing that tive, and guarantees goal achievement provided that all exe-
typical strong acyclic, and fair strong-cyclic plan- cutions are fair. When the fairness assumption is not valid,
ning are particular cases. Moreover, we character- policy executions may not achieve the goal.
ize a class of FOND+ solutions that also solve 1-
primary normative fault-tolerant problems, and are In 2004, Jensen et al. introduced the notion of fault-
robust to any finite number of faults during execu- tolerant planning, motivated by two observations. First, they
tion. We present algorithms to solve each class of interpreted non-determinism in many real-world problems as
problems. Finally, we present the results of one of the consequence of infrequent errors (so-called faults) during
our algorithms in the search for so-called norma- execution. Second, they claimed that in general, no action is
tive solutions for a selection of blocksworld prob- ever guaranteed to achieve non-faulty behaviour. The resul-
lems where the classical fairness assumption is not tant framework for fault-tolerant planning includes complete
valid. knowledge about which state transitions are normative and
which are otherwise faulty. Solutions attempt to reach a pre-
scribed goal under the assumption that no more than κ faults
1 Introduction will occur during execution.
Most real-world dynamical systems are best modeled us- In this paper we recognize that there exist non-
ing non-deterministic action effects, reflecting our inability deterministic transition systems where the fairness assump-
to characterize, with certainty, the outcome of actions, and tion does not hold, and the challenge we address is to find so-
in some cases to additionally model the effects of exoge- lutions that guarantee goal achievement when strong FOND
nous events. A number of different frameworks have been solutions do not exist. To this end, we introduce the FOND+
proposed to model and plan with non-deterministic transition planning framework as an extension to the FOND frame-
systems. For the purposes of this paper, we limit our discus- work that includes a more informed description of transitions.
sion to fully-observable systems. Markov Decision Processes The FOND+ framework makes it possible to find policies
(MDPs) [Puterman, 1994], for example, model the effect of that guarantee goal achievement when the classical fairness
an action in each state as a probability distribution over the assumption is not valid. Alternatively, the FOND+ frame-
successor states. In contrast, many probabilistic planning sys- work can be informed with a description of the normative
tems (e.g. [Kolobov et al., 2009]) exploit a compact symbolic behaviour of the system. In this case, solutions to certain
FOND+ problems are also solutions to 1-primary normative appears infinitely often in σ, but the transition (s, a, s0 ) oc-
fault-tolerant planning problems. We define different classes curs a finite number of times for an outcome s0 ∈ T (s, a).
of solutions to FOND+ problems, and introduce algorithms Executions that are not unfair are said to be fair.
to find solutions for each class. We identify a continuum
Definition 2 (adapted from [Cimatti et al., 2003]). A solution
of FOND+ solutions that includes strong, strong-cyclic, and
to a FOND problem is strong cyclic when all executions result
fault-tolerant planning solutions in extreme cases. This re-
in either finite sequences of states that terminate in a goal
sult is perhaps most related to the work done by Trevizan et
state, or are infinite and unfair.
al. [2007], which defines a framework to deal simultaneously
with risk and uncertainty in an attempt to unify probabilis- Cimatti et al. [2003] distinguish three classes of solutions
tic and non-deterministic planning. The results presented in to a FOND planning problem: weak, strong, and strong cyclic.
this paper open the door to further advancements in the uni- Weak solutions are plans that lead the agent from the initial
fication of FOND, fault-tolerant, and probabilistic planning state to a goal state, but with no guarantees due to the non-
frameworks. determinism of the actions. Strong solutions are plans that
guarantee goal achievement. Necessarily, strong solutions are
2 Background acyclic, and achieve the goal in a bounded number of steps.
A non-deterministic planning domain is a tuple D = Finally, strong cyclic solutions are plans that are guaranteed
hF, S, A, T i, where F is a set of propositions; S ⊆ 2F is to achieve the goal, but predicated on the fairness assump-
a finite set of states, typically represented by the set of propo- tion given by Definition 1. The fairness assumption presumes
sitions that hold; A is a finite set of actions; and T : S × A → that, for each state s and each action a applicable in s, every
2S is a transition function. outcome of a occurs infinitely many times if a is applied in
s infinitely often. Strong solutions are also strong-cyclic, but
An actions a ∈ A is defined in terms of its preconditions
the opposite is not always true.
P rea and effects Eff a . P rea ⊆ F is the set of propositions
that need to hold true in a state s for a to be applicable in s,
2.2 Fault-Tolerant Planning
i.e., P rea ⊆ s. Eff a = hEff 1a , . . . , Eff na i is a finite set of
possible effects of a. Each effect e ∈ Eff a is a tuple of the A fault-tolerant planning problem is a tuple P =
form e = hAdde , Dele i, where Adde and Dele are sets of hD, s0 , SG , F, κi, where D = hF, S, A, T i is a non-
propositions to be added and deleted, respectively, when an deterministic planning domain, s0 ∈ S is the initial state,
action transition is applied. The progression of s with respect SG ⊆ S is a set of goal states, F is an exception model as
to action a and effect e is the updated state P rog(s, a, e) = described in Definition 3, and κ is a constant [Jensen et al.,
(s \ Dele ) ∪ Adde . Finally, the 2004; Domshlak, 2013]. Informally, a fault-tolerant planning
S result of applying a in s is task P requires planning for goal achievement under the as-
one of the states in T (s, a) = e∈Eff a P rog(s, a, e). A state
sumption that no more than κ exceptions will occur during
s0 ∈ T (s, a) is said to be an outcome of a in s, and the triplet plan execution.
(s, a, s0 ) is called the transition. An execution of domain D
in state s0 is a state-action sequence s0 , a0 , s1 , a1 . . . such Definition 3 (adapted from [Domshlak, 2013]). For a non-
that ai is applicable in si for each i = 0, 1 . . ., and si+1 is deterministic Splanning domain D, an exception model is a
one of the outcomes of ai in si chosen non-deterministically function F : a∈A Eff a → N, computable in time polyno-
by the system. When s0 = P rog(s, a, e), the regression mial in the size of P. If, for each action a ∈ A, |{e | e ∈
of s0 with respect to action a and effect e is the partial Eff a , F (e) = 0}| ≤ α, then F is called α-primary. Likewise,
state Regr(s0 , a, e) = (s0 \ Adde ) ∪ P rea . Conceptually, if, for each action a ∈ A, |{e | e ∈ Eff a , F (e) = 0}| > 0,
Regr(s0 , a, e) is the minimal subset of propositions that a then F is called normative.
state s needs to entail so that s0 is an outcome of a in s. Ac-
In a fault-tolerant planning problem, actions have two types
tions can be applied in partial states, whose progression and
of effects. Primary effects model the normative behaviour of
regression is defined as above (cf. [Muise et al., 2012]).
the system, while faulty effects model one or more failures
(faults) of the system. Primary effects, e, have F (e) = 0,
2.1 Fully Observable Non-Deterministic Planning whereas F (e) > 0 when e is faulty. Intuitively, the exception
A fully-observable non-deterministic (FOND) planning model F maps each effect to the number of exceptions, or
problem is a tuple P = hD, s0 , SG i, where D = hF, S, A, T i faults, associated with it.
is a non-deterministic planning domain, s0 ∈ S is the initial Solutions to a fault-tolerant planning problem P =
state, and SG ⊆ S is a set of goal states. Solutions to FOND hD, s0 , SG , F, κi are κ-plans, or policies that achieve the
planning problems are policies that map states into actions. goal under the assumption that no more than κ faults
We say that π is well-defined when π(s) is defined in every will occur during execution. Formally, a plan P is κ-
state reachable by π. Executions of a well-defined policy π admissible when its execution generates the state-effect se-
(also called plan executions) are executions s0 , a0 , s1 , a1 . . . quence (s0 , e0 , . . . , si , ei , . . .) with Σi F (ei ) ≤ κ. A policy is
where, for each i = 0, 1 . . . , ai = π(si ). When the execu- a κ-plan when all plan executions are κ-admissible and reach
tion finishes in a goal state sn ∈ SG , the sequence of actions the goal. Solutions to fault-tolerant planning problems do not
P = a0 , a1 , . . . , an−1 is called a plan. assume fairness as done in strong-cyclic FOND planning, and
Definition 1 (adapted from [Cimatti et al., 2003]). We say are robust to occurrence of up to κ faults. Note that the pa-
that an execution σ is unfair when a state-action tuple s, a rameter κ that sets the maximum number of exceptions in a
solution is part of the problem description, and needs to be
A A
known, computed, or otherwise guessed in advance. B B
Jensen et al. [2004] introduced the model for fault-tolerant
planning, and addressed 1-primary normative models with the
rationale that these satisfactorily address non-determinism A A
originating from many real-world physical system models.
B A B B A B
Fault-tolerant planning has found applications in the robot
community (e.g. [Lussier et al., 2007]). More recently, fault-
tolerant planning has been studied by Domshlak [2013] and (a) Strong-Cyclic Solution (b) Normative Solution
Delgrande and Levesque [2013]. Recent work by Pineda et
al. [2013] addressed the problem of fault-tolerant planning Figure 1: Different policies for the blocks-world domain. The
under uncertainty, tackling the problem from the perspective pick-up action is non-deterministic, and may pick-up a certain
of a stochastic shortest-path (SSP) problem. block or drop it on the table. The put-on-block and put-block-
on-table actions are deterministic.
3 Planning with Unfair Non-Determinism Illustrative Example
We seek to generate solutions to goal-oriented planning Consider the blocks-world domain, where block A is ini-
problems with non-deterministic action effects where the fair- tially on top of block B, and the goal is to put A on the ta-
ness assumption does not necessarily hold. Different notions ble. The agent can pick up blocks and put them on top of
of fairness over executions have been discussed in the liter- other blocks, or on the table. The pickup-block action is non-
ature. In particular, D’Ippolito et al. [2011] defined strong deterministic and the gripper may drop the block onto the ta-
independent fairness, a notion that presumes it is known, for ble, but is guaranteed to eventually pick-up the block if tried
each action, which outcomes are good and which ones are repeated times. The put-on-block (resp. put-block-on-table)
failures (similar to the distinction made in fault-tolerant plan- action is deterministic and puts the block held by the grip-
ning), and requires failures and assumptions – normally, tem- per on top of another block (resp. on the table). Strong-cyclic
porally extended constraints – on the environment behaviour FOND solutions that assume fairness may rely on the even-
to be independent. Sardina and D’Ippolito [2015] defined dif- tual occurrence of the faulty effect of pickup-block in order
ferent logical characterizations of fairness based on the no- to reach the goal. Figure 1a shows the strong-cyclic solution
tions introduced by D’Ippolito et al. [2011]. Kupferman and that picks up block A and puts it on top of B repeatedly until
Vardi [1996] proved that verification of fair transition systems block A is dropped on the table. If it is the case that the effect
is PSPACE-complete. of the pickup-block action that drops the block is not guaran-
teed to occur during execution, the solution described above
Our premise is that in many transition systems, and in con- does not guarantee goal achievement. In contrast, the solu-
crete non-deterministic planning domains, it is known which tion illustrated in Figure 1b is strong, and goal achievement
state-action pairs s, a are guaranteed to produce, eventually, is guaranteed.
certain outcomes s0 if a would be applied in s infinitely often. A similar pattern appears when we consider that the block
While not true in general, it is a valid assumption for many being picked up is the normative behaviour of the pickup-
real-world transition systems. For example, it is known that block action. The solution described in Figure 1a loops until
pressing the elevator’s button on the wall repeatedly causes the pick-up action is faulty, which is certainly not a good qual-
the elevator’s doors to open. On the other hand, it is known ity solution because the normative behaviour of the system
there is no guarantee that a fixed number will be awarded in a can lead to the goal, regardless of whether the faulty effect is
lottery, even if the lottery is played infinitely many times un- guaranteed to occur or not. Note that the solution described
der identical conditions. When fairness does not hold, solu- in Figure 1b is of good quality because executions lead to the
tions need to be robust to non-determinism, and goal achieve- goal when the behaviour of the system is normative, and the
ment in plan executions should not rely on a particular tran- solution is robust to non-determinism.
sition for which there is no guarantees of occurrence. A sim-
ilar argument can be made in non-deterministic domains that 3.1 The FOND+ Model
have known normative behaviour. In the scope of this paper, A Fully Observable Non-Deterministic Planning problem
we focus our attention on 1-primary normative fault-tolerant with Labeled UnfairnesS (for short, FOND+ problem) is a
planning problems. In these problems, solutions need to be tuple P = hD, s0 , SG , Li, where D = hF, S, A, T i is a non-
robust to failures, and the goal needs to be achievable when- deterministic planning domain, s0 ∈ S is the initial state,
ever the system manifests its normative behaviour. SG ⊆ S is a set of goal states, and L : S×A×S → {F, U} is a
The following example motivates the need for policy gen- labeling function that maps each triplet (s, a, s0 ) ∈ S ×A×S
eration mechanisms that guarantee goal achievement in those into one of the symbols F or U. The semantics of L supports
non-deterministic domains where the fairness assumption is different interpretations. Before elaborating, we first define
not valid. Further, it illustrates a parallelism between solu- what constitutes an L-fair execution. For a non-deterministic
tions to goal-oriented planning problems where fairness gov- planning domain D = hF, S, A, T i, and labeling function
erns the non-determinism of the actions, and those where the L : S × A × S → {F, U}, we say that an execution in state s0
non-deterministic actions have primary and faulty effects. is L-unfair when there exists a state-action tuple (s, a) such
that (i) (s, a) appears infinitely often, and (ii) there exists a
transition (s, a, s0 ) such that L(s, a, s0 ) = F and (s, a, s0 ) oc- Strictly Fair
curs a finite number of times. Executions that are not L-unfair
are said to be L-fair. Note that fairness, as defined by Cimatti
et al. [2003], is a particular case of L-fair that occurs when
L assigns F to all transitions. Solutions to a FOND+ problem Strictly Unfair
are policies that guarantee goal achievement, predicated on Normative
the assumption that all executions of D in s0 are L-fair.
Mixed
Definition 4. A FOND+ problem is a tuple P =
hD, s0 , SG , Li, where D = hF, S, A, T i is a non- Figure 2: Distribution of different classes of solutions to
deterministic planning domain, s0 ∈ S is the initial state, FOND+ planning problems.
SG ⊆ S is a set of goal states, and L : S × A × S → {F, U}
is a labeling function. class of normative solutions that is particularly interesting for
Definition 5. Solutions to a FOND problem P = + its practical applications and proximity with solutions to 1-
hD, s0 , SG , Li are policies that guarantee goal achievement, primary normative fault-tolerant planning problems.
predicated on the assumption that all executions of D in s0 A solution π to a FOND+ problem is strictly fair when
are L-fair. all transitions t produced by L-fair plan executions have
L(t) = F. Strictly fair solutions to a FOND+ problem P =
In the scope of this paper, we focus our attention on two hD, s0 , SG , Li are strong-cyclic solutions to the FOND prob-
different interpretations of the labeling function. The first in- lem P 0 = hD, s0 , SG i that results from ignoring the labeling
terpretation assumes it is known, for each state-action pair function in P. The opposite is not always true, and strong-
(s, a), which transitions (s, a, s0 ) are not guaranteed to oc- cyclic solutions to P’ may or may not be strictly fair solutions
cur infinitely often in executions where the pair (s, a) ap- to P – only when all transitions t produced by plan execu-
pears an infinite number of times. In those transitions, we as- tions have L(t) = F. In general, a strong cyclic solution can
sign L(s, a, s0 ) = U. Otherwise, we assign L(s, a, s0 ) = F. belong to any class of FOND+ solutions. Like strong-cyclic
With this interpretation, all plan executions are L-fair and solutions, strictly fair solutions can contain cycles.
FOND+ solutions guarantee goal achievement during ex-
Definition 6. A solution π to a FOND+ problem is strictly
ecution. In other words, FOND+ solutions are robust to
fair when all transitions t produced by L-fair plan executions
non-deterministic transitions during plan execution, and goal
have L(t) = F.
achievement does not rely on any specific transition for
which there is no guarantees of eventual occurrence. The The class of strictly unfair solutions is analogous to the
FOND+ model overcomes difficulties of the FOND model class of strictly fair solutions, this time requiring transitions
when strong solutions do not exist, as strong-cyclic solutions t produced by L-fair plan executions have L(t) = U. Strictly
do not guarantee goal achievement, in general, when Cimatti unfair solutions to a FOND+ problem P = hD, s0 , SG , Li
et al.’s fairness assumption is not valid. are strong solutions to the FOND problem P 0 = hD, s0 , SG i
The second interpretation we examine describes the nor- that results from ignoring the labeling function in P. The op-
mative behaviour of the system, and assigns L(s, a, s0 ) = F posite is not always true either, and strong solutions to P’ are
(resp. L(s, a, s0 ) = U) when s0 follows from a primary (resp. strictly unfair solutions to P only when all transitions t pro-
faulty) effect of a in s. With this interpretation, an execution duced by plan executions have L(t) = U. In general, a strong
is L-fair when it is finite, or when it is infinite and for each solution can belong to any class of FOND+ solutions. Like
state-action pair (s, a) that appears infinitely often, the system strong solutions, strictly unfair solutions are acyclic.
responds with each primary effect of a in s infinitely often. As Definition 7. A solution π to a FOND+ problem is strictly
we will see later, this interpretation is particularly interesting unfair when all transitions t produced by L-fair plan execu-
in certain FOND+ planning problems, and makes it possible tions have L(t) = U.
to find policies that guarantee goal achievement under the as- The class of mixed solutions completes the space of so-
sumption that the number of transitions produced by faulty lutions to a FOND+ planning problem. Mixed solutions are
effects during plan execution is finite. those that are neither strictly fair nor strictly unfair. Plan ex-
In the blocksworld example introduced above, and with ecutions s0 , a0 , s1 , a1 . . . sn are such that sn is a goal state,
both interpretations, all transitions t have L(t) = F except and may contain cycles. More precisely, there may exist
the transitions produced by the effect of the pick-up action k < m < n such that sk = sm . As all infinite plan exe-
that drops the block on the table, which have L(t) = U. With cutions need to be L-unfair (cf. Definition 5), at least one of
this labeling, the policy depicted in Figure 1b is a FOND+ the outcomes of si by ai , for a certain k ≤ i ≤ m, must yield
solution, whereas the policy depicted in Figure 1b is not. a transition (si , ai , s0i+1 ) with L(si , ai , s0i+1 ) = F.
3.2 Classes of Solutions Definition 8. A solution π to a FOND+ problem is mixed
We characterize different classes of solutions to FOND+ when it is neither strictly fair nor strictly unfair.
problems according to the fairness of plan executions. Finally, we characterize the class of normative solutions,
Namely, we distinguish between strictly fair, strictly un- which are particularly interesting when the labeling function
fair, and mixed solutions. Furthermore, we characterize the describes the normative behaviour of the system – i.e., when
transitions (s, a, s0 ) with label L(s, a, s0 ) = F are exactly model and its solutions. More precisely, normative FOND+
those produced by a primary effect of a in s. A solution π solutions are robust to any number of faulty transitions dur-
is normative when, in each state s, reachable by π: (i) there ing execution. This is an advantage with respect to solutions
exists a plan execution in s that reaches the goal and such to fault-tolerant planning, which are robust to a limited num-
that all transitions t have L(t) = F, and (ii) exactly one out- ber of κ faults during execution. Besides, κ is a parameter of
come of s by π(s) produces a transition t with L(t) = F. the model that has to be fixed prior search, even if a solution
The class of normative solutions intersects with the class of robust to any number of faults exists.
strictly fair solutions in those solutions that are deterministic
plans (cf. Figure 2). 4 Search Algorithms for FOND+
+
Definition 9. A FOND solution π is normative when, in This section presents algorithms to search for strictly fair,
each state s, reachable by π: (i) there exists a plan execution strictly unfair, and normative solutions to FOND+ planning
in s that reaches the goal and such that all transitions t have problems. This is followed by an algorithm that is, in prin-
L(t) = F, and (ii) exactly one outcome of s by π(s) produces ciple, less efficient, but can search for general solutions
a transition t with L(t) = F. to FOND+ problems. Notwithstanding, the algorithms pre-
sented below are intended to provide a baseline for future im-
With the interpretation of the labeling function that de- provements, and their main purpose is to demonstrate the ex-
scribes the normative behaviour of the system, policy actions istence of methods to find solutions to FOND+ . We decided
of normative solutions are required to have, by definition, one to follow a branch-and-bound search scheme, motivated by its
primary effect. Solutions need to be robust to faulty transi- efficiency in the search for strong-cyclic solutions to FOND
tions and guarantee goal achievement, at any point during problems [Muise et al., 2012]. The similarity between the
execution, when the system follows its normative behaviour. FOND+ and FOND planning models suggests that a similar
In order to facilitate comparison with the fault-tolerant plan- technique may be effective in the search for FOND+ solu-
ning model, we assume the value L(s, a, s0 ) depends only tions.
on the effect that produces the outcome s0 , but not on the
state s itself. Formally, we assume L(s, a, P rog(s, a, e)) = 4.1 Search for Strictly Fair Solutions
L(s0 , a, P rog(s0 , a, e)) for all actions a ∈ A, all effects e By definition, all transitions t produced by plan executions
of a, and all pairs of states s, s0 ∈ S in which a is applica- of strictly fair solutions are such that L(t) = F. It is straight-
ble. When this property holds, the labeling function can be forward to see that strictly fair solutions to a FOND+ problem
described compactly as a function of the actions effects. P = hD, s0 , SG , Li are all and only those strong-cyclic solu-
With the restrictions described above, normative solutions tions to the FOND problem P 0 = hD0 , s0 , SG i. Intuitively,
to a FOND+ problem P = hD, s0 , SG , Li are also 1-primary D0 is like D, but the actions applicable in a given state s
normative solutions 1 to fault-tolerant planning problems are restricted to those a’s that only yield transitions (s, a, s0 )
P 0 = hD, s0 , SG , F, κi with an exception model F that as- labeled with L(s, a, s0 ) = F. This constraint can be imple-
signs faults F (e) > 0 to effects e that produce transitions mented by means of extra dummy fluents in D0 . Alternatively,
(s, a, P rog(s, a, e)) such that L(s, a, P rog(s, a, e)) = U. when L accepts a compact description such that its value de-
Note that normative FOND+ solutions are robust to occur- pends only on the effects of the actions that yield each transi-
rence of any possible number of faults during execution, as tion, we can safely prune an action a from D0 when not all its
opposed to standard fault-tolerant solutions, whose execu- effects yield transitions t with L(t) = F.
tions are guaranteed to be robust up to a bounded number For a FOND+ problem P, the algorithm consists of two
of faults κ specified by the model. steps:
Theorem 1. Let P = hD, s0 , SG , Li be a FOND+ prob- 1. P is relaxed into a FOND problem P 0 as described
lem, and let P 0 = hD, s0 , SG , F, κi be a 1-primary normative above.
fault-tolerant planning problem with exception model F that
assigns faults F (e) > 0 to effects e that produce transitions 2. A sound and complete strong-cyclic FOND planner –
(s, a, P rog(s, a, e)) such that L(s, a, P rog(s, a, e)) = U. e.g. PRP [Muise et al., 2012] – is used to search for
Then, FOND+ normative solutions to P are 1-primary nor- a strong-cyclic solution to P 0 , which is returned as a
mative solutions to P’ for any κ < ∞. strictly fair solution to P.
Since strictly fair solutions to P are the strong-cyclic solu-
The practicality of 1-primary normative fault-tolerant plan-
tions to the FOND problem P 0 , soundness and completeness
ning to model and solve many real-world physical prob-
of the algorithm derives from soundness and completeness of
lems has been discussed by Jensen et al. [2004] and Domsh-
the strong-cyclic FOND planner.
lak [2013]. Normative FOND+ solutions overcome certain
limitations of the 1-primary normative fault-tolerant planning 4.2 Search for Strictly Unfair Solutions
1 By definition, all transitions t produced by plan executions
For convenience, we informally say that a solution π is α-
primary (resp. normative) when |{e | e ∈ Eff a , F (e) = 0}| ≤ α of strictly unfair solutions are such that L(t) = U. The algo-
(resp. |{e | e ∈ Eff a , F (e) = 0}| > 0) in all effects from plan rithm to find strictly unfair solutions is somewhat analogous
executions of π. Technically, the α-primary normative model intro- to the algorithm used to find strictly fair solutions. This time,
duced by Domshlak [2013] requires F to be α-primary and norma- however, we exploit the correspondence between strictly un-
tive globally, which is a harder constraint. fair solutions to a FOND+ problem P = hD, s0 , SG , Li
and strong FOND solutions to the FOND problem P 0 = Algorithm 1 Search for Normative Solutions
hD0 , s0 , SG i. Intuitively, D0 is like D, but the actions applica- 1: procedure P OLICY S EARCH(V, s0 , s? , A)
ble in a given state s are restricted to those a’s that only yield 2: while π changes do
transitions (s, a, s0 ) labeled wth L(s, a, s0 ) = U. This con- 3: π←∅
straint can also be implemented by means of extra dummy 4: Open ← s0
fluents in D0 . Alternatively, when L accepts a compact de- 5: Seen ← {}
scription such that its value depends only on the effects of 6: Deferred ← {}
the actions that yield each transition, we can safely prune 7: while Open 6= ∅ ∨ Deferred 6= ∅ do
an action a when not all its effects yield transitions t with 8: s =U NSTACK ()
L(t) = U. 9: if s 6|= s? ∧ s 6∈ Seen then
For a FOND+ problem P, the algorithm consists of two 10: Seen.add(s)
steps: 11: if π(s) is undefined then
1. P is relaxed into a FOND problem P 0 as described 12: G EN FAIR P LAN(V, s, s? , A, π)
above. 13: end if
2. A sound and complete strong FOND planer – e.g. 14: if π(s) is defined then
[Jaramillo et al., 2014] – is used to search for a strictly 15: hp, ai = π(s)
unfair solution to P 0 , which is returned as a strictly un- 16: S TACK S UCCESSORS(s, a)
fair solution to P. 17: else break
18: end if
Since strictly unfair solutions to P are the strong solutions 19: end if
to the FOND problem P 0 , soundness and completeness of the 20: end while
algorithm derives from soundness and completeness of the 21: P ROCESS D EADENDS ()
strong FOND planner. 22: end while
23: return π
4.3 Search for Normative Solutions 24: end procedure
Normative solutions are those where, for every reachable
Add Successors to Corresponding Stack
state s, there exists a plan for s for which all transitions t have
L(t) = F. Algorithm 1 exploits this property. Intuitively, the 25: function S TACK S UCCESSORS(s, a)
algorithm explores plans that exclusively produce transitions 26: for e ∈ Effa do
t such that L(t) = F, which are extended into robust policies 27: s0 = P rog(s, a, e)
that also consider occurrence of other transitions. 28: if L(s, a, s0 ) = F then
Without loss of generality, we assume the problem has the 29: Open.add(s0 )
property that all actions produce, in each state s in which a 30: else
is applicable, exactly one transition t such that L(t) = F. If 31: Deferred.add(s0 )
this property is not true, and the value of L(s, a, s0 ) in any 32: end if
state s depends only on the effect of a that generates outcome 33: end for
s0 , actions that do not have exactly one effect that produces a 34: end function
transition t such that L(t) = F can be safely pruned in a pre- Get Next Node to Explore
processing stage. Alternatively, a constraint during search can 35: procedure U NSTACK
be added, so that a state s can be expanded by action a only 36: if OpenF 6= ∅ then
when exactly one outcome produces a transition t such that 37: s = Open.pop()
L(t) = F. If this check is not performed, Algorithm 1 still 38: else
finds solutions (perhaps non-normative mixed solutions) that 39: s = Deferred.pop()
guarantee goal achievement when the interpretation of the la- 40: end if
beling function describes fairness, but goal achievement dur- 41: return s
ing execution is not guaranteed anymore when the labeling 42: end procedure
function describes the normative behaviour of the system.
Procedure P OLICY S EARCH in Algorithm 1 maintains three L(s, π(s), s0 ) = F, and stacks s0 into Deferred otherwise.
separate stacks: Open, Seen, and Deferred. The following The construction of the policy is performed by, iteratively,
search procedure is performed until policy π does not change unstacking a new state from Open, or Deferred if Open is
with respect to the previous policy found. First, Open is ini- empty, and finding a plan with G EN FAIR P LAN search. This
tialised with the initial state s0 and Seen, Deferred, and π iterative process is run until the stacks Open and Deferred are
are initialised to empty sets. For each state s in Open being empty, or G EN FAIR P LAN search for plans fails.
processed, for which π is undefined, procedure G EN FAIR -
P LAN searches a plan P for s that exclusively produces tran- Procedures G EN FAIR P LAN and P ROCESS D EADENDS are
sitions t such that L(t) = F. Then, policy π is extended inspired by procedures G EN P LAN PAIRS and P ROCESS -
with information extracted from P, and the successors of s D EADENDS in the FOND planner PRP [Muise et al., 2012].
by π(s) are stacked. This is done by procedure S TACK S UC - G EN FAIR P LAN does two things. First, it searches for a plan
CESSORS, which stacks each successor s0 into Open when P for s such that all transitions t have L(t) = F. For effi-
ciency, such a plan may end in a goal state or in a state for Algorithm 2 Search of FOND+ Solutions
which a policy action has already been computed. Second,
1: procedure P OLICY S EARCH(κ, V, s0 , s? , A)
like PRP’s G EN P LAN PAIRS, it extends π with new state-
2: while π changes do
action pairs. More precisely, for each pair hs, ai in the plan,
3: π←∅
G EN FAIR P LAN computes, via regression, the relevant part p
4: Open ← s0
of s so that the goal is still reachable (e.g. [Waldinger, 1977;
5: Seen ← {}
Reiter, 2001]). Finally, it extends π with the (partial) state-
6: while Open 6= ∅ do
action pair hp, ai. PRP maintains a list of forbidden state-
7: s = Open.pop()
action pairs (FSAPs) that recognisably lead to a state where
8: if s.ucount > κ then break
no strong-cyclic policy exists. FSAPs allow to prune the
9: end if
search space effectively, and reduce search times consider-
10: if s 6|= s? ∧ s 6∈ Seen then
ably, in practice. We apply the same ideas in our algorithm.
11: Seen.add(s)
Whenever G EN FAIR P LAN fails to find a plan for state s (i.e.,
12: if π(s) is undefined then
s is a dead end), procedure P ROCESS D EADENDS computes
13: G EN B OUND P LAN(κ, V, s, s? , A, π)
the reasons why s is a dead end and performs regression to
14: end if
create a set of forbidden (partial) state-action pairs that need
15: if π(s) is defined then
to be avoided in future searches.
16: hp, ai = π(s)
Algorithm 1 finds normative solutions to a FOND+ prob- 17: S TACK S UCCESSORS(s, a)
lem, as proved in Theorem 2. 18: else break
Theorem 2. Algorithm 1 is sound and complete in the search 19: end if
for normative solutions to a FOND+ problem. 20: end if
21: end while
Proof sketch. Line 16 of Algorithm 1 stacks all states reach- 22: P ROCESS D EADENDS ()
able by π, Every state a reachable by π is considered in line 23: end while
8 of Algorithm 1. If no fair plan for s has been computed yet, 24: return π
procedure G EN FAIR P LAN finds a fair plan for it. This proves 25: end procedure
soundness of the algorithm. Procedure P OLICY S EARCH ex-
Add Successors to Corresponding Stack
plores all the states space, except those forbidden state-action
pairs that are pruned from search by P ROCESS D EADENDS. 26: function S TACK S UCCESSORS(s, a)
Therefore, the algorithm is complete. 27: for e ∈ Effa do
28: s0 = P rog(s, a, e)
Algorithm 1 maintains soundness and completeness when 29: if L(s, a, s0 ) = F then
procedure S TACK S UCCESSORS stacks all successors into the 30: s0 .ucount ← s.ucount
Open stack – i.e. the use of Deferred stack is not necessary. 31: else
However, there is an advantage to employing the Deferred 32: s0 .ucount ← s.ucount + 1
stack. The rationale appears with the interpretation that tran- 33: end if
sitions with L(t) = F model the normative behaviour of the 34: Open.add(s0 )
system. Intuitively, we want faulty transitions to be repaired 35: end for
so the system moves back to a normative state. When han- 36: end function
dling of states originated by faulty transitions (L(t) = U) is
deferred, it is more likely that plans found by procedure G EN - representation of states that includes, for each planning state
FAIR P LAN plan to a state that is already handled by the pol- s explored, a counter s.ucount that records the number of
icy rather than yielding a niew plan all the way to the goal. In transitions with L(t) = U produced to visit state s. The ini-
other words, it is more likely that those plans are repairs rather tial state has s.ucount = 0. For a transition (s, a, s0 ), the
than alternative plans. This is a desired property of solutions counter s0 .ucount equals s.ucount when L(s, a, s0 ) = F, and
that justifies, in principle, the use of the Deferred stack. s.ucount + 1 when L(s, a, s0 ) = U.
We abuse notation and say that s entails s0 when s |= s0
4.4 Search of FOND+ Solutions and s.ucount ≤ s0 .ucount. For a state s in the Open stack
We presented algorithms to obtain strictly fair, strictly un- being processed, procedure G EN B OUND P LAN searches for
fair, and normative solutions to FOND+ problems. In this plans P = s, a0 , s1 , a1 , . . . an , sn that either: (i) exclusively
section, we present a search algorithm to obtain general solu- produce transitions t with L(t) = F and sn is a goal state,
tions to FOND+ problems. (ii) exclusively produce transitions t with L(t) = F and sn
For a FOND+ problem P, our algorithm is an iterated call entails a partial state for which the policy action is defined. or
to Algorithm 2 with parameter κ = 0, 1, . . . until a solu- (iii) exclusively produce transitions t with L(t) = F except
tion is found. Similar to normative fault-tolerant planning, (sn−1 , an , sn ), and all transitions produced by the progres-
each iteration fixes the maximum number κ of transitions sion of sn−1 by an have L(t) = U. Intuitively, conditions
with L(t) = U allowed to occur in any plan execution. How- (i),(ii),(iii) ensure that all L-fair plan executions achieve the
ever, solutions do not presume that no more than κ faulty ef- goal.
fects will occur. The algorithm makes use of an augmented In G EN B OUND P LAN, plans are regressed as in procedure
G EN FAIR P LAN of Algorithm 1. Regressed states keep the Strong-Cyclic Normative
counter s.ucount in each state s of the plan, which is not problem run-time size run-time size
altered by plan regression. Procedure P ROCESS D EADENDS p2 0 3 0 3
– analogous to that of Algorithm 1 – keeps track of the state- p3 0.002 5 0.016 5
action pairs that recognisably lead to a dead end or a state p4 0.020 11 0.048 11
for which procedure G EN B OUND P LAN fails to find a plan, p5 0.070 27 0.178 27
where representation of states s include the counter s.ucount. p6 0.110 39 0.296 39
The list of forbidden state-action pairs is initialised in each p7 0.114 32 0.270 32
call to P OLICY S EARCH. We leave further optimizations of p8 0.150 26 0.356 26
the algorithm for future work. p9 0.278 46 0.664 46
Strictly speaking, solutions found by Algorithm 2 are not p10 0.336 49 0.782 49
policies. The reason is that the action performed in an aug- p11 0.522 120 1.936 97
mented state s, in general, may depend on the value of the p12 0.626 97 1.840 119.5
counter s.ucount. Certainly, the action returned by a policy p13 0.682 57 1.810 57
should depend only on the planning state, not on the counter. p14 3.794 1117 37.10 1123
We have two options to mitigate for this. One, is to construct p15 1.500 278 7.814 278
the policy that selects, for each planning state s, the action Table 1: Average run-time (sec.) and policy size of FOND
among all (augmented partial) state-action pairs {p, a} such and FOND+ solutions to blocksworld-new problems. Strong-
that s |= p and p.ucount is maximal. The second option is cyclic solutions presume fairness, whereas computation of
to represent the solution found by Algorithm 2 as a FSC π normative solutions is informed by the labeling function.
that keeps track, implicitly, of occurrence of transitions with
L(t) = U along execution.
pick-up is to pick-up the chosen block (or the tower of two
Theorem 3. Algorithm 2 is sound and complete in the search blocks). Likewise, the primary effect of put-on-block is to put
of FOND+ solutions whose plan executions produce a num- a block (or a tower of two blocks) that is held by the grip-
ber of transitions t with L(t) = U that is bounded by κ. per on top of another block. The faulty effect for both actions
Corollary 1. The iterated call to Algorithm 2 with parame- drops the block (or the tower of two blocks) on the table. Al-
ters κ = 0, 1, . . . finds a solution to a FOND+ problem P, ternatively, we can interpret the gripper is guaranteed to suc-
if such one exists. If a solution is found in the κ-th iteration, ceed in picking-up the block eventually, and so on.
the number of transitions t with L(t) = U in plan execution is Table 1 summarizes the results of our tests. For each prob-
bounded by κ. lem, we report the run-time and policy size, averaged over
10 runs per problem. Normative policies have, sometimes,
5 Experiments equal size to strong-cyclic policies. We conjecture that in
these problems strong-cyclic policies are also robust to un-
We conducted a series tests with the objective of demon- fairness (i.e. are also normative solutions to the FOND+ prob-
strating the need for FOND+ solutions, that differ from lems). On the other hand, in some FOND+ problems – like
strong-cyclic FOND solutions and that guarantee goal p11, p12, and p14– the sizes of the normative policy varies
achievement when the classical fairness assumption is not with respect to the size of the strong-cyclic policy of the cor-
valid. We selected a non-deterministic planning domain, the responding FOND problem. In those cases, normative poli-
blocksworld-new, and a set of FOND problems. For each cies can be of greater or lesser size relative to strong-cyclic
FOND problem, we constructed FOND+ problem by labeling policies. As normative FOND+ solutions are also solutions
transitions. Finally, we compared FOND+ normative solu- to the FOND problem, this illustrates that policies found by
tions with strong-cyclic FOND solutions to each of the prob- PRP are not necessarily of minimal size. Run-times of Algo-
lems. We used PRP as the strong-cyclic planner; Algorithm 1 rithm 1 are, in general, two times greater than those of PRP
is implemented on top of PRP. In all cases (also in Algorithm in those problems where the size of the policies generated
1), we configured PRP search with dead end detection dis- are the same. Comparing the run times of our algorithm with
abled, and strong-cyclic-detection optimization disabled. Ex- those of PRP is not fair, as the algorithms find solutions to
periments were run on an Intel Xeon E5-2430 CPU @2.2GHz different problems. However, general run-times suggest that
Linux server, limiting processes to 2GB of memory usage. the branch-and-bound technique used in Algorithm 2 is, in
The blocksworld-new problems used in our tests have been principle, efficient in the search for normative solutions. As
extracted from Muise et al. [2012]. They are based on the the implementation of Algorithm 2 is preliminary and not op-
blocksworld benchmark used in past International Probabilis- timal, we believe there is room for further improvements in
tic Planning Competitions (IPPCs), and include some fixes its performance.
and larger problem sizes with respect to the original instances.
In the blocksworld-new domain, the gripper can pick up a
block (or two blocks at once) and put it on top of other block, 6 Summary and Future Work
or on the table. The goal is to set the blocks in a certain or- There exist a diversity of models for planning in the face
der. The action that puts blocks on the table is determinis- of non-deterministic action outcomes including probabilistic
tic. However, the pick-up and put-on-block actions have non- planning, MDPs, and fault-tolerant planning. We focus our
deterministic effects. More precisely, the primary effect of attention on FOND planning. Solutions to FOND planning
problems are required to be robust to non-deterministic ac- [Jaramillo et al., 2014] Andres Calderon Jaramillo, Jicheng Fu,
tion effects, and to guarantee goal achievement during execu- Vincent Ng, Farokh B Bastani, and I-Ling Yen. Fast strong
tion. The class of strong solutions provides such guarantees, planning for fond problems with multi-root directed acyclic
but the existence of such solutions is restricted in general. graphs. International Journal on Artificial Intelligence Tools,
On the other hand, the soundness of strong-cyclic solutions 23(06):1460028, 2014.
is predicated on fairness, and therefore its execution does not [Jensen et al., 2004] Rune M Jensen, Manuela M Veloso, and Ran-
guarantee goal achievement, in general, when the fairness as- dal E Bryant. Fault tolerant planning: Toward probabilistic uncer-
sumption is not valid. tainty models in symbolic non-deterministic planning. In Proc. of
the 14th Int’l Conference on Automated Planning and Scheduling
To mitigate for this, we introduced the model for Fully Ob-
(ICAPS), pages 335–344, 2004.
servable Non-Deterministic Planning with Labeled Unfair-
ness (FOND+ ), where fairness of action transitions can be [Kolobov et al., 2009] Andrey Kolobov, Mausam, and Daniel S
described explicitly by means of the labeling function L, and Weld. Retrase: Integrating paradigms for approximate proba-
bilistic planning. In Proc. of the 21st Int’l Joint Conference on
solutions guarantee goal achievement provided that execu-
Artificial Intelligence (IJCAI), pages 1746–1753, 2009.
tions are L-fair. We leave complexity analysis of our model
to future work. [Kupferman and Vardi, 1996] Orna Kupferman and Moshe Y
Vardi. Verification of fair transition systems. In Computer Aided
We distinguished four classes of solutions to FOND+ plan-
Verification, pages 372–382. Springer, 1996.
ning problems. Namely, strictly fair, strictly unfair, mixed,
and normative. Strictly fair (resp. strictly unfair) solutions [Lussier et al., 2007] Benjamin Lussier, Matthieu Gallien, Jérémie
are strong-cyclic (resp. strong) solutions to FOND relax- Guiochet, Félix Ingrand, Marc-Olivier Killijian, and David Pow-
ell. Fault tolerant planning for critical robots. In 7th Annual
ations of the problem. This results in a continuum between
IEEE/IFIP Int’l Conf. on Dependable Systems and Networks
strong planning and strong-cyclic planning. Additionally, we (DSN), pages 144–153. IEEE, 2007.
found a connection between normative solutions and fault-
[Muise et al., 2012] Christian Muise, Sheila A. McIlraith, and
tolerant planning. In particular, when primary action transi-
J. Christopher Beck. Improved Non-deterministic Planning by
tions have L(s, a, s0 ) = F, normative solutions are solutions
Exploiting State Relevance. In Proc. of the 22th Int’l Conference
to 1-primary normative models of fault-tolerant planning. No- on Automated Planning and Scheduling (ICAPS), pages 172–
tably, our model does not impose a bound on the number 180, 2012.
of faults during policy execution, and solutions are robust to
[Pineda et al., 2013] Luis Enrique Pineda, Yi Lu, Shlomo Zilber-
any finite number of faults during execution. This overcomes
stein, and Claudia V Goldman. Fault-tolerant planning under un-
the limitation of fault-tolerant planning, where that the num- certainty. In Proc. of the 23rd Int’l Joint Conference on Artificial
ber of faults κ that a plan can tolerate must be designated Intelligence (IJCAI), 2013.
prior to policy generation and execution. We leave further ex-
[Puterman, 1994] Martin Puterman. Markov Decision Processes:
ploration of the connection between the FOND+ model and
Discrete Dynamic Programming. Wiley, New York, 1994.
fault-tolerant planning to future work.
Finally, we presented algorithms to find different classes [Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical
Foundations for Specifying and Implementing Dynamical Sys-
of solutions to FOND+ planning problems, and an algorithm
tems. MIT Press, Cambridge, MA, 2001.
to find general solutions to FOND+ . Our algorithms aim to
provide a baseline for future enhancements, although the per- [Sardina and D’Ippolito, 2015] Sebastian Sardina and Nicolas
formance of Algorithm 1 suggests that a branch-and-bound D’Ippolito. Towards fully observable non-deterministic planning
as assumption-based automatic synthesis. In Proc. of the 24th
search scheme that explores deterministic plans in a certain
Int’l Joint Conference on Artificial Intelligence (IJCAI), pages
order is effective. We leave the design and evaluation of bet- 3200–3206. AAAI Press, 2015.
ter algorithms to future work.
[Trevizan et al., 2007] Felipe W Trevizan, Fabio Gagliardi Coz-
man, and Leliane Nunes de Barros. Planning under risk and
References knightian uncertainty. Proc. of the 20th Int’l Joint Conference
[Cimatti et al., 2003] Alessandro Cimatti, Marco Pistore, Marco on Artificial Intelligence (IJCAI), 2007:2023–2028, 2007.
Roveri, and Paulo Traverso. Weak, strong, and strong cyclic [Waldinger, 1977] Richard Waldinger. Achieving several goals si-
planning via symbolic model checking. Artificial Intelligence, multaneously. In Machine Intelligence 8, pages 94–136. Ellis
147:35–84, 2003. Horwood, Edinburgh, Scotland, 1977.
[Delgrande and Levesque, 2013] James P Delgrande and Hector J
Levesque. A formal account of nondeterministic and failed ac-
tions. In Proc. of the 23rd Int’l Joint Conference on Artificial
Intelligence (IJCAI). Citeseer, 2013.
[D’Ippolito et al., 2011] Nicolás D’Ippolito, Victor Braberman, Nir
Piterman, and Sebastián Uchitel. Synthesis of live behaviour
models for fallible domains. In 33rd Int’l Conf. on Software En-
gineering (ICSE), pages 211–220. IEEE, 2011.
[Domshlak, 2013] Carmel Domshlak. Fault tolerant planning:
Complexity and compilation. In Proc. of the 23th Int’l Confer-
ence on Automated Planning and Scheduling (ICAPS), 2013.