              Strong-Cyclic Planning when Fairness is Not a Valid Assumption

                      Alberto Camacho                                      Sheila A. McIlraith
                Department of Computer Science                        Department of Computer Science
                 University of Toronto, Canada                         University of Toronto, Canada
                  acamacho@cs.toronto.edu                                 sheila@cs.toronto.edu


                          Abstract

   We address the class of fully-observable non-deterministic (FOND) planning problems where non-deterministic actions are not guaranteed to display fair behaviour. Typical solutions to FOND planning problems either guarantee goal reachability within a bounded number of transitions (so-called strong acyclic solutions), or are predicated on a fairness assumption that presumes each action effect occurs infinitely often when the action is applied infinitely many times in the same state (so-called strong-cyclic solutions). We introduce the FOND+ planning model, which extends the FOND model with a more informed description of the action transitions. Solutions to FOND+ problems guarantee goal reachability even when the classical fairness assumption is not valid. We characterize different topologies, or classes, of solutions, showing that typical strong acyclic and fair strong-cyclic planning are particular cases. Moreover, we characterize a class of FOND+ solutions that also solve 1-primary normative fault-tolerant problems, and are robust to any finite number of faults during execution. We present algorithms to solve each class of problems. Finally, we present the results of one of our algorithms in the search for so-called normative solutions for a selection of blocksworld problems where the classical fairness assumption is not valid.

1   Introduction

   Most real-world dynamical systems are best modeled using non-deterministic action effects, reflecting our inability to characterize, with certainty, the outcome of actions, and in some cases to additionally model the effects of exogenous events. A number of different frameworks have been proposed to model and plan with non-deterministic transition systems. For the purposes of this paper, we limit our discussion to fully-observable systems. Markov Decision Processes (MDPs) [Puterman, 1994], for example, model the effect of an action in each state as a probability distribution over the successor states. In contrast, many probabilistic planning systems (e.g. [Kolobov et al., 2009]) exploit a compact symbolic description of the actions and their effects that associates a non-zero probability with alternative action outcomes. While less expressive than MDPs, such systems are often easier to solve and show better scalability.
   In cases where the stochasticity of a system is not well understood, or deemed unnecessary, Fully Observable Non-Deterministic (FOND) planning [Cimatti et al., 2003] provides a framework similar to probabilistic planning, with the exception that the effects of actions are non-deterministic, without commitment to the probability of alternative action effects. Solutions to FOND planning problems are policies that map states into actions. Cimatti et al. [2003] identified three different classes of solutions to FOND problems: weak, strong, and strong-cyclic. Weak solutions are plans that achieve the goal, but without guarantees. Strong solutions guarantee goal achievement in all executions. This condition is too restrictive, as FOND problems often do not have strong solutions. The class of strong-cyclic solutions is an alternative, and guarantees goal achievement provided that all executions are fair. When the fairness assumption is not valid, policy executions may not achieve the goal.
   In 2004, Jensen et al. introduced the notion of fault-tolerant planning, motivated by two observations. First, they interpreted non-determinism in many real-world problems as the consequence of infrequent errors (so-called faults) during execution. Second, they claimed that, in general, no action is ever guaranteed to achieve non-faulty behaviour. The resultant framework for fault-tolerant planning includes complete knowledge about which state transitions are normative and which are otherwise faulty. Solutions attempt to reach a prescribed goal under the assumption that no more than κ faults will occur during execution.
   In this paper we recognize that there exist non-deterministic transition systems where the fairness assumption does not hold, and the challenge we address is to find solutions that guarantee goal achievement when strong FOND solutions do not exist. To this end, we introduce the FOND+ planning framework as an extension to the FOND framework that includes a more informed description of transitions. The FOND+ framework makes it possible to find policies that guarantee goal achievement when the classical fairness assumption is not valid. Alternatively, the FOND+ framework can be informed with a description of the normative behaviour of the system. In this case, solutions to certain
FOND+ problems are also solutions to 1-primary normative fault-tolerant planning problems. We define different classes of solutions to FOND+ problems, and introduce algorithms to find solutions for each class. We identify a continuum of FOND+ solutions that includes strong, strong-cyclic, and fault-tolerant planning solutions in extreme cases. This result is perhaps most related to the work done by Trevizan et al. [2007], which defines a framework to deal simultaneously with risk and uncertainty in an attempt to unify probabilistic and non-deterministic planning. The results presented in this paper open the door to further advancements in the unification of FOND, fault-tolerant, and probabilistic planning frameworks.

2   Background

   A non-deterministic planning domain is a tuple D = ⟨F, S, A, T⟩, where F is a set of propositions; S ⊆ 2^F is a finite set of states, typically represented by the set of propositions that hold; A is a finite set of actions; and T : S × A → 2^S is a transition function.
   An action a ∈ A is defined in terms of its preconditions Pre_a and effects Eff_a. Pre_a ⊆ F is the set of propositions that need to hold true in a state s for a to be applicable in s, i.e., Pre_a ⊆ s. Eff_a = ⟨Eff_a^1, ..., Eff_a^n⟩ is a finite set of possible effects of a. Each effect e ∈ Eff_a is a tuple of the form e = ⟨Add_e, Del_e⟩, where Add_e and Del_e are sets of propositions to be added and deleted, respectively, when an action transition is applied. The progression of s with respect to action a and effect e is the updated state Prog(s, a, e) = (s \ Del_e) ∪ Add_e. Finally, the result of applying a in s is one of the states in T(s, a) = ∪_{e ∈ Eff_a} Prog(s, a, e). A state s′ ∈ T(s, a) is said to be an outcome of a in s, and the triplet (s, a, s′) is called the transition. An execution of domain D in state s_0 is a state-action sequence s_0, a_0, s_1, a_1, ... such that a_i is applicable in s_i for each i = 0, 1, ..., and s_{i+1} is one of the outcomes of a_i in s_i chosen non-deterministically by the system. When s′ = Prog(s, a, e), the regression of s′ with respect to action a and effect e is the partial state Regr(s′, a, e) = (s′ \ Add_e) ∪ Pre_a. Conceptually, Regr(s′, a, e) is the minimal subset of propositions that a state s needs to entail so that s′ is an outcome of a in s. Actions can be applied in partial states, whose progression and regression are defined as above (cf. [Muise et al., 2012]).

2.1   Fully Observable Non-Deterministic Planning

   A fully-observable non-deterministic (FOND) planning problem is a tuple P = ⟨D, s_0, S_G⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s_0 ∈ S is the initial state, and S_G ⊆ S is a set of goal states. Solutions to FOND planning problems are policies that map states into actions. We say that π is well-defined when π(s) is defined in every state reachable by π. Executions of a well-defined policy π (also called plan executions) are executions s_0, a_0, s_1, a_1, ... where, for each i = 0, 1, ..., a_i = π(s_i). When the execution finishes in a goal state s_n ∈ S_G, the sequence of actions P = a_0, a_1, ..., a_{n−1} is called a plan.

Definition 1 (adapted from [Cimatti et al., 2003]). We say that an execution σ is unfair when a state-action tuple s, a appears infinitely often in σ, but the transition (s, a, s′) occurs a finite number of times for an outcome s′ ∈ T(s, a). Executions that are not unfair are said to be fair.

Definition 2 (adapted from [Cimatti et al., 2003]). A solution to a FOND problem is strong cyclic when all executions result in either finite sequences of states that terminate in a goal state, or are infinite and unfair.

   Cimatti et al. [2003] distinguish three classes of solutions to a FOND planning problem: weak, strong, and strong cyclic. Weak solutions are plans that lead the agent from the initial state to a goal state, but with no guarantees due to the non-determinism of the actions. Strong solutions are plans that guarantee goal achievement. Necessarily, strong solutions are acyclic, and achieve the goal in a bounded number of steps. Finally, strong cyclic solutions are plans that are guaranteed to achieve the goal, predicated on the fairness assumption given by Definition 1. The fairness assumption presumes that, for each state s and each action a applicable in s, every outcome of a occurs infinitely many times if a is applied in s infinitely often. Strong solutions are also strong-cyclic, but the opposite is not always true.

2.2   Fault-Tolerant Planning

   A fault-tolerant planning problem is a tuple P = ⟨D, s_0, S_G, F, κ⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s_0 ∈ S is the initial state, S_G ⊆ S is a set of goal states, F is an exception model as described in Definition 3, and κ is a constant [Jensen et al., 2004; Domshlak, 2013]. Informally, a fault-tolerant planning task P requires planning for goal achievement under the assumption that no more than κ exceptions will occur during plan execution.

Definition 3 (adapted from [Domshlak, 2013]). For a non-deterministic planning domain D, an exception model is a function F : ∪_{a ∈ A} Eff_a → N, computable in time polynomial in the size of P. If, for each action a ∈ A, |{e | e ∈ Eff_a, F(e) = 0}| ≤ α, then F is called α-primary. Likewise, if, for each action a ∈ A, |{e | e ∈ Eff_a, F(e) = 0}| > 0, then F is called normative.

   In a fault-tolerant planning problem, actions have two types of effects. Primary effects model the normative behaviour of the system, while faulty effects model one or more failures (faults) of the system. Primary effects, e, have F(e) = 0, whereas F(e) > 0 when e is faulty. Intuitively, the exception model F maps each effect to the number of exceptions, or faults, associated with it.
   Solutions to a fault-tolerant planning problem P = ⟨D, s_0, S_G, F, κ⟩ are κ-plans, or policies that achieve the goal under the assumption that no more than κ faults will occur during execution. Formally, a plan P is κ-admissible when its execution generates the state-effect sequence (s_0, e_0, ..., s_i, e_i, ...) with Σ_i F(e_i) ≤ κ. A policy is a κ-plan when all plan executions are κ-admissible and reach the goal. Solutions to fault-tolerant planning problems do not assume fairness as done in strong-cyclic FOND planning, and are robust to the occurrence of up to κ faults. Note that the parameter κ that sets the maximum number of exceptions in a
solution is part of the problem description, and needs to be known, computed, or otherwise guessed in advance.
   Jensen et al. [2004] introduced the model for fault-tolerant planning, and addressed 1-primary normative models with the rationale that these satisfactorily address non-determinism originating from many real-world physical system models. Fault-tolerant planning has found applications in the robotics community (e.g. [Lussier et al., 2007]). More recently, fault-tolerant planning has been studied by Domshlak [2013] and Delgrande and Levesque [2013]. Recent work by Pineda et al. [2013] addressed the problem of fault-tolerant planning under uncertainty, tackling the problem from the perspective of a stochastic shortest-path (SSP) problem.

3   Planning with Unfair Non-Determinism

   We seek to generate solutions to goal-oriented planning problems with non-deterministic action effects where the fairness assumption does not necessarily hold. Different notions of fairness over executions have been discussed in the literature. In particular, D'Ippolito et al. [2011] defined strong independent fairness, a notion that presumes it is known, for each action, which outcomes are good and which ones are failures (similar to the distinction made in fault-tolerant planning), and requires failures and assumptions – normally, temporally extended constraints – on the environment behaviour to be independent. Sardina and D'Ippolito [2015] defined different logical characterizations of fairness based on the notions introduced by D'Ippolito et al. [2011]. Kupferman and Vardi [1996] proved that verification of fair transition systems is PSPACE-complete.
   Our premise is that, in many transition systems, and in concrete non-deterministic planning domains, it is known which state-action pairs s, a are guaranteed to produce, eventually, certain outcomes s′ if a were applied in s infinitely often. While not true in general, this is a valid assumption for many real-world transition systems. For example, it is known that pressing the elevator's button on the wall repeatedly causes the elevator's doors to open. On the other hand, it is known that there is no guarantee that a fixed number will be awarded in a lottery, even if the lottery is played infinitely many times under identical conditions. When fairness does not hold, solutions need to be robust to non-determinism, and goal achievement in plan executions should not rely on a particular transition for which there is no guarantee of occurrence. A similar argument can be made in non-deterministic domains that have known normative behaviour. In the scope of this paper, we focus our attention on 1-primary normative fault-tolerant planning problems. In these problems, solutions need to be robust to failures, and the goal needs to be achievable whenever the system manifests its normative behaviour.
   The following example motivates the need for policy generation mechanisms that guarantee goal achievement in those non-deterministic domains where the fairness assumption is not valid. Further, it illustrates a parallelism between solutions to goal-oriented planning problems where fairness governs the non-determinism of the actions, and those where the non-deterministic actions have primary and faulty effects.

[Figure 1: Different policies for the blocks-world domain. The pick-up action is non-deterministic, and may pick up a certain block or drop it on the table. The put-on-block and put-block-on-table actions are deterministic. (a) Strong-Cyclic Solution; (b) Normative Solution.]

Illustrative Example
   Consider the blocks-world domain, where block A is initially on top of block B, and the goal is to put A on the table. The agent can pick up blocks and put them on top of other blocks, or on the table. The pickup-block action is non-deterministic: the gripper may drop the block onto the table, but is guaranteed to eventually pick up the block if tried repeatedly. The put-on-block (resp. put-block-on-table) action is deterministic and puts the block held by the gripper on top of another block (resp. on the table). Strong-cyclic FOND solutions that assume fairness may rely on the eventual occurrence of the faulty effect of pickup-block in order to reach the goal. Figure 1a shows the strong-cyclic solution that picks up block A and puts it on top of B repeatedly until block A is dropped on the table. If it is the case that the effect of the pickup-block action that drops the block is not guaranteed to occur during execution, the solution described above does not guarantee goal achievement. In contrast, the solution illustrated in Figure 1b is strong, and goal achievement is guaranteed.
   A similar pattern appears when we consider that the block being picked up is the normative behaviour of the pickup-block action. The solution described in Figure 1a loops until the pick-up action is faulty, which is certainly not a good quality solution because the normative behaviour of the system can lead to the goal, regardless of whether the faulty effect is guaranteed to occur or not. Note that the solution described in Figure 1b is of good quality because executions lead to the goal when the behaviour of the system is normative, and the solution is robust to non-determinism.

3.1   The FOND+ Model

   A Fully Observable Non-Deterministic planning problem with Labeled UnfairnesS (for short, FOND+ problem) is a tuple P = ⟨D, s_0, S_G, L⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s_0 ∈ S is the initial state, S_G ⊆ S is a set of goal states, and L : S × A × S → {F, U} is a labeling function that maps each triplet (s, a, s′) ∈ S × A × S into one of the symbols F or U. The semantics of L supports different interpretations. Before elaborating, we first define what constitutes an L-fair execution. For a non-deterministic planning domain D = ⟨F, S, A, T⟩, and labeling function L : S × A × S → {F, U}, we say that an execution in state s_0 is L-unfair when there exists a state-action tuple (s, a) such
that (i) (s, a) appears infinitely often, and (ii) there exists a transition (s, a, s′) such that L(s, a, s′) = F and (s, a, s′) occurs a finite number of times. Executions that are not L-unfair are said to be L-fair. Note that fairness, as defined by Cimatti et al. [2003], is the particular case of L-fairness that occurs when L assigns F to all transitions. Solutions to a FOND+ problem are policies that guarantee goal achievement, predicated on the assumption that all executions of D in s_0 are L-fair.

Definition 4. A FOND+ problem is a tuple P = ⟨D, s_0, S_G, L⟩, where D = ⟨F, S, A, T⟩ is a non-deterministic planning domain, s_0 ∈ S is the initial state, S_G ⊆ S is a set of goal states, and L : S × A × S → {F, U} is a labeling function.

Definition 5. Solutions to a FOND+ problem P = ⟨D, s_0, S_G, L⟩ are policies that guarantee goal achievement, predicated on the assumption that all executions of D in s_0 are L-fair.

   In the scope of this paper, we focus our attention on two different interpretations of the labeling function. The first interpretation assumes it is known, for each state-action pair (s, a), which transitions (s, a, s′) are not guaranteed to occur infinitely often in executions where the pair (s, a) appears an infinite number of times. To those transitions, we assign L(s, a, s′) = U. Otherwise, we assign L(s, a, s′) = F. With this interpretation, all plan executions are L-fair and FOND+ solutions guarantee goal achievement during execution. In other words, FOND+ solutions are robust to non-deterministic transitions during plan execution, and goal achievement does not rely on any specific transition for which there is no guarantee of eventual occurrence. The FOND+ model overcomes difficulties of the FOND model when strong solutions do not exist, as strong-cyclic solutions do not guarantee goal achievement, in general, when Cimatti et al.'s fairness assumption is not valid.
   The second interpretation we examine describes the normative behaviour of the system, and assigns L(s, a, s′) = F (resp. L(s, a, s′) = U) when s′ follows from a primary (resp. faulty) effect of a in s. With this interpretation, an execution is L-fair when it is finite, or when it is infinite and, for each state-action pair (s, a) that appears infinitely often, the system responds with each primary effect of a in s infinitely often. As we will see later, this interpretation is particularly interesting in certain FOND+ planning problems, and makes it possible to find policies that guarantee goal achievement under the assumption that the number of transitions produced by faulty effects during plan execution is finite.
   In the blocksworld example introduced above, and with both interpretations, all transitions t have L(t) = F except the transitions produced by the effect of the pick-up action that drops the block on the table, which have L(t) = U. With this labeling, the policy depicted in Figure 1b is a FOND+ solution, whereas the policy depicted in Figure 1a is not.

3.2   Classes of Solutions

   We characterize different classes of solutions to FOND+ problems according to the fairness of plan executions. Namely, we distinguish between strictly fair, strictly unfair, and mixed solutions. Furthermore, we characterize the class of normative solutions, which is particularly interesting for its practical applications and proximity to solutions to 1-primary normative fault-tolerant planning problems.

[Figure 2: Distribution of different classes of solutions to FOND+ planning problems.]

   A solution π to a FOND+ problem is strictly fair when all transitions t produced by L-fair plan executions have L(t) = F. Strictly fair solutions to a FOND+ problem P = ⟨D, s_0, S_G, L⟩ are strong-cyclic solutions to the FOND problem P′ = ⟨D, s_0, S_G⟩ that results from ignoring the labeling function in P. The opposite is not always true: strong-cyclic solutions to P′ may or may not be strictly fair solutions to P – only when all transitions t produced by plan executions have L(t) = F. In general, a strong-cyclic solution can belong to any class of FOND+ solutions. Like strong-cyclic solutions, strictly fair solutions can contain cycles.

Definition 6. A solution π to a FOND+ problem is strictly fair when all transitions t produced by L-fair plan executions have L(t) = F.

   The class of strictly unfair solutions is analogous to the class of strictly fair solutions, this time requiring that transitions t produced by L-fair plan executions have L(t) = U. Strictly unfair solutions to a FOND+ problem P = ⟨D, s_0, S_G, L⟩ are strong solutions to the FOND problem P′ = ⟨D, s_0, S_G⟩ that results from ignoring the labeling function in P. The opposite is not always true either: strong solutions to P′ are strictly unfair solutions to P only when all transitions t produced by plan executions have L(t) = U. In general, a strong solution can belong to any class of FOND+ solutions. Like strong solutions, strictly unfair solutions are acyclic.

Definition 7. A solution π to a FOND+ problem is strictly unfair when all transitions t produced by L-fair plan executions have L(t) = U.

   The class of mixed solutions completes the space of solutions to a FOND+ planning problem. Mixed solutions are those that are neither strictly fair nor strictly unfair. Plan executions s_0, a_0, s_1, a_1, ..., s_n are such that s_n is a goal state, and may contain cycles. More precisely, there may exist k < m < n such that s_k = s_m. As all infinite plan executions need to be L-unfair (cf. Definition 5), at least one of the outcomes of s_i by a_i, for a certain k ≤ i ≤ m, must yield a transition (s_i, a_i, s′_{i+1}) with L(s_i, a_i, s′_{i+1}) = F.

Definition 8. A solution π to a FOND+ problem is mixed when it is neither strictly fair nor strictly unfair.

   Finally, we characterize the class of normative solutions, which are particularly interesting when the labeling function describes the normative behaviour of the system – i.e., when
transitions (s, a, s′) with label L(s, a, s′) = F are exactly those produced by a primary effect of a in s. A solution π is normative when, in each state s reachable by π: (i) there exists a plan execution in s that reaches the goal and such that all transitions t have L(t) = F, and (ii) exactly one outcome of s by π(s) produces a transition t with L(t) = F. The class of normative solutions intersects with the class of strictly fair solutions in those solutions that are deterministic plans (cf. Figure 2).

Definition 9. A FOND+ solution π is normative when, in each state s reachable by π: (i) there exists a plan execution in s that reaches the goal and such that all transitions t have L(t) = F, and (ii) exactly one outcome of s by π(s) produces a transition t with L(t) = F.

   With the interpretation of the labeling function that describes the normative behaviour of the system, policy actions of normative solutions are required to have, by definition, one primary effect. Solutions need to be robust to faulty transitions and guarantee goal achievement, at any point during execution, when the system follows its normative behaviour. In order to facilitate comparison with the fault-tolerant planning model, we assume the value L(s, a, s′) depends only on the effect that produces the outcome s′, but not on the state s itself. Formally, we assume L(s, a, Prog(s, a, e)) = L(s′, a, Prog(s′, a, e)) for all actions a ∈ A, all effects e of a, and all pairs of states s, s′ ∈ S in which a is applicable. When this property holds, the labeling function can be described compactly as a function of the actions' effects.
   With the restrictions described above, normative solutions to a FOND+ problem P = ⟨D, s_0, S_G, L⟩ are also 1-primary normative solutions¹ to fault-tolerant planning problems P′ = ⟨D, s_0, S_G, F, κ⟩ with an exception model F that assigns faults F(e) > 0 to effects e that produce transitions (s, a, Prog(s, a, e)) such that L(s, a, Prog(s, a, e)) = U. Note that normative FOND+ solutions are robust to the occurrence of any possible number of faults during execution, as opposed to standard fault-tolerant solutions, whose executions are guaranteed to be robust up to a bounded number of faults κ specified by the model.

Theorem 1. Let P = ⟨D, s_0, S_G, L⟩ be a FOND+ problem, and let P′ = ⟨D, s_0, S_G, F, κ⟩ be a 1-primary normative fault-tolerant planning problem with exception model F that assigns faults F(e) > 0 to effects e that produce transitions

model and its solutions. More precisely, normative FOND+ solutions are robust to any number of faulty transitions during execution. This is an advantage with respect to solutions to fault-tolerant planning, which are robust to a limited number of κ faults during execution. Besides, κ is a parameter of the model that has to be fixed prior to search, even if a solution robust to any number of faults exists.

4   Search Algorithms for FOND+

   This section presents algorithms to search for strictly fair, strictly unfair, and normative solutions to FOND+ planning problems. This is followed by an algorithm that is, in principle, less efficient, but can search for general solutions to FOND+ problems. Notwithstanding, the algorithms presented below are intended to provide a baseline for future improvements, and their main purpose is to demonstrate the existence of methods to find solutions to FOND+. We decided to follow a branch-and-bound search scheme, motivated by its efficiency in the search for strong-cyclic solutions to FOND problems [Muise et al., 2012]. The similarity between the FOND+ and FOND planning models suggests that a similar technique may be effective in the search for FOND+ solutions.

4.1   Search for Strictly Fair Solutions

   By definition, all transitions t produced by plan executions of strictly fair solutions are such that L(t) = F. It is straightforward to see that strictly fair solutions to a FOND+ problem P = ⟨D, s_0, S_G, L⟩ are all and only those strong-cyclic solutions to the FOND problem P′ = ⟨D′, s_0, S_G⟩. Intuitively, D′ is like D, but the actions applicable in a given state s are restricted to those a's that only yield transitions (s, a, s′) labeled with L(s, a, s′) = F. This constraint can be implemented by means of extra dummy fluents in D′. Alternatively, when L accepts a compact description such that its value depends only on the effects of the actions that yield each transition, we can safely prune an action a from D′ when not all of its effects yield transitions t with L(t) = F.
   For a FOND+ problem P, the algorithm consists of two steps:
   1. P is relaxed into a FOND problem P′ as described above.
   2. A sound and complete strong-cyclic FOND planner –
(s, a, P rog(s, a, e)) such that L(s, a, P rog(s, a, e)) = U.               e.g. PRP [Muise et al., 2012] – is used to search for
Then, FOND+ normative solutions to P are 1-primary nor-                     a strong-cyclic solution to P 0 , which is returned as a
mative solutions to P’ for any κ < ∞.                                       strictly fair solution to P.
                                                                     Since strictly fair solutions to P are the strong-cyclic solu-
   The practicality of 1-primary normative fault-tolerant plan-
                                                                     tions to the FOND problem P 0 , soundness and completeness
ning to model and solve many real-world physical prob-
                                                                     of the algorithm derives from soundness and completeness of
lems has been discussed by Jensen et al. [2004] and Domsh-
                                                                     the strong-cyclic FOND planner.
lak [2013]. Normative FOND+ solutions overcome certain
limitations of the 1-primary normative fault-tolerant planning       4.2     Search for Strictly Unfair Solutions
   1                                                                    By definition, all transitions t produced by plan executions
     For convenience, we informally say that a solution π is α-
primary (resp. normative) when |{e | e ∈ Eff a , F (e) = 0}| ≤ α     of strictly unfair solutions are such that L(t) = U. The algo-
(resp. |{e | e ∈ Eff a , F (e) = 0}| > 0) in all effects from plan   rithm to find strictly unfair solutions is somewhat analogous
executions of π. Technically, the α-primary normative model intro-   to the algorithm used to find strictly fair solutions. This time,
duced by Domshlak [2013] requires F to be α-primary and norma-       however, we exploit the correspondence between strictly un-
tive globally, which is a harder constraint.                         fair solutions to a FOND+ problem P = hD, s0 , SG , Li
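Both relaxations share the same mechanics: the domain is filtered so that only actions all of whose effects carry the required label survive. The following is a minimal sketch of this filtering under a toy action representation; the function name `relax_domain`, the dictionary encoding, and the label strings are illustrative assumptions, not the PDDL-level encoding the planners actually operate on.

```python
def relax_domain(actions, keep_label):
    """Keep only actions all of whose effect labels equal keep_label.

    Assumes the labeling depends only on action effects, so filtering is
    a per-action property. keep_label="F" yields the relaxation whose
    strong-cyclic solutions are the strictly fair solutions;
    keep_label="U" yields the one whose strong solutions are the
    strictly unfair solutions.
    """
    return [a for a in actions
            if all(label == keep_label for (_, label) in a["effects"])]

# Toy domain: each effect is a (name, label) pair.
actions = [
    {"name": "pick-up", "effects": [("holding", "F"), ("dropped", "U")]},
    {"name": "noop-fair", "effects": [("same", "F")]},
    {"name": "break", "effects": [("broken", "U")]},
]

fair_only = relax_domain(actions, "F")    # keeps only noop-fair
unfair_only = relax_domain(actions, "U")  # keeps only break
```

In practice, as noted above, the same effect can be achieved with dummy fluents when the labeling is not a per-action property.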
This time, however, we exploit the correspondence between the strictly unfair solutions to a FOND+ problem P = ⟨D, s0, SG, L⟩ and the strong solutions to the FOND problem P′ = ⟨D′, s0, SG⟩. Intuitively, D′ is like D, but the actions applicable in a given state s are restricted to those a's that only yield transitions (s, a, s′) labeled with L(s, a, s′) = U. This constraint can also be implemented by means of extra dummy fluents in D′. Alternatively, when L accepts a compact description such that its value depends only on the effects of the actions that yield each transition, we can safely prune an action a when not all its effects yield transitions t with L(t) = U.

   For a FOND+ problem P, the algorithm consists of two steps:
   1. P is relaxed into a FOND problem P′ as described above.
   2. A sound and complete strong FOND planner – e.g. [Jaramillo et al., 2014] – is used to search for a strong solution to P′, which is returned as a strictly unfair solution to P.
Since the strictly unfair solutions to P are the strong solutions to the FOND problem P′, soundness and completeness of the algorithm derive from soundness and completeness of the strong FOND planner.

4.3   Search for Normative Solutions
   Normative solutions are those where, for every reachable state s, there exists a plan for s for which all transitions t have L(t) = F. Algorithm 1 exploits this property. Intuitively, the algorithm explores plans that exclusively produce transitions t such that L(t) = F, which are extended into robust policies that also consider the occurrence of other transitions.

Algorithm 1  Search for Normative Solutions
 1: procedure PolicySearch(V, s0, s⋆, A)
 2:   while π changes do
 3:     π ← ∅
 4:     Open ← s0
 5:     Seen ← {}
 6:     Deferred ← {}
 7:     while Open ≠ ∅ ∨ Deferred ≠ ∅ do
 8:       s = Unstack()
 9:       if s ⊭ s⋆ ∧ s ∉ Seen then
10:         Seen.add(s)
11:         if π(s) is undefined then
12:           GenFairPlan(V, s, s⋆, A, π)
13:         end if
14:         if π(s) is defined then
15:           ⟨p, a⟩ = π(s)
16:           StackSuccessors(s, a)
17:         else break
18:         end if
19:       end if
20:     end while
21:     ProcessDeadends()
22:   end while
23:   return π
24: end procedure

Add Successors to Corresponding Stack
25: function StackSuccessors(s, a)
26:   for e ∈ Eff_a do
27:     s′ = Prog(s, a, e)
28:     if L(s, a, s′) = F then
29:       Open.add(s′)
30:     else
31:       Deferred.add(s′)
32:     end if
33:   end for
34: end function

Get Next Node to Explore
35: procedure Unstack()
36:   if Open ≠ ∅ then
37:     s = Open.pop()
38:   else
39:     s = Deferred.pop()
40:   end if
41:   return s
42: end procedure

   Without loss of generality, we assume the problem has the property that each action a produces, in each state s in which a is applicable, exactly one transition t such that L(t) = F. If this property does not hold, and the value of L(s, a, s′) in any state s depends only on the effect of a that generates outcome s′, actions that do not have exactly one effect that produces a transition t such that L(t) = F can be safely pruned in a pre-processing stage. Alternatively, a constraint can be added during search, so that a state s can be expanded by action a only when exactly one outcome produces a transition t such that L(t) = F. If this check is not performed, Algorithm 1 still finds solutions (perhaps non-normative mixed solutions) that guarantee goal achievement when the interpretation of the labeling function describes fairness, but goal achievement during execution is no longer guaranteed when the labeling function describes the normative behaviour of the system.

   Procedure PolicySearch in Algorithm 1 maintains three separate stacks: Open, Seen, and Deferred. The following search procedure is performed until policy π does not change with respect to the previous policy found. First, Open is initialised with the initial state s0, and Seen, Deferred, and π are initialised to empty sets. For each state s in Open being processed for which π is undefined, procedure GenFairPlan searches for a plan P for s that exclusively produces transitions t such that L(t) = F. Then, policy π is extended with information extracted from P, and the successors of s by π(s) are stacked. This is done by procedure StackSuccessors, which stacks each successor s′ into Open when L(s, π(s), s′) = F, and stacks s′ into Deferred otherwise. The construction of the policy is performed by iteratively unstacking a new state from Open, or from Deferred if Open is empty, and finding a plan with GenFairPlan search. This iterative process is run until the stacks Open and Deferred are empty, or GenFairPlan fails to find a plan.
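The main loop and the two-stack bookkeeping above can be sketched as follows. This is a simplified, hypothetical rendering of Algorithm 1: the outer repeat-until-π-stabilises loop and the PRP-style dead-end processing are omitted, and `gen_fair_plan`, `successors`, and the label strings are illustrative interfaces rather than the paper's implementation.

```python
def policy_search(s0, is_goal, successors, gen_fair_plan):
    """Simplified skeleton of Algorithm 1 (hypothetical interfaces).

    successors(s, a) yields (s2, label) pairs; gen_fair_plan(s, policy)
    extends policy with a plan for s whose transitions are all fair,
    returning True on success.
    """
    policy = {}
    open_, deferred, seen = [s0], [], set()
    while open_ or deferred:
        # Unstack: prefer the fair frontier, fall back to deferred states.
        s = open_.pop() if open_ else deferred.pop()
        if is_goal(s) or s in seen:
            continue
        seen.add(s)
        if s not in policy and not gen_fair_plan(s, policy):
            return None  # dead end; FSAP-style repair omitted in this sketch
        a = policy[s]
        for s2, label in successors(s, a):
            # Fair successors go to Open, faulty ones to Deferred.
            (open_ if label == "F" else deferred).append(s2)
    return policy

# Toy problem: states are integers, 2 is the goal; state 0 has a fair
# successor 1 and a faulty successor 3, both of which reach the goal.
def gen_fair_plan(s, policy):
    policy[s] = "go"  # trivial planner: every state gets action "go"
    return True

succ = {0: [(1, "F"), (3, "U")], 1: [(2, "F")], 3: [(2, "F")]}
plan = policy_search(0, lambda s: s == 2,
                     lambda s, a: succ[s], gen_fair_plan)
# plan -> {0: 'go', 1: 'go', 3: 'go'}
```

Deferring state 3 means it is only expanded once the fair frontier is exhausted, which is exactly the repair-oriented behaviour argued for below.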
   Procedures GenFairPlan and ProcessDeadends are inspired by procedures GenPlanPairs and ProcessDeadends in the FOND planner PRP [Muise et al., 2012]. GenFairPlan does two things. First, it searches for a plan P for s such that all transitions t have L(t) = F. For efficiency, such a plan may end in a goal state or in a state for which a policy action has already been computed. Second, like PRP's GenPlanPairs, it extends π with new state-action pairs. More precisely, for each pair ⟨s, a⟩ in the plan, GenFairPlan computes, via regression, the relevant part p of s so that the goal is still reachable (e.g. [Waldinger, 1977; Reiter, 2001]). Finally, it extends π with the (partial) state-action pair ⟨p, a⟩. PRP maintains a list of forbidden state-action pairs (FSAPs) that recognisably lead to a state where no strong-cyclic policy exists. FSAPs allow the search space to be pruned effectively, and reduce search times considerably in practice. We apply the same ideas in our algorithm. Whenever GenFairPlan fails to find a plan for state s (i.e., s is a dead end), procedure ProcessDeadends computes the reasons why s is a dead end and performs regression to create a set of forbidden (partial) state-action pairs that need to be avoided in future searches.

   Algorithm 1 finds normative solutions to a FOND+ problem, as proved in Theorem 2.

Theorem 2. Algorithm 1 is sound and complete in the search for normative solutions to a FOND+ problem.

Proof sketch. Line 16 of Algorithm 1 stacks all states reachable by π, and every state s reachable by π is considered in line 8 of Algorithm 1. If no fair plan for s has been computed yet, procedure GenFairPlan finds a fair plan for it. This proves soundness of the algorithm. Procedure PolicySearch explores the whole state space, except those forbidden state-action pairs that are pruned from the search by ProcessDeadends. Therefore, the algorithm is complete.

   Algorithm 1 maintains soundness and completeness when procedure StackSuccessors stacks all successors into the Open stack – i.e. the use of the Deferred stack is not necessary. However, there is an advantage to employing the Deferred stack. The rationale appears with the interpretation that transitions with L(t) = F model the normative behaviour of the system. Intuitively, we want faulty transitions to be repaired so that the system moves back to a normative state. When the handling of states originated by faulty transitions (L(t) = U) is deferred, it is more likely that plans found by procedure GenFairPlan lead to a state that is already handled by the policy rather than yielding a new plan all the way to the goal. In other words, it is more likely that those plans are repairs rather than alternative plans. This is a desired property of solutions that justifies, in principle, the use of the Deferred stack.

4.4   Search of FOND+ Solutions
   We presented algorithms to obtain strictly fair, strictly unfair, and normative solutions to FOND+ problems. In this section, we present a search algorithm to obtain general solutions to FOND+ problems.

   For a FOND+ problem P, our algorithm is an iterated call to Algorithm 2 with parameter κ = 0, 1, . . . until a solution is found. Similar to normative fault-tolerant planning, each iteration fixes the maximum number κ of transitions with L(t) = U allowed to occur in any plan execution. However, solutions do not presume that no more than κ faulty effects will occur. The algorithm makes use of an augmented representation of states that includes, for each planning state s explored, a counter s.ucount that records the number of transitions with L(t) = U produced to visit state s. The initial state has s.ucount = 0. For a transition (s, a, s′), the counter s′.ucount equals s.ucount when L(s, a, s′) = F, and s.ucount + 1 when L(s, a, s′) = U.

Algorithm 2  Search of FOND+ Solutions
 1: procedure PolicySearch(κ, V, s0, s⋆, A)
 2:   while π changes do
 3:     π ← ∅
 4:     Open ← s0
 5:     Seen ← {}
 6:     while Open ≠ ∅ do
 7:       s = Open.pop()
 8:       if s.ucount > κ then break
 9:       end if
10:       if s ⊭ s⋆ ∧ s ∉ Seen then
11:         Seen.add(s)
12:         if π(s) is undefined then
13:           GenBoundPlan(κ, V, s, s⋆, A, π)
14:         end if
15:         if π(s) is defined then
16:           ⟨p, a⟩ = π(s)
17:           StackSuccessors(s, a)
18:         else break
19:         end if
20:       end if
21:     end while
22:     ProcessDeadends()
23:   end while
24:   return π
25: end procedure

Add Successors to Corresponding Stack
26: function StackSuccessors(s, a)
27:   for e ∈ Eff_a do
28:     s′ = Prog(s, a, e)
29:     if L(s, a, s′) = F then
30:       s′.ucount ← s.ucount
31:     else
32:       s′.ucount ← s.ucount + 1
33:     end if
34:     Open.add(s′)
35:   end for
36: end function

   We abuse notation and say that s entails s′ when s ⊨ s′ and s.ucount ≤ s′.ucount. For a state s in the Open stack being processed, procedure GenBoundPlan searches for plans P = s, a1, s1, a2, . . . , an, sn that either: (i) exclusively produce transitions t with L(t) = F and sn is a goal state; (ii) exclusively produce transitions t with L(t) = F and sn entails a partial state for which the policy action is defined; or (iii) exclusively produce transitions t with L(t) = F except (sn−1, an, sn), and all transitions produced by the progression of sn−1 by an have L(t) = U. Intuitively, conditions (i), (ii), and (iii) ensure that all L-fair plan executions achieve the goal.
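The counter bookkeeping is the core difference from Algorithm 1. A minimal sketch of the ucount propagation performed in StackSuccessors, under hypothetical Python interfaces (augmented states are (state, counter) pairs; note that Algorithm 2 checks the bound when a state is popped, whereas this sketch prunes at stacking time, which is equivalent for illustration purposes):

```python
def stack_successors(s, ucount, successors, label, open_, kappa):
    """Propagate the unfair-transition counter of Algorithm 2 (sketch)."""
    for s2 in successors(s):
        # Fair transitions preserve the counter; unfair ones increment it.
        c = ucount if label(s, s2) == "F" else ucount + 1
        if c <= kappa:  # respect the branch-and-bound limit
            open_.append((s2, c))

# Toy step: state 0 has a fair successor 1 and an unfair successor 2.
open_ = []
stack_successors(0, 0,
                 lambda s: [1, 2],
                 lambda s, s2: "F" if s2 == 1 else "U",
                 open_, kappa=0)
# with kappa = 0 only the fair successor survives: open_ == [(1, 0)]
```

Re-running with kappa = 1 would also stack (2, 1), matching the iterated-deepening calls with κ = 0, 1, . . . described above.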
   In GenBoundPlan, plans are regressed as in procedure GenFairPlan of Algorithm 1. Regressed states keep the counter s.ucount in each state s of the plan, which is not altered by plan regression. Procedure ProcessDeadends – analogous to that of Algorithm 1 – keeps track of the state-action pairs that recognisably lead to a dead end or to a state for which procedure GenBoundPlan fails to find a plan, where the representation of states s includes the counter s.ucount. The list of forbidden state-action pairs is initialised in each call to PolicySearch. We leave further optimizations of the algorithm for future work.

   Strictly speaking, solutions found by Algorithm 2 are not policies. The reason is that the action performed in an augmented state s, in general, may depend on the value of the counter s.ucount. Certainly, the action returned by a policy should depend only on the planning state, not on the counter. We have two options to mitigate this. One is to construct the policy that selects, for each planning state s, the action among all (augmented partial) state-action pairs ⟨p, a⟩ such that s ⊨ p and p.ucount is maximal. The second option is to represent the solution found by Algorithm 2 as a finite-state controller (FSC) π that keeps track, implicitly, of the occurrence of transitions with L(t) = U along execution.

Theorem 3. Algorithm 2 is sound and complete in the search for FOND+ solutions whose plan executions produce a number of transitions t with L(t) = U that is bounded by κ.

Corollary 1. The iterated call to Algorithm 2 with parameters κ = 0, 1, . . . finds a solution to a FOND+ problem P, if one exists. If a solution is found in the κ-th iteration, the number of transitions t with L(t) = U in plan executions is bounded by κ.

5   Experiments
   We conducted a series of tests with the objective of demonstrating the need for FOND+ solutions that differ from strong-cyclic FOND solutions and that guarantee goal achievement when the classical fairness assumption is not valid. We selected a non-deterministic planning domain, the blocksworld-new, and a set of FOND problems. For each FOND problem, we constructed a FOND+ problem by labeling transitions. Finally, we compared FOND+ normative solutions with strong-cyclic FOND solutions to each of the problems. We used PRP as the strong-cyclic planner; Algorithm 1 is implemented on top of PRP. In all cases (also in Algorithm 1), we configured PRP search with dead-end detection disabled and the strong-cyclic-detection optimization disabled. Experiments were run on an Intel Xeon E5-2430 CPU @2.2GHz Linux server, limiting processes to 2GB of memory usage.

   The blocksworld-new problems used in our tests have been extracted from Muise et al. [2012]. They are based on the blocksworld benchmark used in past International Probabilistic Planning Competitions (IPPCs), and include some fixes and larger problem sizes with respect to the original instances. In the blocksworld-new domain, the gripper can pick up a block (or two blocks at once) and put it on top of another block, or on the table. The goal is to set the blocks in a certain order. The action that puts blocks on the table is deterministic. However, the pick-up and put-on-block actions have non-deterministic effects. More precisely, the primary effect of pick-up is to pick up the chosen block (or the tower of two blocks). Likewise, the primary effect of put-on-block is to put a block (or a tower of two blocks) that is held by the gripper on top of another block. The faulty effect of both actions drops the block (or the tower of two blocks) on the table. Alternatively, we can interpret this as the gripper being guaranteed to eventually succeed in picking up the block, and so on.

               Strong-Cyclic           Normative
  problem    run-time     size     run-time     size
  p2              0          3          0          3
  p3          0.002          5      0.016          5
  p4          0.020         11      0.048         11
  p5          0.070         27      0.178         27
  p6          0.110         39      0.296         39
  p7          0.114         32      0.270         32
  p8          0.150         26      0.356         26
  p9          0.278         46      0.664         46
  p10         0.336         49      0.782         49
  p11         0.522        120      1.936         97
  p12         0.626         97      1.840      119.5
  p13         0.682         57      1.810         57
  p14         3.794       1117      37.10       1123
  p15         1.500        278      7.814        278

Table 1: Average run-time (sec.) and policy size of FOND and FOND+ solutions to blocksworld-new problems. Strong-cyclic solutions presume fairness, whereas the computation of normative solutions is informed by the labeling function.

   Table 1 summarizes the results of our tests. For each problem, we report the run-time and policy size, averaged over 10 runs per problem. Normative policies sometimes have the same size as strong-cyclic policies. We conjecture that in these problems the strong-cyclic policies are also robust to unfairness (i.e. they are also normative solutions to the FOND+ problems). On the other hand, in some FOND+ problems – like p11, p12, and p14 – the size of the normative policy differs from the size of the strong-cyclic policy of the corresponding FOND problem. In those cases, normative policies can be of greater or lesser size relative to strong-cyclic policies. As normative FOND+ solutions are also solutions to the FOND problem, this illustrates that the policies found by PRP are not necessarily of minimal size. Run-times of Algorithm 1 are, in general, two times greater than those of PRP in those problems where the size of the policies generated is the same. Comparing the run-times of our algorithm with those of PRP is not fair, as the algorithms find solutions to different problems. However, general run-times suggest that the branch-and-bound technique used in Algorithm 2 is, in principle, efficient in the search for normative solutions. As the implementation of Algorithm 2 is preliminary and not optimal, we believe there is room for further improvements in its performance.

6   Summary and Future Work
   There exist a diversity of models for planning in the face of non-deterministic action outcomes, including probabilistic planning, MDPs, and fault-tolerant planning. We focus our attention on FOND planning.
Solutions to FOND planning problems are required to be robust to non-deterministic action effects, and to guarantee goal achievement during execution. The class of strong solutions provides such guarantees, but the existence of such solutions is restricted in general. On the other hand, the soundness of strong-cyclic solutions is predicated on fairness, and therefore their execution does not guarantee goal achievement, in general, when the fairness assumption is not valid.

   To mitigate this, we introduced the model for Fully Observable Non-Deterministic Planning with Labeled Unfairness (FOND+), where the fairness of action transitions can be described explicitly by means of the labeling function L, and solutions guarantee goal achievement provided that executions are L-fair. We leave complexity analysis of our model to future work.

   We distinguished four classes of solutions to FOND+ planning problems: namely, strictly fair, strictly unfair, mixed, and normative. Strictly fair (resp. strictly unfair) solutions are strong-cyclic (resp. strong) solutions to FOND relaxations of the problem. This results in a continuum between strong planning and strong-cyclic planning. Additionally, we found a connection between normative solutions and fault-tolerant planning. In particular, when primary action transitions have L(s, a, s′) = F, normative solutions are solutions to 1-primary normative models of fault-tolerant planning. Notably, our model does not impose a bound on the number of faults during policy execution, and solutions are robust to any finite number of faults during execution. This overcomes the limitation of fault-tolerant planning, where the number of faults κ that a plan can tolerate must be designated prior to policy generation and execution. We leave further exploration of the connection between the FOND+ model and fault-tolerant planning to future work.

   Finally, we presented algorithms to find different classes of solutions to FOND+ planning problems, and an algorithm to find general solutions to FOND+. Our algorithms aim to provide a baseline for future enhancements, although the performance of Algorithm 1 suggests that a branch-and-bound search scheme that explores deterministic plans in a certain order is effective. We leave the design and evaluation of better algorithms to future work.

References
[Cimatti et al., 2003] Alessandro Cimatti, Marco Pistore, Marco Roveri, and Paolo Traverso. Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence, 147:35–84, 2003.
[Delgrande and Levesque, 2013] James P. Delgrande and Hector J. Levesque. A formal account of nondeterministic and failed actions. In Proc. of the 23rd Int'l Joint Conference on Artificial Intelligence (IJCAI), 2013.
[D'Ippolito et al., 2011] Nicolás D'Ippolito, Victor Braberman, Nir Piterman, and Sebastián Uchitel. Synthesis of live behaviour models for fallible domains. In 33rd Int'l Conf. on Software Engineering (ICSE), pages 211–220. IEEE, 2011.
[Domshlak, 2013] Carmel Domshlak. Fault tolerant planning: Complexity and compilation. In Proc. of the 23rd Int'l Conference on Automated Planning and Scheduling (ICAPS), 2013.
[Jaramillo et al., 2014] Andres Calderon Jaramillo, Jicheng Fu, Vincent Ng, Farokh B. Bastani, and I-Ling Yen. Fast strong planning for FOND problems with multi-root directed acyclic graphs. International Journal on Artificial Intelligence Tools, 23(06):1460028, 2014.
[Jensen et al., 2004] Rune M. Jensen, Manuela M. Veloso, and Randal E. Bryant. Fault tolerant planning: Toward probabilistic uncertainty models in symbolic non-deterministic planning. In Proc. of the 14th Int'l Conference on Automated Planning and Scheduling (ICAPS), pages 335–344, 2004.
[Kolobov et al., 2009] Andrey Kolobov, Mausam, and Daniel S. Weld. ReTrASE: Integrating paradigms for approximate probabilistic planning. In Proc. of the 21st Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 1746–1753, 2009.
[Kupferman and Vardi, 1996] Orna Kupferman and Moshe Y. Vardi. Verification of fair transition systems. In Computer Aided Verification, pages 372–382. Springer, 1996.
[Lussier et al., 2007] Benjamin Lussier, Matthieu Gallien, Jérémie Guiochet, Félix Ingrand, Marc-Olivier Killijian, and David Powell. Fault tolerant planning for critical robots. In 37th Annual IEEE/IFIP Int'l Conf. on Dependable Systems and Networks (DSN), pages 144–153. IEEE, 2007.
[Muise et al., 2012] Christian Muise, Sheila A. McIlraith, and J. Christopher Beck. Improved non-deterministic planning by exploiting state relevance. In Proc. of the 22nd Int'l Conference on Automated Planning and Scheduling (ICAPS), pages 172–180, 2012.
[Pineda et al., 2013] Luis Enrique Pineda, Yi Lu, Shlomo Zilberstein, and Claudia V. Goldman. Fault-tolerant planning under uncertainty. In Proc. of the 23rd Int'l Joint Conference on Artificial Intelligence (IJCAI), 2013.
[Puterman, 1994] Martin Puterman. Markov Decision Processes: Discrete Dynamic Programming. Wiley, New York, 1994.
[Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge, MA, 2001.
[Sardina and D'Ippolito, 2015] Sebastian Sardina and Nicolas D'Ippolito. Towards fully observable non-deterministic planning as assumption-based automatic synthesis. In Proc. of the 24th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 3200–3206. AAAI Press, 2015.
[Trevizan et al., 2007] Felipe W. Trevizan, Fabio Gagliardi Cozman, and Leliane Nunes de Barros. Planning under risk and Knightian uncertainty. In Proc. of the 20th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 2023–2028, 2007.
[Waldinger, 1977] Richard Waldinger. Achieving several goals simultaneously. In Machine Intelligence 8, pages 94–136. Ellis Horwood, Edinburgh, Scotland, 1977.