=Paper=
{{Paper
|id=Vol-2538/paper6
|storemode=property
|title=Compiling Planning Problems with Non-deterministic Events into FOND Planning
|pdfUrl=https://ceur-ws.org/Vol-2538/paper6.pdf
|volume=Vol-2538
|authors=Lukas Chrpa,Martin Pilat,Jakub Gemrot
|dblpUrl=https://dblp.org/rec/conf/aiia/ChrpaPG19
}}
==Compiling Planning Problems with Non-deterministic Events into FOND Planning==
Compiling Planning Problems with Non-deterministic Events into FOND Planning

Lukáš Chrpa1,2, Martin Pilát1, and Jakub Gemrot1
1 Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
2 Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic

Abstract. Automated Planning seeks to find a sequence of actions, a plan, transforming the environment from its initial state to some goal state. In real-world environments, however, exogenous events might occur and modify the environment without the agent’s consent. Besides disrupting the agent’s plan, events might hinder the agent’s pursuit of its goals and even cause damage (e.g. destroying the robot). In this paper, we present how planning problems with non-deterministic events can be translated into Fully-Observable Non-Deterministic (FOND) planning problems, so that we can exploit FOND planning engines to solve them (i.e., find strong cyclic plans). Specifically, we consider two cases in a single-agent scenario – at most one event can occur after the agent’s action, or a set of independent events can occur after the agent’s action. We compare the FOND-based approach with a traditional planning/execution/replanning one to highlight the performance gap as well as the success rate of the latter approach.

1 Introduction

Planning and acting in real-world scenarios [8] encounters numerous challenges. For example, non-deterministic events (e.g. failure to establish communication, or a blocked passage) that might (or might not) happen under specific circumstances can change the environment without the consent of the actor. The concept of events in planning is not new [5] and was used in some systems such as Circa [12]. These systems, however, reason with a very small state space. MDP-based approaches consider events [9], as do Monte Carlo Tree Search based approaches [14]. They, however, focus on selecting the most promising action in the current state.
Fully-Observable Non-Deterministic (FOND) planning considers non-deterministic action effects [3] that we can leverage for problems with non-deterministic events.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this paper, we focus on single-agent planning in a fully observable environment with deterministic action effects and non-deterministic exogenous events where, roughly speaking, the task is to find policies such that the agent eventually reaches its goal (i.e., strong cyclic plans) under the fairness assumption that each event, when applicable, has a chance to occur. We consider two assumptions: in the first one, at most one event can occur after the agent’s action, while in the second one, a set of independent events can occur after the agent’s action. We show how planning problems with events under both assumptions can be compiled to FOND planning, which focuses on a different challenge – non-deterministic action effects (note that, for example, problems with partially observable environments can also be compiled to FOND [11]). Solutions of the compiled FOND problems (strong cyclic plans) can be translated to solutions of problems with non-deterministic events. Such an approach guarantees completeness under the fairness assumption; in other words, solutions of problems with events are “safe” to execute and the agent eventually reaches its goal. Consequently, solutions are “robust” even for problems with dead-ends which, on the other hand, are problematic for the planning/execution/replanning (PER) approach (e.g. FF-Replan [17]) that interleaves planning and execution phases while replanning if the next action becomes inapplicable, or events change the environment. We experimentally compare the FOND-based approach with a traditional planning/execution/replanning one to highlight the performance gap as well as the success rate of the latter approach.
2 Preliminaries

Classical planning, in particular, assumes a static, deterministic and fully observable environment; a solution plan amounts to a sequence of actions. Technically, a classical planning domain model is a tuple D = (L, A), where L is the set of propositional atoms used to describe the state of the environment (the set of propositions from L that are true), and A is the set of actions over L. An action is a tuple a = (pre(a), del(a), add(a)), where pre(a), del(a) and add(a) are sets of atoms from L representing a’s precondition, delete, and add effects, respectively. We assume that del(a) ∩ add(a) = ∅. An action a is applicable (or executable) in a state s if and only if pre(a) ⊆ s. If possible, application (or execution) of a in s, denoted as γ(s, a), yields the successor state of the environment (s \ del(a)) ∪ add(a); otherwise γ(s, a) is undefined. The notion of applicability can be extended to sequences of actions, i.e., γ(s, ⟨a1, …, an⟩) = γ(… γ(s, a1) …, an). A classical planning problem is a tuple P = (D, I, G), where D is a planning domain model, I is the initial state of the environment, and G is the goal, generally in the form of a set of propositions. A solution plan (for a planning problem) is a sequence of actions such that their consecutive application starting in the initial state results in a state satisfying the goal (i.e., a goal state).

2.1 FOND Planning

Fully Observable Non-Deterministic (FOND) planning assumes a fully observable and static environment, but in contrast to classical planning, actions have non-deterministic effects [3, 6]. In particular, the result of application might be one of several sets of effects. Formally, a non-deterministic action is a tuple a = (pre(a), del_1(a), add_1(a), …, del_k(a), add_k(a)), where pre(a) is its precondition, and del_1(a), add_1(a), …, del_k(a), add_k(a) are its delete and add effects, respectively.
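The classical semantics above (applicability and the state-transition function γ) can be sketched in a few lines of Python. This is purely illustrative and not from the paper: states are frozensets of atoms, and an action is a (pre, del, add) triple of sets.

```python
def applicable(state, action):
    """An action a = (pre, del, add) is applicable in s iff pre(a) is a subset of s."""
    pre, dele, add = action
    return pre <= state

def gamma(state, action):
    """gamma(s, a) = (s \\ del(a)) | add(a); None stands for 'undefined'."""
    if not applicable(state, action):
        return None
    pre, dele, add = action
    return (state - dele) | add

def gamma_seq(state, actions):
    """Extension of gamma to action sequences gamma(s, <a1, ..., an>)."""
    for a in actions:
        state = gamma(state, a)
        if state is None:       # one inapplicable action makes the whole sequence undefined
            return None
    return state
```

The atom and action names in any usage are placeholders; only the set algebra mirrors the definitions in the text.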
Action applicability is the same as for classical planning. The result of applying a in s (if possible) is one of the states from {s′ | s′ = (s \ del_i(a)) ∪ add_i(a), 1 ≤ i ≤ k}. The FOND planning problem definition is analogous to the classical planning problem definition. The task in FOND planning is to find a strong (a)cyclic plan, which forms an (a)cyclic graph such that its nodes are states (including the initial state), directed edges go from each state where a (single) non-deterministic action can be applied to all possibly resulting states, and from all states there is a path to a goal state. In plain words, strong cyclic plans guarantee that under the fairness assumption, i.e., each action outcome has a chance to occur, the agent will eventually achieve its goal [3].

2.2 Events

Similarly to the definition of an action, an event is a tuple e = (pre(e), del(e), add(e)), where pre(e), del(e) and add(e) are sets of atoms representing e’s precondition, delete, and add effects, respectively. We assume that del(e) ∩ add(e) = ∅. Applicability of an event in a state as well as the result of an application (or execution) of an event is defined in the same way as for actions. In contrast to actions that are executed by agents, events can occur regardless of the agent’s consent. Technically, an event can (but does not necessarily have to) occur in a state where the event’s precondition is met, modifying the state of the environment according to the event’s effects. A planning domain model, in this case, is a triple D = (L, A, E), where L is the set of propositions, and A and E are the sets of actions and events over L, respectively. A planning problem, P = (D, I, G), is defined analogously to the classical planning case. We define a noop action/event that represents that no action has been taken by the agent, or no event has occurred. Formally, pre(noop) = del(noop) = add(noop) = ∅.
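The set of possible outcomes of a non-deterministic action defined in Section 2.1 can be sketched as follows (an illustrative encoding, not from the paper: a non-deterministic action is a precondition plus a list of (del_i, add_i) pairs, and the example action is loosely modelled on the “slippery” hand of Section 3.1).

```python
def fond_successors(state, nd_action):
    """Return every state that applying nd_action in state may yield,
    one per outcome (del_i, add_i); None if the precondition fails."""
    pre, outcomes = nd_action
    if not pre <= state:
        return None
    return {frozenset((state - d) | a) for d, a in outcomes}
```

Under the fairness assumption each element of the returned set has a chance to be the actual successor.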
We say that events ei, ej are independent if and only if del(ei) ∩ (pre(ej) ∪ add(ej)) = ∅ and del(ej) ∩ (pre(ei) ∪ add(ei)) = ∅.

The following assumption simplifies the reasoning by considering a single-agent scenario such that actions of the agent and events of the environment alternate like in a two-player game. The process hence follows the pattern in which the agent can apply an action (but does not have to), then the environment can trigger (apply) an event (but does not have to), and so on. Formally:

Assumption 1. Let D = (L, A, E) be a planning domain model. In a current state s ∈ 2^L, the agent can apply an action a ∈ A ∪ {noop} such that a is applicable in s. After that, an (arbitrarily selected) event e ∈ E ∪ {noop}, applicable in γ(s, a), is applied, resulting in a state s′ = γ(γ(s, a), e) which will become the new current state. We say that s′ is a successor state of s, a.

The following assumption extends the previous one by considering the pattern in which the agent can apply an action (but does not have to), then the environment can trigger (apply) a set of independent events (but does not have to), and so on. Formally:

Assumption 2. Let D = (L, A, E) be a planning domain model. In a current state s ∈ 2^L, the agent can apply an action a ∈ A ∪ {noop} such that a is applicable in s. After that, an (arbitrarily selected) set of independent events Ei ⊆ E ∪ {noop}, applicable in γ(s, a), is applied, resulting in a state s′ = γ(γ(s, a), Ei) which will become the new current state. We say that s′ is a successor state of s, a.

Let D = (L, A, E) be a planning domain model. We say that a state s′ ∈ 2^L is reachable from a state s ∈ 2^L with respect to D and Assumption 1 (Assumption 2) if and only if there exists a sequence of actions and events (sets of independent events), including noops, such that its consecutive application in s results in s′. Otherwise, we say that s′ is unreachable from s. Let P = (D, I, G) be a planning problem.
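The independence condition on events stated above is just a pair of disjointness checks; a minimal sketch (illustrative encoding, events as (pre, del, add) triples of frozensets, atom names invented for the example):

```python
def independent(ei, ej):
    """Events are independent iff del(ei) is disjoint from pre(ej) | add(ej)
    and del(ej) is disjoint from pre(ei) | add(ei)."""
    return (ei[1].isdisjoint(ej[0] | ej[2])
            and ej[1].isdisjoint(ei[0] | ei[2]))
```

Independence guarantees that applying the events of a set in any order yields the same state, which is what Assumption 2 silently relies on.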
Atoms p, q for which there does not exist a state s reachable from I such that p, q ∈ s are mutually exclusive, or shortly mutex.

Solutions in non-deterministic planning are in the form of policies, mappings from states to actions (similarly to probabilistic or FOND planning, for example). Policies, in plain words, provide information about which action the agent has to apply in the current state of the environment. As we assume events might be triggered between the actions the agent executes, the policy execution interleaves the agent’s actions (including “noop”) and the environment’s events (including “noop”).

Definition 1. Let D = (L, A, E) be a planning domain model and P = (D, I, G) be a planning problem. We say that Π : 2^L → A ∪ {noop} is a policy for P such that for each s ∈ 2^L reachable from I, Π(s) is applicable in s.

In analogy to FOND planning, we can define a strong cyclic plan that represents a policy that, roughly speaking, has to guarantee that the agent will eventually achieve its goal under the fairness assumption that each applicable event/set of independent events (including “noop”) in each (reachable) state has a chance to be applied. The strict variant, i.e., a strong acyclic plan, guarantees that the agent always reaches a goal state in a finite number of steps (no state can be visited more than once).

Definition 2. Let D = (L, A, E) be a planning domain model and P = (D, I, G) be a planning problem. Let π be a policy for P. We construct a directed graph G such that its nodes are states S ⊆ 2^L, I ∈ S, and for each s ∈ S and each successor state s′ of s, π(s) there is an edge from s to s′. A strong cyclic plan for P is a policy π (for P) such that for each state s in G there is a path to a goal state. A strong acyclic plan for P is a policy π (for P) such that for each state s in G there is a path to a goal state and G is acyclic.
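Definition 2 can be checked mechanically: build the graph induced by the policy and verify that every reachable node has a path to a goal state. A minimal sketch under assumed interfaces (not from the paper): `successors(s, a)` stands for the set of all successor states of s, a (agent action plus environment reaction under Assumption 1 or 2), and `policy` is a dict.

```python
from collections import deque

def is_strong_cyclic(initial, policy, successors, is_goal):
    """Return True iff `policy` is a strong cyclic plan in the sense of
    Definition 2: every state reachable from `initial` under the policy
    has a path (in the induced graph) to a goal state."""
    # Forward phase: build the induced graph restricted to reachable states.
    graph, frontier, seen = {}, deque([initial]), {initial}
    while frontier:
        s = frontier.popleft()
        if is_goal(s):
            graph[s] = set()
            continue
        a = policy.get(s)
        if a is None:            # an "open" non-goal state: no strong cyclic plan
            return False
        graph[s] = set(successors(s, a))
        for t in graph[s]:
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    # Backward phase: fixpoint of states from which a goal is reachable.
    good = {s for s in graph if is_goal(s)}
    changed = True
    while changed:
        changed = False
        for s, succ in graph.items():
            if s not in good and succ & good:
                good.add(s)
                changed = True
    return all(s in good for s in graph)
```

A dead-end (a reachable state from which no goal state can be reached) makes the backward fixpoint miss that state, so the check correctly rejects the policy.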
2.3 Relations between Actions and Events

Actions as well as events influence each other by achieving atoms that are required by other actions/events, or, in contrast, by deleting atoms required by other actions/events. An early work of Chapman [1] studies relations between actions such as being an “achiever” (i.e., one action creates an atom for another action) or being a “clobberer” (i.e., an action deletes an atom required by another action). Inspired by this work, we define the notions of an enabler and a disabler that denote whether an action or event achieves, or deletes, an atom for an event.

Definition 3. Let D = (L, A, E) be a planning domain model. We say that an action/event x ∈ A ∪ E is an enabler for an event e ∈ E if add(x) ∩ pre(e) ≠ ∅. We also say that x is a sole enabler for e if add(x) ⊇ pre(e).

Definition 4. Let D = (L, A, E) be a planning domain model. We say that an action/event x ∈ A ∪ E is a disabler for an event e ∈ E if del(x) ∩ pre(e) ≠ ∅.

2.4 Conditional Effects

The standard action (and event) definition can be extended by Conditional Effects that capture possible state changes if some extra condition is met in the current state. Formally, a conditional effect of an action (or event) x is specified as ceff(x) = (cond(x), cdel(x), cadd(x)), where cond(x) is a set of atoms representing a condition and cdel(x), cadd(x) are sets of atoms representing conditional delete and add effects, respectively. In plain words, if a condition is met in the current state, then the effects take place in the resulting state after action application. The action (or event) definition can be extended as follows: x = (pre(x), del(x), add(x), ceff_1(x), …, ceff_k(x)), where ceff_1(x), …, ceff_k(x) are conditional effects of x. Applicability of x in a state s is still determined by whether pre(x) ⊆ s.
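Definitions 3 and 4, and the conditional-effect semantics just described, reduce to a few set operations. An illustrative sketch (not the authors’ code): x[0..2] hold pre, del, add; an action with conditional effects carries a fourth slot with (cond, cdel, cadd) triples.

```python
def is_enabler(x, e):
    """x is an enabler for e iff add(x) intersects pre(e) (Definition 3)."""
    return bool(x[2] & e[0])

def is_sole_enabler(x, e):
    """x is a sole enabler for e iff add(x) is a superset of pre(e)."""
    return x[2] >= e[0]

def is_disabler(x, e):
    """x is a disabler for e iff del(x) intersects pre(e) (Definition 4)."""
    return bool(x[1] & e[0])

def apply_with_ceffs(state, x):
    """Apply x = (pre, del, add, ceffs); each conditional effect fires iff
    its condition holds in the *current* state, and conditions never
    affect applicability."""
    pre, dele, add, ceffs = x
    if not pre <= state:
        return None
    dele, add = set(dele), set(add)
    for cond, cdel, cadd in ceffs:
        if cond <= state:
            dele |= cdel
            add |= cadd
    return frozenset((state - dele) | add)
```

The event and atom names in any usage are hypothetical; only the set-based definitions mirror the text.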
The result of application of x in s is

(s \ (del(x) ∪ ⋃_{cond_i(x) ⊆ s} cdel_i(x))) ∪ add(x) ∪ ⋃_{cond_i(x) ⊆ s} cadd_i(x)

In a nutshell, conditional effects do not influence action (or event) applicability but modify the resulting state if the conditions are met in the current state. Conditional effects can be compiled away, i.e., an equivalent classical representation can be obtained; however, either the representation grows exponentially or the length of solution plans grows polynomially [13]. For the sake of clarity, we will use conditional effects in our compilations of planning problems with events into FOND planning. Conditional effects are now supported by many state-of-the-art classical planners [16] as well as the FOND planner PRP [10].

3 Case Studies

To illustrate our concepts we introduce three case studies.

3.1 “Sticky” and “Slippery” BlocksWorld

We introduce a variant of the well-known 4-ops BlocksWorld domain [7]. In this variant, we have two types of robotic hands that can move blocks. One robotic hand, called “slippery”, is not able to hold blocks firmly, so they might fall down to the table while held by this hand. The other robotic hand, called “sticky”, is able to hold blocks firmly; however, blocks will get “sticky” and might eventually become unmovable, i.e., get stuck to the table or to another block, so no robotic hand can unstack them or pick them up. Technically speaking, the operators unstack, stack, pickup and putdown are amended as follows. Unstack and pickup are split into “slippery” and “sticky” variants, where each variant requires the corresponding robotic hand as a precondition. The precondition of both variants of unstack and pickup contains a (movable ?b) predicate representing whether a block ?b can be moved by either robotic hand. The add effects of the “sticky” variants of unstack and pickup are extended by a (sticky ?b) predicate representing that a block ?b became “sticky”.
We define two events such that an event slip causes the block held by the “slippery” hand to fall onto the table and an event stick makes the “sticky” block unmovable (by removing the (movable ?b) predicate). A strong acyclic plan unstacks blocks from their initial positions with the “slippery” hand and puts them down on the table (if they do not fall down by themselves) and then stacks them on their goal positions with the “sticky” hand (blocks already in their goal positions do not have to be moved). A strong cyclic plan, for instance, can afford to use only the “slippery” hand, since blocks that fell down can be picked up again (and eventually be placed into their goal positions).

3.2 AUV Sampling

We introduce a simplified variant of task planning for an AUV, inspired by the recent work of Chrpa et al. [2]. It simulates the situation where an AUV has to perform sampling of some objects of interest while there might be ships passing by that might endanger the AUV. We have a 4-connected grid environment, an AUV, a ship and several resources. Resources can be found on given cells. Each cell is either free, has the AUV on it, or has the ship on it (presence of a resource does not interfere with any cell status). The AUV can move to an adjacent cell, if the cell is free. The AUV can sample a resource if it is at the same cell. The task for the AUV is to sample the resources and return back to the place of origin. Ships, however, are not controlled by the agent, i.e., ships are controlled by the environment. Ships can move only on some cells of the grid or might not be present in the area. Each ship can enter the area at its entry cells, can move to adjacent cells according to its route, and leave the area at its exit cells. A ship can appear in its entry cell, if the ship is not already in the area. A ship can leave the area, if it is in its exit cell. Two “move” events are considered, move-ship-to-free and move-ship-to-auv. Both require that the ship can move to the destination cell.
The effect of both events is that the ship moves to the destination cell. If the ship moves to a free cell, then besides the cell becoming not free for a moment, nothing else happens. However, if the ship moves to the cell with the AUV, then the AUV is destroyed (and can no longer perform any action). A strong cyclic plan has to avoid situations in which the AUV is, after applying its action, next to any ship, or in any ship’s entry point (if the ship is not yet in the area). Strong acyclic plans might not always exist, since ships can (if being active adversaries) prevent the AUV from getting to some resources.

3.3 The “Perestroika” domain

The Perestroika domain we introduce here is inspired by the well-known Perestroika game (also known as a Toppler game). In our domain, an agent has to navigate through a 4-connected grid of solid and shrinking platforms and collect all resources that can be placed on solid platforms. Solid platforms remain stable, i.e., they do not change their size nor disappear. In contrast, the shrinking platforms can have a large, medium or small shape, or disappear completely. The agent can perform two types of actions. It can move to a neighbouring platform (if it has not disappeared) and/or collect a resource if the resource is on the same platform as the agent. Shrinking platforms are affected by five events each. Two events change the shape of the platform from large to medium and from medium to small, respectively. Two events make the platform disappear if it has a small shape – the difference is whether the platform is empty or the agent is on it. In the former case, the platform just disappears, whereas in the latter case it kills the agent. The last event allows the platform to reappear in a large shape. A strong cyclic plan has to avoid situations in which the agent moves to or stays on small platforms.
Also, the agent must always have an escape route to a nearby solid platform to avoid being “trapped” on a shrinking platform which might disappear at some point and kill the agent. Similarly to the AUV domain, strong acyclic plans might not always exist as the agent might have to wait until the shrinking platforms are large enough to be safely crossed.

(The Perestroika game: https://en.wikipedia.org/wiki/Perestroika_(video_game))

4 Translating Planning with Events into FOND Planning

In FOND planning, the agent does not have control of the action outcome, but the outcome is determined immediately after action execution finishes. In contrast, an event might be applicable even without being “enabled” by the agent’s actions (e.g., the event’s preconditions are met in the initial state), or when “enabled” by the agent’s action, the event’s applicability does not cease until some other action or event “disables” it.

4.1 Considering Assumption 1

Let D = (L, A, E) be a domain model. We construct an equivalent FOND domain model D^F = (L^F, A^F) as follows. We introduce atoms (act-turn) and (ev-turn) to determine the “action” and “event” turn, respectively. For each event ei ∈ E we introduce atoms (enab-ei) and (disab-ei) representing whether ei is applicable in a current state or not, and an atom (selected-ei) representing whether ei has been selected. L^F is constructed as a union of the introduced atoms and L (without loss of generality we assume that none of the introduced atoms is in L).
An action aj ∈ A is translated into a^F_j ∈ A^F as follows:

pre(a^F_j) = pre(aj) ∪ {(act-turn)}
del(a^F_j) = del(aj) ∪ {(act-turn)} ∪ {(enab-ei) | aj is a disabler for ei} ∪ {(disab-ei) | aj is a sole enabler for ei}
add(a^F_j) = add(aj) ∪ {(ev-turn)} ∪ {(disab-ei) | aj is a disabler for ei} ∪ {(enab-ei) | aj is a sole enabler for ei}

For each ei ∈ E such that aj is a non-sole enabler for ei:

ceff_ei(a^F_j) = (pre(ei) \ add(aj), {(disab-ei)}, {(enab-ei)})

On top of the actions defined in the domain model we have to explicitly model the “noop” action a^F_noop ∈ A^F that only switches from the action to the event turn (as the agent decided to do nothing in its turn). In particular, pre(a^F_noop) = del(a^F_noop) = {(act-turn)} and add(a^F_noop) = {(ev-turn)}.

The a^F_j actions can be executed in the “action turn” and since they correspond to the actions in A, their effects are deterministic. After executing an action we move to the “event turn”. In the “event turn”, we have to “transfer” non-deterministic events into non-deterministic action effects. We can do that in two steps. First, we select an event or “noop” by non-deterministic action effects. Then, if the selected event is enabled, the event is executed; otherwise (if the event is disabled) the case is considered as “noop”.
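As a sketch, the translation of one deterministic action into a^F_j might look as follows. This is an illustrative encoding, not the authors’ implementation: actions and events are (name, pre, del, add) tuples, atoms are strings, and conditional effects are (cond, cdel, cadd) triples.

```python
def compile_action(action, events):
    """Build a^F_j per Section 4.1: add the turn-switching atoms and
    maintain the (enab-ei)/(disab-ei) bookkeeping for every event ei
    whose applicability aj can change."""
    name, pre, dele, add = action
    enab = lambda e: f"(enab-{e[0]})"
    disab = lambda e: f"(disab-{e[0]})"
    disablers = [e for e in events if dele & e[1]]              # aj deletes part of pre(ei)
    sole = [e for e in events if add >= e[1]]                   # aj achieves all of pre(ei)
    nonsole = [e for e in events if (add & e[1]) and not add >= e[1]]
    pre_f = pre | {"(act-turn)"}
    del_f = (dele | {"(act-turn)"}
             | {enab(e) for e in disablers} | {disab(e) for e in sole})
    add_f = (add | {"(ev-turn)"}
             | {disab(e) for e in disablers} | {enab(e) for e in sole})
    # One conditional effect per event for which aj is a non-sole enabler.
    ceffs = [(e[1] - add, {disab(e)}, {enab(e)}) for e in nonsole]
    return (name, pre_f, del_f, add_f, ceffs)
```

The event and action names in any usage are invented for illustration; the set manipulations follow the equations above.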
The “event selecting” action a_sel ∈ A^F is encoded as follows; del_0(a_sel) and add_0(a_sel) represent the selection of “noop” (i.e., no event will be executed in the current “turn”) while del_i(a_sel) and add_i(a_sel) represent the selection of an event ei ∈ E (all the events in E are considered):

pre(a_sel) = {(ev-turn)}
del_0(a_sel) = {(ev-turn)}
add_0(a_sel) = {(act-turn)}
del_i(a_sel) = {(ev-turn)}
add_i(a_sel) = {(selected-ei)}

Then, for each event ej ∈ E we construct two actions ae_ej ∈ A^F and ad_ej ∈ A^F representing an execution of ej if enabled, or resorting to the “noop” case if ej is disabled:

pre(ae_ej) = {(selected-ej), (enab-ej)}
del(ae_ej) = del(ej) ∪ {(selected-ej)} ∪ {(enab-ei) | ej is a disabler for ei} ∪ {(disab-ei) | ej is a sole enabler for ei}
add(ae_ej) = add(ej) ∪ {(act-turn)} ∪ {(disab-ei) | ej is a disabler for ei} ∪ {(enab-ei) | ej is a sole enabler for ei}

For each ei ∈ E such that ej is a non-sole enabler for ei:

ceff_ei(ae_ej) = (pre(ei) \ add(ej), {(disab-ei)}, {(enab-ei)})

pre(ad_ej) = {(selected-ej), (disab-ej)}
del(ad_ej) = {(selected-ej)}
add(ad_ej) = {(act-turn)}

For a planning problem P = (D, I, G), the corresponding FOND problem P^F = (D^F, I^F, G) is constructed such that I^F = I ∪ {(act-turn)} ∪ {(enab-ei) | pre(ei) ⊆ I} ∪ {(disab-ei) | pre(ei) ⊈ I}.

We can see that to solve the FOND planning problem we have to simulate “action” and “event” turns. The (act-turn), (ev-turn) and (selected-ei) atoms ensure the applicability of an action, of the event selection action, and of an event, respectively. In particular, in the action turn an action is (non-deterministically) selected. The event turn comprises an “event selecting” action and possibly an “event” action. The “event selecting” action represents event selection (or noop selection) in its non-deterministic effects. Noop selection skips the “event” action (as no event occurs).
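The construction of I^F and the non-deterministic outcomes of a_sel can be sketched likewise (illustrative encoding, not from the paper; events are (name, pre, del, add) tuples, and outcome 0 is the noop):

```python
def compile_initial_state(I, events):
    """I^F = I plus (act-turn) plus one (enab-ei) or (disab-ei) atom per
    event, depending on whether pre(ei) holds in I."""
    IF = set(I) | {"(act-turn)"}
    for name, pre, dele, add in events:
        IF.add(f"(enab-{name})" if pre <= I else f"(disab-{name})")
    return frozenset(IF)

def event_selecting_action(events):
    """a_sel: precondition (ev-turn); one (del_i, add_i) outcome per event
    plus the noop outcome (del_0, add_0) that returns to the action turn."""
    outcomes = [({"(ev-turn)"}, {"(act-turn)"})]          # noop selection
    for name, *_ in events:
        outcomes.append(({"(ev-turn)"}, {f"(selected-{name})"}))
    return ({"(ev-turn)"}, outcomes)
```

Note that selecting an event only sets (selected-ei); whether the event actually fires is decided afterwards by ae_ei or ad_ei, exactly as in the text.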
Otherwise, if an event ei is selected ((selected-ei) becomes true), then either ae_ei or ad_ei can be applied depending on whether ei is applicable or not. Applying ae_ei simulates ei’s application while ad_ei simulates noop as ei is inapplicable. Also, we have to maintain information about the applicability of particular events, which is done through the (enab-ei) and (disab-ei) atoms. If an action or event is a sole enabler for ei, then it adds (enab-ei) and removes (disab-ei). Analogously, if an action or event is a disabler for ei, then it adds (disab-ei) and removes (enab-ei).

Let π^F be a strong cyclic plan of P^F; then a strong cyclic plan π for the corresponding planning problem P is constructed as follows. For each s^F ∈ 2^{L^F} and a^F_i ∈ A^F such that π^F(s^F) = a^F_i we define π(s^F ∩ L) = a_i. In plain words, we consider only pairs ⟨state, action⟩ from the strong cyclic plan associated with actions translated from actions defined in P. As states of P^F contain additional atoms, these have to be removed to obtain states of P. It can be observed that if a strong cyclic plan of P^F is found there is no “open state” which is not a goal state. As events occur implicitly (as in Assumption 1), the corresponding strong cyclic plan of P has to consider only “action turns”. The above is summarised in the following theorem.

Theorem 1. Let P be a planning problem (with events) and P^F be its translation into a FOND problem (as described above). Then, π^F is a strong cyclic plan of P^F if and only if π (constructed from π^F as above) is a strong cyclic plan of P.

Proof (Sketch). To find a strong cyclic plan of P^F the “action” and “event” turns alternate. In particular, in the action turn a planner (non-deterministically) selects an action. The event turn comprises an “event selecting” action and possibly an “event” action to simulate all possibilities of event occurrence.
The “event selecting” action represents the possibilities of event selection (or noop selection) in its non-deterministic effects. Noop selection skips the “event” action (as no event occurs). Otherwise, if an event ei is selected ((selected-ei) becomes true), then either ae_ei or ad_ei can be applied depending on whether ei is applicable or not. Applying ae_ei simulates ei’s application while ad_ei simulates noop as ei is inapplicable. It can be seen that (enab-ei) is true if and only if ei is applicable, while (disab-ei) is true if and only if ei is inapplicable. Multiple noops caused by disabled events result in the same state as the regular noop. Enabled events result in (possibly) new states that have to be explored. If a strong cyclic plan of P^F is found there is no “open state” which is not a goal state. As events occur implicitly (as in Assumption 1), the corresponding strong cyclic plan of P has to consider only “action turns”.

4.2 Considering Assumption 2

Let D = (L, A, E) be a domain model. We construct an equivalent FOND domain model D^F = (L^F, A^F) as follows. We introduce atoms (act-turn), (enab-ei), (disab-ei) and (selected-ei) as in the previous case. On top of that, we introduce atoms (ev-turn) and (ev-turn2) representing two parts of the event turn and, for each event ei ∈ E, an atom (notsel-ei) representing that ei has not been selected (in a given turn), and an atom (wenab-ei) representing that ei will be enabled in the next turn. Similarly to the previous case, L^F is constructed as a union of the introduced atoms and L (without loss of generality we assume that none of the introduced atoms is in L). The translation of each action aj ∈ A into a^F_j ∈ A^F as well as the noop action is the same as in the previous case. The “event selecting” action a_sel ∈ A^F has to reflect the selection of sets of independent events (including noop). That involves calculating power sets for all sets of independent events.
To reduce the number of these power sets we can exploit mutexes such that events whose preconditions are mutex are not present in any set together (despite being independent). In the AUV domain, for example, a single ship cannot move between different locations at the same time, as it can be at most at one location, or outside the area, at any moment. The precondition of a_sel as well as the effects representing the noop event selection are the same as in the previous case. For a set of independent events {ei1, …, eim} ⊆ E we define non-deterministic effects del_i(a_sel) and add_i(a_sel) in which all the sets of independent events from E are considered:

del_i(a_sel) = {(ev-turn), (notsel-ei1), …, (notsel-eim)}
add_i(a_sel) = {(ev-turn2), (selected-ei1), …, (selected-eim)}

For the “event” actions ae_ej ∈ A^F and ad_ej ∈ A^F representing an execution of ej (if enabled, or resorting to the “noop” case if disabled), the main difference, in contrast to the previous case, is that after applying the “event” action, we are still in the “event” turn (hence we can apply all the events from the selected set). Also, enabling events must not influence the applicability of selected events because the selected (independent) events have to be applied in one step. It, however, might be the case that one independent event is an enabler for another independent event (on the other hand, an independent event cannot be a disabler for another independent event) and thus it has to be indicated that an event will be enabled in the following step (by the wenab atom).
pre(ae_ej) = {(selected-ej), (enab-ej)}
del(ae_ej) = del(ej) ∪ {(selected-ej)} ∪ {(enab-ei), (wenab-ei) | ej is a disabler for ei}
add(ae_ej) = add(ej) ∪ {(notsel-ej)} ∪ {(disab-ei) | ej is a disabler for ei} ∪ {(wenab-ei) | ej is a sole enabler for ei}

For each ei ∈ E such that ej is a non-sole enabler for ei:

ceff_ei(ae_ej) = (pre(ei) \ add(ej), {}, {(wenab-ei)})

pre(ad_ej) = {(selected-ej), (disab-ej)}
del(ad_ej) = {(selected-ej)}
add(ad_ej) = {(notsel-ej)}

Resorting back to the “action” turn can be done after all the selected events (respectively their corresponding “event” actions) were applied. In other words, if no event remains selected, then we can resort to the “action” turn while enabling events that were marked as “will be enabled”. The “resorting” action a_res ∈ A^F is defined as follows:

pre(a_res) = {(ev-turn2), (notsel-e1), …, (notsel-en)}
del(a_res) = {(ev-turn2)}
add(a_res) = {(act-turn)}

For each ei ∈ E:

ceff_i(a_res) = ({(wenab-ei)}, {(wenab-ei), (disab-ei)}, {(enab-ei)})

For a planning problem P = (D, I, G), the corresponding FOND problem P^F = (D^F, I^F, G) is constructed in such a way that I^F = I ∪ {(act-turn)} ∪ {(enab-ei) | pre(ei) ⊆ I} ∪ {(disab-ei) | pre(ei) ⊈ I} ∪ {(notsel-ei) | ei ∈ E}.

Similarly to the previous case, to solve the FOND planning problem we simulate action and event turns. The main difference is that in this case more (independent) events can be selected in one turn. Therefore, the event turn is divided into three stages: event selection, event application and resorting to the action turn. On top of that, as events might enable other events, and even one independent event can be an enabler of another independent event, to correctly simulate the event turn we have to postpone a possible enabling of events until the end of the turn. Technically speaking, the “event selecting” action selects a set of independent events that are all sequentially processed by “event” actions.
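Under Assumption 2, enumerating the non-deterministic outcomes of a_sel amounts to enumerating all sets of pairwise independent events, which is exponential in the worst case (hence the mutex pruning mentioned above). A sketch of the enumeration under assumed encodings (events as (name, pre, del, add) tuples; mutex pruning omitted for brevity):

```python
def independent(ei, ej):
    """del(ei) disjoint from pre(ej) | add(ej), and vice versa (Section 2.2)."""
    return (ei[2].isdisjoint(ej[1] | ej[3])
            and ej[2].isdisjoint(ei[1] | ei[3]))

def independent_sets(events):
    """All subsets of pairwise independent events; each subset is one
    non-deterministic outcome of a_sel (the empty set is the noop)."""
    sets = [[]]
    for e in events:
        # Extend only those already-built sets that e is independent of.
        sets += [s + [e] for s in sets if all(independent(e, o) for o in s)]
    return sets
```

Adding the mutex filter would simply drop any candidate set containing two events with mutex preconditions before it is extended further.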
The effects of a selected event take place if the event is enabled; otherwise the event is skipped. However, as the event effects might enable other events (even independent ones), it is important to postpone the information about newly enabled events until all the selected events are processed. The information about newly enabled events has to be kept (the wenab atoms) and when all the selected events are processed, the “resorting” action uses the information to determine the enabled events for the next turn (the enab atoms).

Let π^F be a strong cyclic plan of P^F; then a strong cyclic plan π for the corresponding planning problem P is constructed as follows. Analogously to the previous case, for each s^F ∈ 2^{L^F} and a^F_i ∈ A^F such that π^F(s^F) = a^F_i we define π(s^F ∩ L) = a_i. In plain words, we consider only pairs ⟨state, action⟩ from the strong cyclic plan associated with actions translated from actions defined in P. Also, the additional atoms defined in P^F have to be removed to obtain states of P. It can be observed that if a strong cyclic plan of P^F is found there is no “open state” which is not a goal state. As events occur implicitly (as in Assumption 2), the corresponding strong cyclic plan of P has to consider only “action turns”. The above is summarised in the following theorem.

Theorem 2. Let P be a planning problem (with events) and P^F be its translation into a FOND problem (as described above). Then, π^F is a strong cyclic plan of P^F if and only if π (constructed from π^F as above) is a strong cyclic plan of P.

Proof (Sketch). Analogously to the proof sketch of Theorem 1, the “action” and “event” turns alternate and we keep track of which events are enabled or disabled. The difference is that a set of independent events can occur at once in a single event turn. An “event selecting” action therefore selects a set of independent events.
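In both compilations, extracting π from π^F is a projection. A minimal sketch under assumed interfaces (not the authors’ code): `pi_f` maps FOND states to action names, and `translated` maps compiled action names a^F_i back to the original a_i, so bookkeeping actions such as a_sel, the “event” actions and the “resorting” action are dropped automatically.

```python
def extract_policy(pi_f, L, translated):
    """pi(s_F intersected with L) = a_i for each s_F with pi_f(s_F) = a_F_i;
    pairs whose action has no original counterpart are discarded."""
    return {frozenset(s & L): translated[a]
            for s, a in pi_f.items() if a in translated}
```

Stripping the atoms outside L merges all the intermediate “event turn” states of P^F into the states of P, which is exactly why only “action turn” pairs survive.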
Selected events are processed in a sequence, one by one (by "event" actions), such that the event effects take place if the event is enabled; otherwise the event is treated as a "noop". However, as "event" actions are processed in sequence, the effects of one might enable an event that is going to be processed later. That would deviate from Assumption 2, as such an event is inapplicable in the state before the set of independent events occurs. To prevent enabling events that are disabled in a given turn, we use the wenab atoms, which conditionally enable events. If the conditional enabling persists, it becomes "real" at the end of the event turn. The "resorting" action, which switches from the event to the action turn, can be applied only if all selected events have been processed (i.e., all the events are "not selected"). On top of that, the "resorting" action makes all conditionally enabled events enabled.

Problem |      Assumption 1                 |      Assumption 2
        | RA-S  RA-T   RE-S  RE-T   FOND    | RA-S  RA-T   RE-S  RE-T   FOND
AUV-p1  |  71   0.28    75   1.05  203.26   |  74   0.16    70   1.03   69.92
AUV-p2  |  49   0.45    39   1.50    -      |  37   0.30    24   1.39    -
AUV-p3  |  83   0.44    80   6.04  273.80   |  76   0.23    88   3.31  986.96
AUV-p4  |  57   0.75    57   8.10    -      |  49   0.42    57   3.94    -
AUV-p5  |  87   0.57    75   9.24    -      |  50   0.35    47   5.08    -
BW-p1   | 100   0.36   100   0.91   16.58   | 100   0.61   100   0.98  234.73
BW-p2   | 100   0.70   100   1.42   21.34   | 100   0.72   100   1.80   29.25
BW-p3   |  23   1.28    95   3.99   60.54   |  72   2.28    92   4.36    -
BW-p4   | 100   3.63   100   8.76   19.05   | 100   3.89   100   6.60  169.26
BW-p5   |   0    -      41  34.84    -      |   0    -      55  49.03    -
BW-p6   |  38  20.73    66  42.47   27.24   |  30  28.54    34  51.51    -
PER-p1  |  65   0.29    76   1.00   15.20   |  58   0.21    46   1.02    -
PER-p2  |  71   0.30    82   1.01   14.33   |  52   0.29    37   1.09    -
PER-p3  |  81   0.49    66   2.45    -      |  24   0.54    15   2.71    -
PER-p4  |  69   0.37    71   2.46    -      |  18   0.43    10   2.66    -
PER-p5  |  80   0.46    59   3.47    -      |  14   0.49    13   3.14    -

Table 1. (S)uccess rate (%) and average (T)ime (successful runs only) of the replanning approach – the next action becomes inapplicable (RA) or event(s) occur (RE) – and the planning time of the FOND approach. All times are in seconds.
Notably, the selecting action uses the (ev-turn) atom and achieves the (ev-turn2) atom for the resorting action, so that they are applied in the correct order. Disabling events cannot affect those selected in the current event round, because one of the properties of independent events is that the delete effects of one event are disjoint from the preconditions of the other (independent) events. Also, disabling an event (even a non-selected one) is "permanent" for a given turn, as the other property of independence guarantees that add effects are disjoint from delete effects among independent events. Hence any atom deleted by one event cannot be re-achieved by another independent event. If a strong cyclic plan of P^F is found, there is no "open state" which is not a goal state. As events occur implicitly (as in Assumption 2), the corresponding strong cyclic plan of P has to consider only "action turns".

5 Experimental Evaluation

The aim of the experiments is to measure the performance gap between the complete (offline) FOND approach and the (online) PER approach for solving planning problems with non-deterministic events. We would also like to highlight how the success rate of the latter approach can drop in problems with dead-ends. For our experimental evaluation, we specified the following problems. In BW, Problems 1, 3, 5 consider one sticky and one slippery hand, while Problems 2, 4, 6 consider two slippery hands only. Problems 1 and 2 have 5 blocks, Problems 3 and 4 have 10 blocks, and Problems 5 and 6 have 15 blocks. For AUV, three resources are located in (or near) the corners in each of the problems. Problems 1 and 2 are on a 4x4 grid. Problem 1 has one ship that can move along the 3rd column and along the 3rd row from the 3rd to the 4th column. Problem 2 has two ships moving "to the cross" (3rd row and 3rd column). Problems 3 and 4 are proportionally scaled to an 8x8 grid. Problem 5 has three ships moving in adjacent columns (4th to 6th) such that the middle ship moves in the opposite direction to the other two.
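The event-turn bookkeeping argued for in the proof sketch can be mimicked directly in code. A minimal simulation sketch (our hypothetical representation, not the authors' implementation): enabledness is tested against the state at the start of the turn, and newly satisfied preconditions only take effect at the resorting step, which is exactly the role the wenab atoms play in the compilation.

```python
# Illustrative simulation of one event turn: a set of independent events
# is applied in sequence, but events that become enabled only *during*
# the turn are postponed (the role of the wenab atoms) until "resorting".

def event_turn(state, selected, events):
    """state: set of atoms; selected: event names applied in sequence;
    events: name -> {"pre", "add", "del"} with sets of atoms."""
    enabled_at_start = {e for e, ev in events.items()
                        if ev["pre"] <= state}
    s = set(state)
    for e in selected:
        if e in enabled_at_start:        # otherwise the event is a noop
            s = (s - events[e]["del"]) | events[e]["add"]
    # "resorting": enabledness for the next turn is recomputed only now
    enabled_next = {e for e, ev in events.items() if ev["pre"] <= s}
    return s, enabled_next

events = {
    "e1": {"pre": {"p"}, "add": {"q"}, "del": set()},
    "e2": {"pre": {"q"}, "add": {"r"}, "del": set()},
}
# e2 is not enabled at the start of the turn, so it is skipped even
# though e1 adds its precondition q; e2 only becomes enabled afterwards.
s, nxt = event_turn({"p"}, ["e1", "e2"], events)
```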
In Perestroika, Problems 1 and 2 are on a 3x3 grid, while Problems 3 and 4 are on a 5x5 grid, such that in Problems 1 and 3 solid platforms are on coordinates that are either both odd or both even. In Problems 2 and 4, shrinking platforms are on even rows and even columns. In Problems 1 and 2, resources are located in the other corners, and in Problems 3 and 4 also in the middle of the grid. In Problem 5, resources are located on all solid platforms. In all problems, the agent starts in a corner. All problems are solvable, i.e., there exists a strong cyclic plan for each problem.

As the FOND planner, we used the PRP planner, which also supports conditional effects [10]. For the PER approach, we considered two variants: replanning when the current action is inapplicable, and replanning always after event(s) occur. As the planner we used the well-known LAMA planner [15]. For each problem, we considered a limit of 500 replanning episodes. The runtime limit per problem was 1800s (this also applies to the FOND problems). The experiments were conducted on an Intel Core i7 6700 with 16GB RAM, running Gentoo Linux.

The results summarised in Table 1 show that the PER approach can solve the problems in a few seconds in most cases, however at the cost of a lower success rate (especially for the larger Perestroika problems under Assumption 2). The FOND approach, on the other hand, can solve only the simpler problems, which is exacerbated under Assumption 2. The results are, of course, not very surprising, given that the FOND approach is complete and has to consider all possibilities of event occurrence, while the PER approach "takes a chance" by ignoring events in the planning stage, which is "risky" for problems with dead-ends (e.g. the agent might step on a small shrinking platform which, due to an event, might disappear and kill the agent).
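The PER baseline can be summarised as a simple control loop. Below is a hedged sketch of the "replan when the next action is inapplicable" (RA) variant; the plan, applicable, apply_action, and fire_events hooks are hypothetical stand-ins for the LAMA planner and the event simulator, and the toy domain is ours:

```python
import random

# Illustrative sketch of the RA variant of the PER loop: plan with a
# classical planner that ignores events, execute, and replan only when
# the next action is no longer applicable. All hooks are hypothetical.

def per_loop(state, goal, plan, applicable, apply_action, fire_events,
             max_episodes=500):
    episodes = 0
    actions = plan(state, goal)
    while not goal <= state:
        if not actions or not applicable(actions[0], state):
            episodes += 1
            if episodes > max_episodes:
                return None              # give up: counts as a failure
            actions = plan(state, goal)
            continue
        state = apply_action(actions.pop(0), state)
        state = fire_events(state)       # the environment moves too
    return state

# Toy domain: reach {"at-g"} from {"at-s"}; an event may knock the
# agent back to the start with some probability.
def plan(state, goal):
    return ["go"] if "at-s" in state else []

def applicable(a, state):
    return "at-s" in state

def apply_action(a, state):
    return {"at-g"}

rng = random.Random(0)
def fire_events(state):
    return {"at-s"} if rng.random() < 0.3 and "at-g" in state else state

result = per_loop({"at-s"}, {"at-g"}, plan, applicable, apply_action,
                  fire_events)
```

The "replan after event(s) occur" (RE) variant differs only in triggering a replan whenever fire_events changed the state, not just when the next action fails.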
To illustrate why the FOND approach struggles to solve even rather toy problems, especially under the (more realistic) Assumption 2, consider Perestroika Problem 1, which has 4 shrinking platforms. This results in 2^4 = 16 sets of independent events (including the noop) to be considered in every step (turn). Hence the number of "open states" can explode exponentially (with respect to the number of events).

6 Conclusion

Planning with non-deterministic events presents the challenge of finding strong cyclic plans for agents in dynamic environments. In this paper, we presented a compilation of planning problems with events into FOND planning problems. Scripts for the compilation to FOND and the benchmark problems are available at https://github.com/martinpilat/events-FOND. We have experimentally shown that finding strong cyclic plans is viable, under reasonable time and memory constraints, for very small problems. On the other hand, such global reasoning might be unnecessarily expensive (at least for some classes of problems). For example, in Perestroika, the agent might need to reason only about the shrinking platforms that lie on the way between solid platforms. Generally speaking, the agent might need to find strong cyclic plans between "safe" states [4]. In the future, we plan to investigate how the problem can be decomposed such that we have to find a sequence of local policies that can be combined into a strong cyclic plan.

Acknowledgements. This research was funded by the Czech Science Foundation (projects no. 17-17125Y and 18-07252S).

References

1. Chapman, D.: Planning for conjunctive goals. Artif. Intell. 32(3), 333–377 (1987)
2. Chrpa, L., Pinto, J., Ribeiro, M.A., Py, F., de Sousa, J.B., Rajan, K.: On mixed-initiative planning and control for autonomous underwater vehicles. In: IROS, pp. 1685–1690 (2015)
3. Cimatti, A., Pistore, M., Roveri, M., Traverso, P.: Weak, strong, and strong cyclic planning via symbolic model checking. Artif.
Intell. 147(1-2), 35–84 (2003)
4. Cserna, B., Doyle, W.J., Ramsdell, J.S., Ruml, W.: Avoiding dead ends in real-time heuristic search. In: AAAI (2018)
5. Dean, T., Wellman, M.: Planning and Control. Morgan Kaufmann Publishers (1990)
6. Ghallab, M., Nau, D.S., Traverso, P.: Automated Planning and Acting. Cambridge University Press (2016)
7. Gupta, N., Nau, D.S.: On the complexity of blocks-world planning. Artif. Intell. 56(2-3), 223–254 (1992)
8. Ingrand, F., Ghallab, M.: Deliberation for autonomous robots: A survey. Artif. Intell. 247, 10–44 (2017)
9. Mausam, Kolobov, A.: Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers (2012)
10. Muise, C.J., McIlraith, S.A., Beck, J.C.: Improved non-deterministic planning by exploiting state relevance. In: ICAPS (2012)
11. Muise, C.J., McIlraith, S.A., Belle, V.: Non-deterministic planning with conditional effects. In: ICAPS (2014)
12. Musliner, D.J., Durfee, E.H., Shin, K.G.: CIRCA: A cooperative intelligent real-time control architecture. IEEE Trans. Systems, Man, and Cybernetics 23(6), 1561–1574 (1993)
13. Nebel, B.: On the compilability and expressive power of propositional planning formalisms. Journal of Artificial Intelligence Research 12, 271–315 (2000)
14. Patra, S., Ghallab, M., Nau, D.S., Traverso, P.: Acting and planning using operational models. In: AAAI, pp. 7691–7698 (2019)
15. Richter, S., Westphal, M.: The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research (JAIR) 39, 127–177 (2010)
16. Vallati, M., Chrpa, L., Grzes, M., McCluskey, T.L., Roberts, M., Sanner, S.: The 2014 international planning competition: Progress and trends. AI Magazine 36(3), 90–98 (2015)
17. Yoon, S.W., Fern, A., Givan, R.: FF-Replan: A baseline for probabilistic planning. In: ICAPS 2007, pp. 352–359 (2007)