<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic Epistemic Logic with ASP Updates: Application to Conditional Planning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Cabalar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Fandinno</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Fariñas del Cerro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Institut de Recherche en Informatique de Toulouse, University of Toulouse</institution>
          ,
          <addr-line>CNRS</addr-line>
          ,
          <country country="FR">FRANCE</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Corunna</institution>
          ,
          <country country="ES">SPAIN</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Potsdam</institution>
          ,
          <country country="DE">GERMANY</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Dynamic Epistemic Logic (DEL) is a family of multimodal logics that has proved to be very successful for epistemic reasoning in planning tasks. In this logic, the agent's knowledge is captured by modal epistemic operators whereas the system evolution is described in terms of (some subset of) dynamic logic modalities in which actions are usually represented as semantic objects called event models. In this paper, we study a variant of DEL, that we call DEL[ASP], where actions are syntactically described by using an Answer Set Programming (ASP) representation instead of event models. This representation directly inherits high-level expressive features like indirect effects, qualifications, state constraints, defaults, or recursive fluents that are common in ASP descriptions of action domains. Besides, we illustrate how this approach can be applied for obtaining conditional plans in single-agent, partially observable domains where knowledge acquisition may be represented as indirect effects of actions.</p>
      </abstract>
      <kwd-group>
<kwd>Answer Set Programming</kwd>
        <kwd>Dynamic Epistemic Logic</kwd>
        <kwd>Epistemic Logic Programs</kwd>
<kwd>Epistemic Specifications</kwd>
        <kwd>Conditional Planning</kwd>
        <kwd>Equilibrium Logic</kwd>
        <kwd>Non-Monotonic Reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Automated planning is the field of Artificial Intelligence concerned with the generation of strategies to achieve a goal in a given dynamic domain. A planner usually starts from a formal representation of the domain, a particular instance of the problem and the goal to achieve. The planner output is some strategy, expressed in terms of actions that cause the state transitions to reach the goal. The most common situation is that such a strategy is just a sequence of actions called a plan. In Classical Planning [1] some simplifying restrictions are assumed: the system has a finite number of states, the world is fully observable and the transition relation is deterministic and static (i.e. transitions are only caused by the execution of actions). However, a rational agent may easily face planning problems that require relaxing these assumptions. For instance, a robot may not possess all the information about the environment, either because its sensors have a limited scope, or because its actions may have non-deterministic effects that require observation for finding out the real outcome. Removing the assumptions of determinism and a fully observable world naturally leads to two important questions [2]: (i) what does a plan look like in this new context? and (ii) how can we represent the changes in the agent's knowledge along the plan execution?</p>
<p>Regarding (i), two new categories of plans have been defined in this context: conformant plans and conditional plans. A conformant plan is a sequence of actions that guarantees achieving the goal regardless of the unknown values of the fluents in the initial situation or the precise effects of the non-deterministic actions. If we further allow sensing actions (acquiring knowledge from the environment) then the structure of a sequential plan is not reasonable any more: a conditional plan may contain &#8220;if-then-else&#8221; constructs that allow the agent to follow different strategies depending on the knowledge she has acquired while executing the plan. Approaches to both conformant and conditional planning have been broadly studied in the literature [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27].</p>
<p>With respect to question (ii), several approaches have studied the effects of sensing actions [4, 6, 18, 19, 21, 22, 28, 29, 30, 31]. One prominent line of research is based on Dynamic Epistemic Logic (DEL) [32, 33], a multi-modal approach where the agent's knowledge is captured by modal epistemic operators whereas the system evolution is described in terms of (some subset of) dynamic logic [34] modalities. For instance, the DEL expression [watch](K rain &#8744; K &#8764;rain) represents that, after any possible outcome of the sensing action watch, the agent knows whether rain holds or not. Different variants of DEL have been successfully applied to the problem of planning with non-deterministic, partially observable multi-agent domains [18, 19, 21, 22, 35]. Although DEL has proved to be very convenient for epistemic reasoning in planning, it shows some important drawbacks when analysed from a Knowledge Representation (KR) viewpoint. This is because, for representing actions and their effects, DEL uses the so-called event models [36, 37], which inherit some of the expressive limitations of the STRIPS planning language [38]. In particular, event models do not allow some important KR features, like the treatment of indirect effects, action qualifications, state constraints or recursive fluents, that are quite common in modern representations of action theories.</p>
<p>One popular KR formalism that naturally covers these expressive features is Answer Set Programming (ASP) [39, 40, 41], a well-established paradigm for problem solving and non-monotonic reasoning based on the stable models semantics [42, 43]. The use of ASP for classical planning was introduced in [44, 45], leading to a methodology adopted by different high-level action languages (see [46] and references there) and, more recently, to a temporal extension of ASP [47]. Besides, there exists a growing list of ASP applications [48], many of them dealing with classical planning problems. When moving to conformant planning, though, the application of ASP is still at an exploratory stage. Most of the attempts in this direction relied on an extension called epistemic specifications [49] that incorporates modal constructs (called subjective literals) for representing the agent's knowledge. However, the semantic interpretation of this formalism is still under debate and only some preliminary implementations are available so far &#8211; see [50] for a recent survey. On the other hand, the use of ASP to obtain conditional plans was still unexplored territory.</p>
<p>In this paper, we study the case of single-agent planning and combine both approaches, DEL and (epistemic) ASP, to exploit the advantages of both formalisms in a single language. Our proposal, called DEL[ASP], relies on replacing event models by epistemic logic programs. In that way, the basic event that describes the transition between two epistemic models becomes an ASP epistemic specification, while we keep the same dynamic logic operators for temporal reasoning among transitions. On the one hand, with respect to DEL, the new approach provides all the expressive power of ASP for action domains: indirect effects, qualifications, state constraints, defaults, or recursive fluents are directly inherited from ASP. Moreover, when a classical planning scenario (represented in ASP) becomes partially observable, the new approach allows keeping the scenario representation untouched, possibly adding new epistemic rules to describe the effects of sensing actions. On the other hand, with respect to (non-temporal) epistemic ASP, dynamic operators provide a comfortable way of explicitly representing, and formally reasoning about, conformant and conditional plans.</p>
<p>The rest of the paper is organised as follows. In the next section, we provide some background on the formalisms that make up our proposal. After these preliminaries, we introduce the formalism of DEL[ASP] and explain its behaviour using some examples. Then, we study the representation of conditional plans in this formalism. Finally, we discuss some related work and conclude the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
<p>In this section, we provide some background on planning in DEL, planning in ASP, and the ASP extension of epistemic specifications, since these three components will be present in DEL[ASP] to some degree. In the case of DEL, we will present a slight generalisation of [19, 21] that admits abstract updating objects. These objects correspond to event models for standard DEL, which we denote here as DEL[E], and will become epistemic specifications for DEL[ASP]. For the case of epistemic logic programs, we will use a recent logical formalisation [51] that avoids the problem of self-supported conclusions present in the original semantics [49]. This logic, called Founded Autoepistemic Equilibrium Logic (FAEEL), is a combination of Pearce's Equilibrium Logic [52], a well-known logical characterisation of stable models, with Moore's Autoepistemic Logic (AEL) [53], one of the most representative approaches among modal non-monotonic logics.</p>
      <sec id="sec-2-1">
        <title>Dynamic Epistemic Logic with Abstract Updating Objects</title>
<p>Given a set of propositional symbols P and a set of updating objects O, a (dynamic epistemic) formula &#966; is defined according to the following grammar:
&#966; ::= &#8869; | p | &#966; &#8743; &#966; | &#966; &#8744; &#966; | &#966; &#8594; &#966; | &#8764;&#966; | K &#966; | [o]&#966;
where p &#8712; P is a proposition and o &#8712; O an updating object. The modal epistemic operator K represents the (planning) agent's knowledge: formula K &#966; means that &#8220;the agent knows &#966;.&#8221; The symbol &#8220;&#8764;&#8221; stands here for classical negation (we reserve the symbol &#8220;&#172;&#8221; for intuitionistic negation later on). A formula &#966; is called objective if the operator K does not occur in it. It is called subjective if it has at least some proposition and every proposition is in the scope of K. As usual, we define the following abbreviations: &#966; &#8596; &#968; &#8797; (&#966; &#8594; &#968;) &#8743; (&#968; &#8594; &#966;), (&#966; &#8592; &#968;) &#8797; (&#968; &#8594; &#966;), and &#8868; &#8797; &#8764;&#8869;. We also define the dual of K as follows: K&#770; &#966; &#8797; &#8764;K &#8764;&#966;. We keep the Boolean operators &#8744;, &#8743;, &#8594;, &#8869; and avoid defining some in terms of the others, since this will not be valid when we use an intuitionistic reading later on. By LE(P) we denote the language containing all dynamic epistemic formulas over P.</p>
        <p>We provide next an abstract semantics that just relies on two basic concepts: epistemic models
that represent the agent’s knowledge; and the updating evaluation, a pair of generic functions
that describe how updating objects cause transitions among those models.</p>
<p>Definition 1 (Epistemic Model). Given a (possibly infinite) set of propositional symbols P and a (possibly infinite) set of possible worlds W, a model is a triple M = &#10216;W, K, V&#10217; where
&#8211; W &#8838; W is a finite set of worlds,
&#8211; K &#8838; W &#215; W is an accessibility relation on W, and
&#8211; V : W &#8594; 2^P is a valuation function.
D(M) = W denotes the domain of M. An epistemic model is a model where K is an equivalence relation; it is further called an information cell if K is a universal relation. A belief model is a model where K = W &#215; W&#8242; with W&#8242; = W \ {w0} for some w0 &#8712; W. By M we denote the set of all possible epistemic models over P and W.</p>
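<p>As an illustration (our own sketch, not part of the paper's formalism), Definition 1 can be rendered directly in Python, with the equivalence-relation and universality conditions checked explicitly; the names Model, is_epistemic and is_information_cell are ours:</p>

```python
# A direct Python rendering of Definition 1 (our own illustration; the class
# and method names are not from the paper).
from itertools import product

class Model:
    def __init__(self, worlds, access, valuation):
        self.W = set(worlds)        # finite set of worlds W
        self.K = set(access)        # accessibility relation K on W
        self.V = dict(valuation)    # valuation: world -> set of true atoms

    def is_epistemic(self):
        """K is an equivalence relation (reflexive, symmetric, transitive)."""
        refl = all((w, w) in self.K for w in self.W)
        symm = all((v, u) in self.K for (u, v) in self.K)
        trans = all((u, y) in self.K
                    for (u, v) in self.K for (x, y) in self.K if v == x)
        return refl and symm and trans

    def is_information_cell(self):
        """An epistemic model whose relation K is universal on W."""
        return self.is_epistemic() and self.K == set(product(self.W, self.W))

# M0 from the running example: two universally connected worlds, where only
# w1 makes r true.
M0 = Model({'w1', 'w2'},
           {(a, b) for a in ('w1', 'w2') for b in ('w1', 'w2')},
           {'w1': {'r'}, 'w2': set()})
```

<p>Under this encoding, the initial model M0 of the running example is an information cell, since its two worlds are universally connected.</p>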
<p>We assume that the modeller coincides with the planning agent (the one whose knowledge is captured by the epistemic models). This is usually called an internal point of view, as opposed to the external one where the modeller is a different agent, an omniscient and external observer who can differentiate the actual world and knows its configuration [54, 55]. Adopting the internal orientation translates into the lack of a designated world (all worlds in the model are equally possible). A second consequence is that, even for single-agent epistemic models, we cannot replace the equivalence relation by a universal one.</p>
        <p>Before going into further technical details, let us introduce the following scenario from [21],
which will be our running example throughout the paper.</p>
<p>Example 1. After following carefully laid plans, a thief has almost made it to her target: the vault containing the invaluable Pink Panther diamond. Standing outside the vault, she now deliberates on how to get her hands on the diamond. She knows the light inside the vault is off, and that the Pink Panther is on either the right or the left pedestal inside. Obviously, the diamond cannot be on both the right and the left pedestal, but nonetheless the agent may be uncertain about its location. Note that the thief is perfectly capable of moving in the darkness and taking whatever is on top of any of the pedestals, but she is not able to know whether the diamond has been taken or not. It is assumed that there are four possible actions: move, flick, take_left and take_right. The action move changes the location of the thief from outside the vault (&#8764;v) to inside the vault (v) and vice versa. The action flick turns on the light (l). Furthermore, if the thief is in the vault (v) and the light is on (l), the thief can see (s) where the Pink Panther is. Finally, actions take_left and take_right respectively take the diamond (d) from the left (&#8764;r) or right (r) pedestal if the diamond is on the intended pedestal.</p>
<p>The set of propositions for the example is P = {v, l, r, s, d}. Figure 1 depicts three consecutive epistemic models, M0, M1 and M2, respectively corresponding to the initial state of Example 1 and the resulting states after performing the sequence of actions move and then flick.</p>
<p>For improving readability, we represent the world valuations as strings of propositions and underline those that are false. For instance, the valuation for w1 in Figure 1a is depicted as v&#818;l&#818;rs&#818;d&#818; and corresponds to the set of true atoms {r}, making all the rest false. The initial model M0 represents the triple &#10216;W0, K0, V0&#10217; where we have two worlds W0 = {w1, w2} universally connected, that is, K0 = W0 &#215; W0 = {(w1, w2), (w2, w1), (w1, w1), (w2, w2)}, with valuations V0(w1) = {r} and V0(w2) = &#8709;. This means that the agent knows that the thief is outside the vault (&#8764;v), the light is off (&#8764;l), she cannot see where the Pink Panther is (&#8764;s) and she does not have the diamond (&#8764;d). The two connected worlds reveal that she does not know whether the Pink Panther is on the right pedestal (world w1) or on the left one (world w2). The epistemic model M1 in Figure 1b reflects the same configuration, with the only exception that now the thief is in the vault (v), as a result of moving inside. That is, M1 = &#10216;W0, K0, V1&#10217; with V1 satisfying V1(w1) = {v, r} and V1(w2) = {v}. Finally, M2 shows the epistemic model that results from performing action flick on M1. In this third model, two relevant changes occur: first, l and s became true in both worlds, since flicking turns on the light, and then the thief can see the interior of the vault. Second, and more importantly, the two worlds became completely disconnected, something that reveals that the agent now has complete knowledge of the world configuration in the two possible cases, w1 and w2. Formally, we have M2 = &#10216;W0, K2, V2&#10217; with K2 = {(w1, w1), (w2, w2)} and with V2 satisfying V2(w1) = {v, l, r, s} and V2(w2) = {v, l, s}.</p>
<p>As we can see, the accessibility relation need not be universal: for instance, we had (w1, w2) &#8713; K2 in M2 above. In general, given an epistemic model M = &#10216;W, K, V&#10217; with {w, w&#8242;} &#8838; W, when (w, w&#8242;) &#8713; K we say that the two worlds are indistinguishable at plan-time. In the example, model M2 tells us that, before executing the plan, the agent cannot tell which of the two possibilities represented by worlds w1 (diamond on the right) and w2 (diamond on the left) will correspond to the actual world. However, once she executes the plan, she will acquire that knowledge in both cases: w1 and w2 are disconnected, so uncertainty in the agent's knowledge is completely removed. As a result, at that point, she will be able to decide whether she should perform the take_right or the take_left action.</p>
<p>If, on the contrary, two worlds w, w&#8242; are connected, (w, w&#8242;) &#8712; K, in some epistemic model, we say that they are indistinguishable at run-time. This expresses a higher degree of uncertainty, since the agent has no way to tell which world corresponds to &#8220;reality&#8221; either before or during the plan execution. For instance, in model M1 we have (w1, w2) &#8712; K0, meaning that, during the plan execution, the agent will not be able to decide (at that point) whether action take_right or take_left will work.</p>
<p>Until now, we have only described the information captured by epistemic models and presented an example with transitions among them, but we did not specify how those transitions are obtained. For that purpose, we will only assume, for now, the existence of a pair of generic functions called an updating evaluation, defined below.</p>
<p>Definition 2 (Updating evaluation). Given a set of models M over a set of worlds W and a set of updating objects O, an updating evaluation is a pair &#10216;&#8855;, R&#10217; of partial functions &#8855; : M &#215; O &#8594; M and R : M &#215; O &#8594; 2^(W &#215; W) satisfying R(M, o) &#8838; D(M) &#215; D(&#8855;(M, o)).</p>
<p>Function &#8855; takes an initial model M and some updating object o and provides a successor model M&#8242; = &#8855;(M, o). We will usually write &#8855; in infix notation and assume that it is left associative, so that M &#8855; o stands for &#8855;(M, o) and M &#8855; o1 &#8855; o2 stands for (M &#8855; o1) &#8855; o2. Relation R(M, o) matches worlds from the initial model M with worlds of its successor model M&#8242; = M &#8855; o. We will also write R_{M,o} instead of R(M, o).</p>
<p>At this point, we have all the elements for defining the satisfaction of dynamic epistemic formulas.</p>
<p>Definition 3 (Satisfaction). Let &#10216;&#8855;, R&#10217; be an updating evaluation. Then, given an epistemic model M = &#10216;W, K, V&#10217; and world w &#8712; W, satisfaction of formulas is given by the following recursive definition:
&#8211; M, w &#8877; &#8869;,
&#8211; M, w &#8872; p iff p &#8712; V(w),
&#8211; M, w &#8872; &#966;1 &#8743; &#966;2 iff M, w &#8872; &#966;1 and M, w &#8872; &#966;2,
&#8211; M, w &#8872; &#966;1 &#8744; &#966;2 iff M, w &#8872; &#966;1 or M, w &#8872; &#966;2,
&#8211; M, w &#8872; &#966;1 &#8594; &#966;2 iff M, w &#8877; &#966;1 or M, w &#8872; &#966;2,
&#8211; M, w &#8872; &#8764;&#966; iff M, w &#8877; &#966;,
&#8211; M, w &#8872; K &#966; iff M, w&#8242; &#8872; &#966; for all w&#8242; with (w, w&#8242;) &#8712; K, and
&#8211; M, w &#8872; [o]&#966; iff M &#8855; o and R_{M,o} are defined and M &#8855; o, w&#8242; &#8872; &#966; holds for all w&#8242; with (w, w&#8242;) &#8712; R_{M,o}.</p>
<p>As usual, we write M &#8872; &#966; iff M, w &#8872; &#966; for every world w &#8712; W. Furthermore, given a theory &#915;, we write M &#8872; &#915; iff M &#8872; &#966; for every formula &#966; &#8712; &#915;. We say that theory &#915; entails formula &#968;, also written &#915; &#8872; &#968;, iff M &#8872; &#915; implies M &#8872; &#968; for any epistemic model M &#8712; M.</p>
<p>It is easy to see that the semantics for the dynamic-free fragment of the language (i.e., without the [&#183;] operator) corresponds to modal logic S5 (see [56] for instance).</p>
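<p>To make the dynamic-free clauses of Definition 3 concrete, here is a small model checker over the running example (our own sketch; the nested-tuple encoding of formulas and the name sat are assumptions, and the [o] clause is omitted since it depends on the updating evaluation):</p>

```python
# A model checker for the dynamic-free fragment of Definition 3 (our own
# sketch; the tuple encoding of formulas is an assumption).
def sat(K, V, w, phi):
    """M, w |= phi for an epistemic model M = (W, K, V), where K is a set of
    world pairs and V maps each world to its set of true atoms."""
    op = phi[0]
    if op == 'bot':                       # falsum: never satisfied
        return False
    if op == 'atom':                      # M, w |= p iff p in V(w)
        return phi[1] in V[w]
    if op == 'and':
        return sat(K, V, w, phi[1]) and sat(K, V, w, phi[2])
    if op == 'or':
        return sat(K, V, w, phi[1]) or sat(K, V, w, phi[2])
    if op == 'imp':
        return (not sat(K, V, w, phi[1])) or sat(K, V, w, phi[2])
    if op == 'neg':                       # classical negation
        return not sat(K, V, w, phi[1])
    if op == 'K':                         # phi holds at every K-successor of w
        return all(sat(K, V, u, phi[1]) for (v, u) in K if v == w)
    raise ValueError('unknown connective: %s' % op)

# Initial model M0: the agent knows the thief is outside the vault, but does
# not know on which pedestal the diamond sits.
W = {'w1', 'w2'}
K = {(a, b) for a in W for b in W}       # universal accessibility relation
V = {'w1': {'r'}, 'w2': set()}
knows_not_v = ('K', ('neg', ('atom', 'v')))
knows_r = ('K', ('atom', 'r'))
```

<p>On M0, the formula K &#8764;v holds at every world while K r holds at none, matching the discussion of Figure 1a.</p>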
<sec id="sec-2-1-1">
<title>Dynamic Epistemic Logic with Event Model Updates: DEL[E]</title>
<p>Let us now see how these definitions apply to the case in which updating objects are event models [36]. The following is an adaptation of the definition from [21]. A first peculiarity of event models is that, when making an update M &#8855; o = M&#8242;, the resulting epistemic model M&#8242; uses world names of the form (w, e) where w is a world from the updated epistemic model M and e is a world (or event) from the event model o. For this reason, along this section, we assume that the global set of available world names W is closed under formation of pairs. In other words, W satisfies (w, w&#8242;) &#8712; W for all w, w&#8242; &#8712; W. For instance, given a unique &#8220;atomic&#8221; world name w0 &#8712; W, the set W would contain infinitely many pairs (w0, w0), ((w0, w0), w0), (w0, (w0, w0)), &#8230; and so on.</p>
<p>Definition 4 (Event Model). An event model over P and W is a quadruple E = &#10216;E, K&#770;, pre, post&#10217; where
&#8211; E &#8838; W is a finite set of worlds called events,
&#8211; K&#770; &#8838; E &#215; E is an accessibility relation on E,
&#8211; pre : E &#8594; LE(P) assigns to each event a precondition, and
&#8211; post : E &#8594; (P &#8594; LE(P)) assigns to each event a postcondition for some propositions in P.
By D(E) = E we denote the domain of E. A pair &#10216;E, e&#10217; with e &#8712; E is called a pointed event model.</p>
<p>Definition 5 (Event updating evaluation). Let M = &#10216;W, K, V&#10217; be an epistemic model and E = &#10216;E, K&#770;, pre, post&#10217; an event model, both over P and W. The product update M &#8855; E &#8797; &#10216;W&#8242;, K&#8242;, V&#8242;&#10217; is another epistemic model where
&#8211; W&#8242; = { (w, e) &#8712; W &#215; E | M, w &#8872; pre(e) },
&#8211; K&#8242; = { ((w1, e1), (w2, e2)) &#8712; W&#8242; &#215; W&#8242; | (w1, w2) &#8712; K and (e1, e2) &#8712; K&#770; },
&#8211; V&#8242;((w, e)) = { p &#8712; P | M, w &#8872; post(e)(p) } for every (w, e) &#8712; W&#8242;.
Given a pointed event model &#10216;E, e&#10217;, the event updating evaluation is the pair &#10216;&#8855;, R&#10217; with
&#8211; M &#8855; &#10216;E, e&#10217; &#8797; M &#8855; E,
&#8211; R(M, &#10216;E, e&#10217;) &#8797; { (w, w&#8242;) &#8712; W &#215; W&#8242; | w&#8242; = (w, e) }.</p>
<p>For simplicity, we will usually write [E, e]&#966; instead of [&#10216;E, e&#10217;]&#966;. We will also use the following shorthands:
[E]&#966; &#8797; &#8896;_{e &#8712; D(E)} [E, e]&#966; and &#10216;E&#10217;&#966; &#8797; &#8764;[E]&#8764;&#966;.</p>
<p>The following result shows that, indeed, the semantics described above coincides with the semantics from [21] for the case of event models.</p>
<p>Proposition 1. Let M be an epistemic model, w &#8712; D(M) be a world in M, &#10216;E, e&#10217; be a pointed event model and &#966; &#8712; LE(P) be a formula. Then,
&#8211; M, w &#8872; [E, e]&#966; iff M, w &#8872; pre(e) implies M &#8855; E, (w, e) &#8872; &#966;.
Proof. By definition, we have M, w &#8872; [E, e]&#966;
iff M &#8855; &#10216;E, e&#10217;, w&#8242; &#8872; &#966; for all w&#8242; with (w, w&#8242;) &#8712; R_{M,&#10216;E,e&#10217;}
iff M &#8855; E, w&#8242; &#8872; &#966; for all w&#8242; with (w, w&#8242;) &#8712; R_{M,&#10216;E,e&#10217;}.
Note now that, by definition, we have either R_{M,&#10216;E,e&#10217;} &#8745; ({w} &#215; D(M &#8855; E)) = {(w, (w, e))} or R_{M,&#10216;E,e&#10217;} &#8745; ({w} &#215; D(M &#8855; E)) = &#8709;. Hence, the above holds
iff M &#8855; E, (w, e) &#8872; &#966; or R_{M,&#10216;E,e&#10217;} &#8745; ({w} &#215; D(M &#8855; E)) = &#8709;,
iff M &#8855; E, (w, e) &#8872; &#966; or M, w &#8877; pre(e),
iff M, w &#8872; pre(e) implies M &#8855; E, (w, e) &#8872; &#966;.</p>
<p>The following result helps in understanding the semantics of (non-pointed) event models.
Proposition 2. Let M be an epistemic model, w &#8712; D(M) be a world in M, E be an event model and &#966; &#8712; LE(P) be a formula. Then,
&#8211; M, w &#8872; [E]&#966; iff M &#8855; E, (w, e) &#8872; &#966; for every e &#8712; D(E) such that M, w &#8872; pre(e),
&#8211; M, w &#8872; &#10216;E&#10217;&#966; iff M, w &#8872; pre(e) and M &#8855; E, (w, e) &#8872; &#966; for some e &#8712; D(E).</p>
<p>Proof. The second statement follows directly from the definitions. For the third, we have M, w &#8872; &#10216;E&#10217;&#966; iff M, w &#8872; &#8764;[E]&#8764;&#966; iff M, w &#8877; [E]&#8764;&#966;
iff M, w &#8872; pre(e) does not imply M &#8855; E, (w, e) &#8872; &#8764;&#966; for some e &#8712; D(E)
iff M, w &#8872; pre(e) and M &#8855; E, (w, e) &#8877; &#8764;&#966; for some e &#8712; D(E)
iff M, w &#8872; pre(e) and M &#8855; E, (w, e) &#8872; &#966; for some e &#8712; D(E).</p>
<p>Going back to our running example, Figure 2 depicts the event models corresponding to the actions of Example 1. For instance, Figure 2a depicts the event model of the action take_left with a single event e1 whose precondition is v &#8743; &#8764;d and whose postcondition {d &#8614; &#8764;r} states that, in the next state, d is assigned the truth value that formula &#8764;r had in the previous state. More interestingly, Figure 2c depicts the event model of the action flick, which has two events e1 and e2 with the same postcondition but different preconditions. The precondition of e1 makes it applicable when the thief is in the vault and the diamond is on the right pedestal, while the precondition of e2 is analogous but for the left pedestal.1 In this sense, when the action flick is executed in the epistemic model M1 (Figure 1b), it follows that only w1 satisfies the precondition of e1 and only w2 satisfies the precondition of e2. As a result, we can see that M1 &#8855; flick has two worlds, that is, D(M1 &#8855; flick) = {(w1, e1), (w2, e2)}. Furthermore, since events e1 and e2 are disconnected, we also get that worlds (w1, e1) and (w2, e2) are disconnected. In fact, the epistemic model M1 &#8855; flick is isomorphic to the model M2 depicted in Figure 1c and can be obtained just by renaming each world wi in M2 as (wi, ei).</p>
          <p>1Note how we must specify the diamond’s location in both preconditions, although the only real physical
requirement for flick is being inside the vault. This need for specifying unrelated preconditions may obviously
become a representational problem.</p>
<p>Figure 2: Event models for the actions of Example 1: (a) take_left: e1 : &#10216;v &#8743; &#8764;d, {d &#8614; &#8764;r}&#10217;; (b) take_right: e1 : &#10216;v &#8743; &#8764;d, {d &#8614; r}&#10217;; (c) flick: e1 : &#10216;v &#8743; r, {l &#8614; &#8868;, s &#8614; &#8868;}&#10217; and e2 : &#10216;v &#8743; &#8764;r, {l &#8614; &#8868;, s &#8614; &#8868;}&#10217;, with e1 and e2 disconnected; (d) move: e1 : &#10216;v &#8744; &#8764;l, {v &#8614; &#8764;v}&#10217;, e2 : &#10216;&#8764;v &#8743; l &#8743; r, {v &#8614; &#8764;v, s &#8614; &#8868;}&#10217; and e3 : &#10216;&#8764;v &#8743; l &#8743; &#8764;r, {v &#8614; &#8764;v, s &#8614; &#8868;}&#10217;.</p>
<p>Note that the existence of two disconnected events in the action flick encodes the observation that happens when the light is turned on, that is, the agent obtains knowledge about the actual place of the diamond. For instance, if we consider the action flick&#8242; depicted in Figure 3a, obtained from the action flick by connecting events e1 and e2, we can see that M1 &#8855; flick&#8242; is isomorphic to the epistemic model M&#8242;2 depicted in Figure 3b. Model M&#8242;2 only differs from M2 in that worlds w1 and w2 are now connected, revealing that the agent cannot tell where the diamond is. In other words, flick&#8242; encodes the same ontic changes in the world as flick but does not capture the agent's observation about the position of the diamond.</p>
<p>Finally, Figure 4a (model M3) corresponds to a state where the thief is inside the vault with the diamond in her possession. Intuitively, this model represents the result of executing action take_left or take_right according to the agent's knowledge about the position of the diamond, whereas Figure 4b represents model M4 = M3 &#8855; move, that is, the result of moving (outside the vault) afterwards.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Planning in Answer Set Programming</title>
        <p>In this subsection, we informally describe the ASP methodology for representing problems of
classical planning: for a more formal approach we refer to [45, 57].</p>
<p>Our purpose is, for now, merely introductory: we try to illustrate the main representational features of ASP planning that are relevant for the current discussion. For this reason, we delay the introduction of a formal semantics until later, when epistemic ASP is introduced.</p>
<p>ASP specifications or logic programs are sets of rules of the form:
a &#8592; b1, &#8230;, bn, not c1, &#8230;, not cm
where &#8592; is a reversed implication, so its left-hand side is called the rule head and its right-hand side receives the name of rule body. The rule head a is a proposition or the symbol &#8869;: when the latter happens, the rule is a constraint forbidding that the body holds. The elements in the body (bi and not cj) are called literals, where bi and cj are propositions. The ordering among literals is irrelevant: in fact, commas just stand for conjunctions. Operator not represents default negation: we read not cj as &#8220;there is no evidence on cj&#8221; or &#8220;there is no way to prove cj&#8221;. We will also use non-deterministic rules of the form:
m {a1, &#8230;, ak} n &#8592; Body
where m, n &#8805; 0, meaning that, when Body holds, we can arbitrarily add a subset of atoms from {a1, &#8230;, ak} of cardinality c with m &#8804; c &#8804; n. ASP allows a second kind of negation, called strong or explicit negation, that we will represent as &#8764;p. From a practical point of view, we can assume that &#8220;&#8764;p&#8221; is a new kind of atom and that models cannot make p and &#8764;p true simultaneously.</p>
<p>For a simple representation of rules describing transitions, we partly adopt the syntax of [57] and assume that, for each proposition p, we handle a second atom &#8220;&#8226;p&#8221; that stands for p at the immediately previous situation. In temporal ASP, actions are represented as regular propositions in P; the rest of the (non-action) propositions in P are called fluents.</p>
<p>
Taking all these considerations into account, the behaviour of action take_left can be encoded in ASP as the following three rules:
d &#8592; take_left (1)
&#8869; &#8592; take_left, &#8226;&#8764;v (2)
&#8869; &#8592; take_left, &#8226;r (3)
where (<xref ref-type="bibr" rid="ref1">1</xref>) describes its direct effect (grasping the diamond) whereas the other two rules describe the preconditions: (<xref ref-type="bibr" rid="ref2">2</xref>) forbids executing take_left when the thief was not in the vault and (<xref ref-type="bibr" rid="ref3">3</xref>) forbids its execution when the diamond is on the right pedestal. Analogously, the following three rules encode the action take_right:
d &#8592; take_right (4)
&#8869; &#8592; take_right, &#8226;&#8764;v (5)
&#8869; &#8592; take_right, &#8226;&#8764;r (6)
Similarly, actions flick and move are respectively represented by the rules:
l &#8592; flick (7)
&#8869; &#8592; flick, &#8226;&#8764;v (8)
v &#8592; move, &#8226;&#8764;v (9)
&#8764;v &#8592; move, &#8226;v (10)
</p>
<p>Rule (<xref ref-type="bibr" rid="ref7">7</xref>) states the postcondition of flick, that is, the light is turned on, while rule (<xref ref-type="bibr" rid="ref8">8</xref>) states its precondition, that is, we forbid its execution when being outside the vault. Rules (<xref ref-type="bibr" rid="ref9">9</xref>) and (<xref ref-type="bibr" rid="ref10">10</xref>) together state the postconditions of move: its execution just flips the truth value of v.</p>
<p>To illustrate the use of indirect effects, we can just assert that seeing the diamond (s) just depends on being in the vault (v) with the light on (l), regardless of the actions that have been performed to reach that situation. This is naturally captured by the single ASP rule:
s &#8592; v, l (11)</p>
<p>Default negation allows a natural representation of the inertia law, that is, a fluent normally remains unchanged, unless there is evidence to the contrary. We divide the set of fluents into inertial fluents FI = {v, l, r, d} and non-inertial fluents FN = {s}. Inertia is then guaranteed by the pair of rules:
f &#8592; &#8226;f, not &#8764;f (12)
&#8764;f &#8592; &#8226;&#8764;f, not f (13)
for each inertial fluent f &#8712; FI. In our running example, the fluent s is considered false by default, that is, we add the following rule:
&#8764;s &#8592; not s (14)
stating that, unless s is proved, we should consider that its explicit negation (&#8764;s) is added to the set of conclusions.</p>
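<p>The inertia and default rules above admit a simple procedural reading for this example, sketched below in our own Python illustration (an actual ASP system computes stable models instead; the function successor and its arguments are hypothetical names):</p>

```python
# A procedural reading of the inertia rules and the default for s (our own
# sketch; real ASP solving computes stable models, e.g. with telingo).
INERTIAL = {'v', 'l', 'r', 'd'}

def successor(prev_true, effects_true, effects_false):
    """Next state: direct effects win; an untouched inertial fluent keeps its
    previous value (f ← •f, not ∼f and ∼f ← •∼f, not f); s is derived from
    v and l (s ← v, l) and is false by default otherwise (∼s ← not s)."""
    state = set(effects_true)
    for f in INERTIAL - set(effects_true) - set(effects_false):
        if f in prev_true:
            state.add(f)          # inertia: f was true and nothing changed it
    if 'v' in state and 'l' in state:
        state.add('s')            # indirect effect: seeing the diamond
    return state                  # atoms absent from the set are false

# flick inside the vault: direct effect l; v and r persist by inertia; s is
# then derived indirectly.
s1 = successor({'v', 'r'}, {'l'}, set())
```

<p>Starting from the state {v, r} of world w1 in M1, flicking yields {v, l, r, s}, which is the valuation of w1 in M2.</p>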
<p>If we consider now the following simplification of Example 1, where the values of all fluents in the initial situation are known, we can use ASP to obtain a plan to achieve the thief's goal.
Example 2 (Example 1 continued). Consider now the case where the thief is outside the vault (&#8764;v) and already knows the Pink Panther is inside the vault, on the right (r) pedestal.</p>
        <p>Listing 1: Program corresponding to Example 2 in the syntax of the solver telingo.</p>
        <p>Listing 1 shows the full encoding representing Example 2 in the syntax of the ASP solver
telingo.2 In this syntax, is represented as :-, •p as 'p and p as -p. By copying that
encoding into a le called pink.lp and executing “telingo pink.lp” we can obtain a plan
for this example.</p>
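<p>Since this mapping between the paper's notation and telingo syntax is purely symbol-by-symbol, it can be sketched as a small (hypothetical) translator:</p>

```python
# Hypothetical helper mirroring the notation mapping used by Listing 1:
# the arrow ← becomes :-, the previous-state operator • becomes ',
# and strong negation ∼ becomes -.

def to_telingo(rule: str) -> str:
    return (rule.replace("←", ":-")
                .replace("•", "'")
                .replace("∼", "-"))

print(to_telingo("d ← try_take_right, •r."))
```

This prints the telingo-style rule d :- try_take_right, 'r. (the placement of ' and - on the same atom is an assumption of this sketch; consult the telingo documentation for the exact syntax).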
        <p>
          As we said before, an important difference between ASP and event models is the treatment
of indirect effects. In the example, note how s was captured by the ASP rule (
          <xref ref-type="bibr" rid="ref11">11</xref>
          ), which only
depends on other fluents (v and l) but does not refer to the original actions that caused their
values. There is no flexible way to express this feature when using event models: the value of s
must be expressed as a direct effect of actions flick and move, which respectively determine the
values of l and v. If new actions could alter the values of those fluents, directly or indirectly,
then their effect on s should also be included in their post-conditions. This is, in fact, an instance
of the well-known ramification problem [58].
        </p>
        <p>The ramification problem may also occur for epistemic effects, if we are not careful enough
in their treatment. For instance, the encoding of Example 1 in [21] did not use our fluent
s (where the ramification is evident), but transferred the problem to the epistemic effect of
knowing the position of the diamond (r). Again, this epistemic effect must be specified as a
direct consequence of some action, something that does not always seem reasonable.</p>
        <p>
          In the rest of the paper, we develop an extension of DEL and ASP, that we denote DEL[ASP],
where the ontic and epistemic effects of actions can be described both in a direct and an indirect
way, as needed. In particular, in DEL[ASP], the observation of the diamond position when the
thief is in the illuminated vault can be expressed by the following rule, analogous to (
          <xref ref-type="bibr" rid="ref11">11</xref>
          ):
O r ← v, l
(
          <xref ref-type="bibr" rid="ref15">15</xref>
          )
Here, O r is a new construct whose intuitive meaning is that “the actual value of the fluent r is
observed (by the agent).” Note that we just replace the fluent s in (
          <xref ref-type="bibr" rid="ref11">11</xref>
          ), whose intuitive meaning
is that the agent sees the position of the diamond, by this new construct O r, which makes this
observation affect the agent’s beliefs.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Epistemic Logic Programs</title>
        <p>As explained before, we will use FAEEL, the epistemic extension of ASP, for the interpretation
of epistemic specifications. FAEEL inherits both the ASP capabilities for knowledge
representation and the AEL capabilities for introspective reasoning. For the sake of coherence, we adapt
the definitions of [51] to the use of Kripke structures. We also add strong negation [52, 59] to
FAEEL, which, for simplicity, is restricted to occur only in front of atoms, something that
suffices for the goals of this paper and is usual in the ASP literature [43, 60].</p>
        <p>Autoepistemic formulas are defined according to the following grammar:
φ ::= ⊥ | p | ∼p | φ1 ∧ φ2 | φ1 ∨ φ2 | φ1 → φ2 | L φ
where p ∈ P is a proposition.</p>
        <p>We assume that ASP notation is transformed into its logical counterpart: not F is represented
as ¬F, commas in the body are replaced by conjunctions, and rules G ← F are written from
left to right as F → G.</p>
        <p>Intuitively, the new construct, L φ, is read as “it is reasonable (for the planning agent) to
believe φ.” Weak or intuitionistic negation is defined as usual: ¬φ def= φ → ⊥. The knowledge
modality is defined as true belief: K φ def= φ ∧ L φ. We also introduce the following abbreviations:
U p def= ¬p ∧ ¬∼p
O p def= (p → L ¬¬p) ∧ (∼p → L ¬¬∼p) ∧ (U p → L ¬¬U p)
whose respective intuitive meanings are that the value of proposition p ∈ P is undefined and
that the actual value of proposition p ∈ P is observed. Note that when an atom p is observed,
the agent’s beliefs have to agree with the actual value of the atom p. The use of double negation
here implies that only the agent’s beliefs will be modified, without providing any justification
for believing p. Besides, we assume all the usual abbreviations too, that is, (φ ← ψ) def= (ψ → φ),
φ ↔ ψ def= (φ → ψ) ∧ (ψ → φ), and ⊤ def= ⊥ → ⊥. An autoepistemic theory is a set of
autoepistemic formulas as defined above. When a theory is a singleton, we will usually write
just φ instead of {φ}.</p>
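<p>Under the single-agent reading used here, the effect of O p can be approximated as follows: whatever the actual value of p, every belief world must agree with it. The snippet below is a much-simplified sketch (boolean worlds, a hypothetical helper name), intended only to convey that intuition, not the FAEEL semantics.</p>

```python
# Simplified reading of O p: the agent's belief worlds (each assigning a
# boolean value to p) must all agree with the actual value of p.

def observes(actual, belief_worlds):
    # p → L ¬¬p : if p is actually true, p holds in every belief world;
    # ∼p → L ¬¬∼p : if p is actually false, it is false in every belief world
    return all(w == actual for w in belief_worlds)

print(observes(True, [True, True]), observes(True, [True, False]))
```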
        <p>A literal L is either a proposition p ∈ P or a proposition preceded by strong negation ∼p,
and by Lit def= P ∪ { ∼p | p ∈ P } we denote the set of all literals over the signature P.</p>
        <p>We define next an (autoepistemic) HT-model, as a combination of modal epistemic logic with
the logic of here-and-there (HT) [61], an intermediate logic with two intuitionistic worlds, h
(standing for “here”) and t (read as “there”) satisfying h ≤ t.</p>
        <p>Definition 6 (HT-Model). Given a set of propositional symbols P, an HT-model is a quadruple
M = ⟨W, K, V_h, V_t⟩ where
– W is a set of worlds,
– K ⊆ W × W is an accessibility relation on W, and
– V_x : W → 2^Lit is a valuation with x ∈ {h, t} such that V_h(w) ⊆ V_t(w) for all w ∈ W.</p>
        <p>D(M) = W denotes the domain of M. A belief HT-model is an HT-model where K = W × K0
with K0 = W \ {w0} for some distinguished world w0 ∈ W.</p>
        <p>An HT-model M = ⟨W, K, V_h, V_t⟩ is called total iff V_h = V_t. Furthermore, by M_t def=
⟨W, K, V_t, V_t⟩ we denote the total model corresponding to M. Satisfaction of autoepistemic
formulas is then given by a recursive definition whose base case is:
– M, w ⊨ L iff L ∈ V_h(w), for any L ∈ Lit.
As usual, we say that M is an HT-model of some theory Γ, in symbols M ⊨ Γ, iff M, w ⊨ φ
for every world w ∈ D(M) and every formula φ ∈ Γ. As mentioned before, when Γ = {φ}
is a singleton we will omit the brackets, so that M ⊨ φ stands for M ⊨ {φ} and holds iff
M, w ⊨ φ for every world w ∈ D(M).</p>
        <p>Definition 7 (Bisimulation between HT-models). Let M1 = ⟨W1, K1, V_h1, V_t1⟩ and M2 = ⟨W2, K2, V_h2, V_t2⟩
be two HT-models. Given some binary relation Z ⊆ W1 × W2, we write M1 ⪯_Z M2 iff
– every (w1, w2) ∈ Z satisfies V_t(w1) = V_t(w2) and V_h(w1) ⊆ V_h(w2),
– for every (w1, w1′) ∈ K1, there is (w2, w2′) ∈ K2 such that (w1′, w2′) ∈ Z, and
– for every (w2, w2′) ∈ K2, there is (w1, w1′) ∈ K1 such that (w1′, w2′) ∈ Z.</p>
        <p>We write M1 ⪯ M2 iff there is a total relation Z s.t. M1 ⪯_Z M2. We also say that M1 and
M2 are bisimilar, in symbols M1 ≃ M2, iff there is a total relation Z s.t. M1 ⪯_Z M2 and
M2 ⪯_Z M1. As usual, we write M1 ≺ M2 iff M1 ⪯ M2 and M1 ≄ M2.</p>
        <p>Definition 8 (Equilibrium model). A total belief HT-model M of some theory Γ is said to be an
equilibrium model of Γ iff there is no other belief HT-model M′ of Γ such that M′ ≺ M.</p>
        <p>Given some information cell M = ⟨W, K, V⟩ and some set of literals I ⊆ Lit, by M + I we
denote the total belief HT-model M′ = ⟨W′, K′, V′, V′⟩ where W′ = {w0} ∪ W with w0 ∉ W,
K′ = W′ × W and V′ : W′ → 2^Lit satisfies V′(w) = V(w) for all w ∈ W and V′(w0) = I.</p>
        <p>Definition 9 (World view). Given a set of propositions P, an information cell M = ⟨W, K, V⟩
over Lit is called a world view of some theory Γ iff the following three conditions hold:
– M + V(w) is an equilibrium model of Γ, for every world w ∈ W,
– M + I is not an equilibrium model of Γ for every set of literals I ⊆ Lit satisfying I ≠ V(w)
for all w ∈ W, and
– either p ∉ V(w) or ∼p ∉ V(w) for all p ∈ P and w ∈ W.</p>
        <p>We say that a theory Γ is consistent iff it has some world view, and by WV[Γ] we denote the set of
all world views of Γ.</p>
        <p>
          Example 3 (Example 1 continued). For instance, the formula
φ0 = ∼v ∧ ∼l ∧ (r ∨ ∼r) ∧ ∼s ∧ ∼d
(
          <xref ref-type="bibr" rid="ref16">16</xref>
          )
has a unique world view that is depicted in Figure 5. Note that every propositional theory has a
unique world view [51] that corresponds to the set of all answer sets of the theory. Furthermore,
since φ0 contains no default negation, its answer sets coincide with its minimal classical models when we
treat each strongly negated literal ∼p as a new atom.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Dynamic Epistemic Logic with ASP Updates: DEL[ASP]</title>
      <p>In this section, we present the major contribution of this paper, DEL[ASP], an instance of the
abstract DEL framework where updating objects correspond to logic programs. Our motivation is
twofold: on the one hand, to allow unrestricted use of indirect effects (both ontic and epistemic);
on the other hand, to preserve the ASP representation of non-epistemic planning problems
without need of any adjustment or modification. We illustrate these two objectives through our
running example.</p>
      <sec id="sec-3-1">
        <title>Characterising information cells in FAEEL</title>
        <p>Let us start by showing how any information cell can be represented by some autoepistemic
formula in FAEEL. Note that world views are information cells over Lit, so they represent a
kind of three-valued epistemic model where each proposition p can be true (p ∈ V(w)), false
(∼p ∈ V(w)) or undefined (p, ∼p ∉ V(w)). We will show here how (two-valued) information
cells over P can be simply represented as propositional formulas in FAEEL, allowing us to map
these three-valued epistemic models into standard two-valued ones.</p>
        <p>Example 4 (Example 3 continued). Continuing with our running example, we can see now that
this model satisfies either p ∈ V(w) or ∼p ∈ V(w) for every proposition p ∈ {v, l, r, s, d} and
world w ∈ {w1, w2}. Hence, we can map this model into a (two-valued) information cell by
considering as true every proposition p ∈ V(w) and as false every proposition satisfying ∼p ∈ V(w).
It is easy to see that the obtained information cell is precisely the model M0 depicted in Figure 1a,
that is, the epistemic model corresponding to the initial situation of Example 1. In this sense, we
can use the formula φ0 to represent the initial state of this example.</p>
        <p>Definition 10. Given some information cell M = ⟨W, K, V⟩, its characteristic (autoepistemic)
formula is φ_M def= ⋁_{w∈W} φ^w_M where φ^w_M is defined as follows:
φ^w_M def= ⋀_{p ∈ V(w)} p ∧ ⋀_{p ∈ P\V(w)} ∼p</p>
        <p>Definition 11 (Bisimulation). Given two models M1 = ⟨W1, K1, V1⟩ and M2 = ⟨W2, K2, V2⟩,
we say that they are bisimilar, in symbols M1 ≃ M2, if and only if ⟨W1, K1, V1, V1⟩ ≃
⟨W2, K2, V2, V2⟩.</p>
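<p>Definition 10 can be sketched directly: over the signature of the running example, the characteristic formula of an information cell is a disjunction with one conjunct per world. Formulas are built as plain strings here, and the encoding of cells as sets of true atoms is an assumption of this sketch.</p>

```python
# Sketch of Definition 10 over the signature of the running example.
P = ["v", "l", "r", "s", "d"]

def char_world(valuation):
    # conjunction of p for p in V(w) and ∼p for p in P \ V(w)
    return " ∧ ".join(p if p in valuation else "∼" + p for p in P)

def char_formula(cell):
    # disjunction over all worlds of the information cell
    return " ∨ ".join("(" + char_world(v) + ")" for v in cell)

# M0 from the running example: two worlds that differ only on r
print(char_formula([{"r"}, set()]))
```

The printed formula matches the one obtained for M0 in Example 5 below, whose distributed form is classically equivalent to (16).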
        <p>Definition 12. Given a subset P of the propositions in the signature, we say that an HT-model M = ⟨W, K, V_h, V_t⟩
over Lit is P-classical iff every world w ∈ W and proposition p ∈ P satisfy that either p ∈ V_h(w)
or ∼p ∈ V_h(w) holds. A theory is P-classical iff it is consistent and, in addition, every world
view is P-classical.</p>
        <p>Definition 13. Given a subset P of the propositions in the signature and any P-classical total HT-model M = ⟨W, K, V_t, V_t⟩
over Lit, by M # P = ⟨W, K, V⟩ we denote the model over P where V : W → 2^P is a valuation
satisfying V(w) = V_t(w) ∩ P for every world w ∈ W.</p>
        <p>Proposition 3. Let M be an information cell over P. Then, φ_M has a unique world view M′
and we have that M and M′ # P are bisimilar.</p>
        <p>Proof. First note that, since φ_M is a propositional formula, it has a unique world view M′ [51,
Proposition 3]. Let M = ⟨W, K, V⟩ and M′ = ⟨W′, K′, V′⟩. Then, we have that w′ ∈ W′ iff
V′(w′) is a stable model of φ_M. Note also that the stable models of φ_M are exactly its classical
models understood as sets of literals. Hence, for every w′ ∈ W′, there is some w ∈ W such that
V′(w′) = V(w) ∪ { ∼p | p ∈ P\V(w) } and vice-versa. Consequently, M and M′ # P are bisimilar.</p>
        <p>Example 5 (Example 4 continued). Continuing with our running example, we have φ_M0 =
φ^w1_M0 ∨ φ^w2_M0 with</p>
        <p>φ^w1_M0 = ∼v ∧ ∼l ∧ r ∧ ∼s ∧ ∼d
φ^w2_M0 = ∼v ∧ ∼l ∧ ∼r ∧ ∼s ∧ ∼d</p>
        <p>
          By applying distributivity of conjunctions over disjunctions, it is easy to see that φ_M0 is classically
(and intuitionistically) equivalent to (
          <xref ref-type="bibr" rid="ref16">16</xref>
          ). As a result, M0^wv is the unique world view of φ_M0 and,
as expected from Proposition 3, it can be checked that it satisfies M0^wv # P = M0.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Epistemic Model Updates with FAEEL</title>
        <p>In this section, we show how autoepistemic equilibrium logic can be used to define epistemic
model updates just by using an extended signature. Given a set of propositions S, we define
•S def= { •p | p ∈ S ∩ P } and P_bi def= P ∪ •P, where •p intuitively means that p was true in the
previous state. It will also be convenient to use • in front of any propositional formula φ, such
that •φ is an abbreviation for the formula obtained by writing • in front of every proposition
occurring in φ.</p>
        <p>
          Example 6 (Example 3 continued). Let Γ_pink be a theory containing formulas (
          <xref ref-type="bibr" rid="ref1">1</xref>
          )-(
          <xref ref-type="bibr" rid="ref15">15</xref>
          ) and let
Γ1 = Γ_pink ∪ {move, •φ_M0}. This program has a unique world view, shown in Figure 6. Note
that, if we disregard all the information corresponding to the previous situation (that is, all literals
preceded by •) and the action move, then we have the same information as in the epistemic model M1
of Figure 1b. In other words, Γ1 encodes the transition that occurs between the epistemic models M0
and M1 when executing action move.
        </p>
        <p>As shown in the example above, we can represent the transition between two epistemic
models as an autoepistemic theory. Let us now formalise this intuition. We begin by introducing
some auxiliary definitions.</p>
        <p>Given a set of epistemic models S = {M1, M2, ...} where each Mi is a model over a set
of atoms P of the form Mi = ⟨Wi, Ki, Vi⟩ and satisfying Wi ∩ Wj = ∅ for all Mi, Mj ∈ S
with i ≠ j, by ⊔S def= ⟨W′, K′, V′⟩ we denote the epistemic model where
– W′ = ⋃{ Wi | Mi ∈ S },
– K′ = ⋃{ Ki | Mi ∈ S }, and
– V′ : W′ → 2^P with V′(w) = Vi(w) for all w ∈ Wi and all Mi ∈ S.</p>
        <p>As usual, if S = {M1, M2}, we just write M1 ⊔ M2 instead of ⊔{M1, M2}.
Definition 14. Let P be a set of atoms and Γ be some P-classical autoepistemic theory. Then, by
mwv(Γ, P) def= ⊔{ M # P | M ∈ WV[Γ] } we denote the epistemic model capturing all the world
views of Γ projected onto the vocabulary P. If Γ is not P-classical, we assume that mwv(Γ, P) is
not defined.</p>
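<p>The join ⊔S underlying Definition 14 is just a disjoint union of the component models; a minimal sketch, with models encoded as triples of worlds, accessibility pairs and valuations (an assumed encoding):</p>

```python
# Disjoint union ⊔S of epistemic models, each given as (W, K, V).
def join(models):
    W, K, V = set(), set(), {}
    for (Wi, Ki, Vi) in models:
        # Definition 14 presupposes pairwise disjoint world sets
        assert not W.intersection(Wi), "world sets must be disjoint"
        W |= Wi
        K |= Ki
        V.update(Vi)
    return W, K, V

M1 = ({"w1"}, {("w1", "w1")}, {"w1": {"v"}})
M2 = ({"w2"}, {("w2", "w2")}, {"w2": set()})
W, K, V = join([M1, M2])
```

Each component model becomes one information cell of the result, since no accessibility pairs are added between components.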
        <p>In other words, for every P-classical autoepistemic theory Γ, mwv(Γ, P) is the two-valued
epistemic model that has an information cell for every world view of Γ, such that the valuation
of every proposition p ∈ P in every world in mwv(Γ, P) corresponds to the valuation of that
proposition in that same world in the corresponding world view. Recall that, in a P-classical
theory, all world views satisfy either p or ∼p for every proposition p ∈ P. This is necessary
so that it is possible to map three-valued world views into two-valued epistemic models. We could
remove this restriction by allowing three-valued epistemic models, but we have decided to stay
as close as possible to the original DEL semantics, which is only two-valued.
Example 7 (Example 6 continued). Note that the epistemic model mwv(Γ, P)
always corresponds to the current situation, discarding all information about the previous one. In
this sense, if we consider the program Γ1 of Example 6 and the epistemic model M1 of Figure 1b,
we have that mwv(Γ1, P) = M1, where P = {v, d, r, l, s} is the set of fluents of Example 1.</p>
        <p>As a further example, consider now the theory Γ2 = Γ_pink ∪ {flick, •φ_M1}. Then, Γ2 has two
world views which correspond to the two information cells in the epistemic model M2 depicted in
Figure 1c. This explains why we need to join together all the world views of the theory in a single
epistemic model. Every world view, which becomes an information cell, represents the knowledge
the agent will have after executing the action flick, while the set of all world views represents the
knowledge the agent had before executing it.</p>
        <p>Let us now define the transition between states, borrowing the notion of product update from
DEL.</p>
        <p>Definition 15. Given an information cell M over P and a theory Γ over P_bi such that
Γ ∪ {•φ_M} is P_bi-classical, we define:
– the product update of M with respect to Γ as the epistemic model M ⊗ Γ def= mwv(Γ ∪ {•φ_M}, P), and
– the binary relation R_{M,Γ} ⊆ D(M) × D(M ⊗ Γ) such that (w, w′) ∈ R_{M,Γ} iff M′, w′ ⊨ •φ^w_M,
with M′ = mwv(Γ ∪ {•φ_M}, •P).</p>
        <sec id="sec-3-2-1">
          <title/>
          <p>Example 8 (Example 7 continued). It is now easy to see that
M0 ⊗ Γ3 = mwv(Γ3 ∪ {•φ_M0}, P) = mwv(Γ1, P) = M1
M1 ⊗ Γ4 = mwv(Γ4 ∪ {•φ_M1}, P) = mwv(Γ2, P) = M2
with Γ3 = Γ_pink ∪ {move} and Γ4 = Γ_pink ∪ {flick} respectively being the theories representing
the execution of the actions move and flick according to program Γ_pink, M1^wv the epistemic model
depicted in Figure 6, and M1 and M2 the epistemic models depicted in Figure 1b and Figure 1c. In
other words, M1 is the result of executing action move in epistemic model M0 according to the
description provided by Γ_pink. Furthermore, for each world w ∈ {w1, w2}, we have M1^wv, w ⊨ •φ^w_M0.
In its turn, this implies M1^wv # •P, w ⊨ •φ^w_M0 and, thus, that R_{M0,Γ3} = {(w1, w1), (w2, w2)}
is the identity3, that is, it maps each world in M0 to a world in M1 with the same name.
Similarly, M2 is the result of executing the action flick in the epistemic model M1 according to the
description provided by Γ_pink, and we can check that R_{M1,Γ4} is also the identity.</p>
          <p>We now define the updating evaluation of ASP epistemic specifications for epistemic models.
In a nutshell, this evaluation is the result of combining the evaluation for each individual
information cell in the model.</p>
          <p>Definition 16 (ASP updating evaluation). Given any epistemic model M and theory Γ, the ASP
updating evaluation is a pair ⟨⊗, R⟩ satisfying
M ⊗ Γ def= ⊔{ M′ ⊗ Γ | M′ ∈ cell(M) }
R_{M,Γ} def= ⋃{ R_{M′,Γ} | M′ ∈ cell(M) }</p>
          <p>We can now directly apply Definition 3 to obtain the satisfaction of DEL[ASP] formulas, that
is, DEL formulas in which the updating objects are autoepistemic theories.</p>
          <p>Example 9 (Example 2 continued). Let us now resume the simplified version of our running
example introduced in Example 2. In this case, the initial situation can be represented by the epistemic
model M0′ depicted in Figure 7a. Then, it can be checked that
M0′ ⊨ K[Γ3][Γ5][Γ3](∼v ∧ d)
holds with Γ5 = Γ_pink ∪ {take_right}. In other words, the thief knows that after executing
the sequence of actions ⟨move, take_right, move⟩ she will be out of the vault with the diamond.</p>
          <p>3Note that, for the sake of clarity, the names of worlds have been chosen so that R_{M0,Γ3} is the identity, but
this is not necessarily the case. In fact, worlds from M0 and M1 could be disjoint or even be switched, so that w1
could be called w2 and vice-versa.</p>
          <p>That is, this sequence of actions is a valid plan that achieves the goal of getting out of the vault
with the diamond regardless of the actual initial situation. This means that this is a conformant
plan.</p>
          <p>For the sake of completeness, Figures 7b, c and d respectively depict the epistemic models
M0′ ⊗ Γ3, M0′ ⊗ Γ3 ⊗ Γ5 and M0′ ⊗ Γ3 ⊗ Γ5 ⊗ Γ3.</p>
          <p>
            Example 10 (Example 1 continued). As a further example, consider another variation of
Example 1 where we have actions try_take_left and try_take_right that are similar to take_left
and take_right, but that can be executed even when the diamond is not in the right location,
having no effect in such a case. This can be represented by a theory Γ′_pink obtained from Γ_pink by
replacing rules (
            <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1-6</xref>
            ) by the following rules:
d ← try_take_left, •∼r
(17)
⊥ ← try_take_left, •∼v
(18)
d ← try_take_right, •r
(19)
⊥ ← try_take_right, •∼v
(20)
          </p>
          <p>Now, we can check that
M0 ⊨ K[Γ3][Γ6][Γ7][Γ3](∼v ∧ d)
(21)</p>
        </sec>
        <sec id="sec-3-2-2">
          <title/>
          <p>This holds with Γ6 = Γ′_pink ∪ {try_take_left} and Γ7 = Γ′_pink ∪ {try_take_right}. Recall that
M0 ⊗ Γ3 = M1 is the epistemic model depicted in Figure 1b. We can also check that
M0 ⊗ Γ3 ⊗ Γ6 ⊗ Γ7 ⊗ Γ3 = M4 is the epistemic model depicted in Figure 4b. Figure 8 depicts the epistemic
models corresponding to the intermediate states. With these models, we can see that (21) holds. That is, the
thief knows that after executing the sequence of actions ⟨move, try_take_left, try_take_right, move⟩
she will be outside the vault with the diamond. Therefore, this sequence of actions constitutes a
conformant plan for this problem. Note that the thief achieves her goal without ever getting to
know where the diamond actually was.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conditional Planning in DEL[ASP]</title>
      <p>In this section, we show how to use DEL[ASP] to represent conditional plans. Let us start by
defining what a plan is, by introducing the following plan language from [21].
Definition 17 (Plan Language). Given disjoint sets of actions A and fluents F, a plan π is an
expression built with the following grammar:
π ::= a | skip | if K φ then π else π | π ; π
where a ∈ A and φ is a formula over F. We write (if K φ then π) as a shorthand for the plan
(if K φ then π else skip).</p>
      <p>As mentioned in the introduction, conditional plans contain “if-then-else” structures that
allow the agent to apply different strategies depending on the knowledge she has obtained
along the execution of the plan. For instance,
move ; flick ; if K r then take_right else take_left ; move
(22)
is a conditional plan for the problem laid out in Example 1. It is a plan since, as we will prove next,
the thief eventually takes the diamond out in all possible outcomes, and it is conditional because
the third step contains an alternative decision. If the thief acts according to her knowledge
about the diamond position at that point, the plan is guaranteed to succeed. We will show that,
in fact, after executing the actions move and flick, the thief knows that she will know where
the diamond is.</p>
      <p>Let us now formalise these intuitive ideas by providing a translation from plans into DEL[ASP]
as follows:
Definition 18 (Translation). Let A ⊆ P and F ⊆ P be a pair of disjoint sets of propositions,
respectively corresponding to actions and fluents. The translation of a plan π over A applied to
a formula ψ over F, with respect to a theory Γ over P_bi, is denoted as J π K_Γ ψ and is recursively
defined as:
J a K_Γ ψ def= ⟨Γ ∪ {a}⟩⊤ ∧ [Γ ∪ {a}]ψ
J skip K_Γ ψ def= ψ
J π ; π′ K_Γ ψ def= J π K_Γ (J π′ K_Γ ψ)
J if K φ then π else π′ K_Γ ψ def= (K φ → J π K_Γ ψ) ∧ (¬K φ → J π′ K_Γ ψ)
where φ is any formula over F.</p>
      <p>As a first remark, note that the translation J a K of an action a is always made by adding a
constant theory Γ that defines the behaviour of the action domain (fixing the transition relation).
As a result, each elementary action in the plan becomes a complete autoepistemic theory Γ ∪ {a}
in the translation. When Γ is clear from the context, we will simply write J π K instead of J π K_Γ.
Conjunct [Γ ∪ {a}]ψ requires that ψ becomes true in any resulting state, whereas ⟨Γ ∪ {a}⟩⊤
ensures that action a is indeed executable.</p>
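<p>The recursive translation of Definition 18 can be prototyped in a few lines. Plans are encoded here as nested tuples (an assumption of the sketch), and the resulting DEL[ASP] formula is produced as a plain string with Γ+a standing for Γ ∪ {a}:</p>

```python
# Sketch of Definition 18; plans are "skip", ("act", a), ("seq", p1, p2)
# or ("if", phi, p1, p2).

def tr(plan, psi, gamma="Γ"):
    if plan == "skip":
        return psi
    kind = plan[0]
    if kind == "act":
        a = plan[1]
        # ⟨Γ∪{a}⟩⊤ ensures executability; [Γ∪{a}]ψ makes ψ hold afterwards
        return f"⟨{gamma}+{a}⟩⊤ ∧ [{gamma}+{a}]({psi})"
    if kind == "seq":
        return tr(plan[1], tr(plan[2], psi, gamma), gamma)
    if kind == "if":
        phi, then_p, else_p = plan[1], plan[2], plan[3]
        return (f"(K {phi} → {tr(then_p, psi, gamma)}) ∧ "
                f"(¬K {phi} → {tr(else_p, psi, gamma)})")

print(tr(("seq", ("act", "move"), "skip"), "goal"))
```

Sequencing threads the goal formula from right to left, exactly as in the J π ; π′ K clause.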
      <p>
        We check next that the evaluation of plan (22) corresponds to what we have already seen in
Example 8. For the sake of clarity, we gather together all rules of the theory Γ_pink in Figure 9:
d ← take_left
(1)
⊥ ← take_left, •∼v
(2)
⊥ ← take_left, •r
(3)
d ← take_right
(4)
⊥ ← take_right, •∼v
(5)
⊥ ← take_right, •∼r
(6)
l ← flick
(7)
⊥ ← flick, •∼v
(8)
v ← move, •∼v
(9)
∼v ← move, •v
(10)
Example 11 (Example 8 continued). Going on with our running example, let us consider
plans J π K_Γpink, simply denoted as J π K. We have seen that M0 ⊗ Γ3 = M1 where Γ3 =
      </p>
      <p>Γ_pink ∪ {move} was the theory representing the execution of action move according to Γ_pink.
We have also seen that R_{M0,Γ3} = {(w1, w1), (w2, w2)} is the identity. Similarly, given the theory
Γ8 = Γ_pink ∪ {flick} representing the execution of action flick, we have M1 ⊗ Γ8 =
M2 and R_{M1,Γ8} = {(w1, w1), (w2, w2)}. Figure 10 shows these three models together with
the corresponding relations R_{M0,Γ3} and R_{M1,Γ8}. Looking at this figure, we observe that
M2, w1 ⊨ v ∧ K r and, thus, also M2, w1 ⊨ v ∧ (K r ∨ K ∼r). From this we can conclude
that M1, w1 ⊨ [Γ8](v ∧ (K r ∨ K ∼r)). Note that M1 ⊨ ⟨Γ8⟩⊤ holds and, thus, it follows</p>
      <p>M1, w1 ⊨ J flick K(v ∧ (K r ∨ K ∼r))
Now we can check that M0, w1 ⊨ [Γ3]J flick K(v ∧ (K r ∨ K ∼r)) and M0 ⊨ ⟨Γ3⟩⊤ hold
and, thus, we can conclude</p>
      <p>M0, w1 ⊨ J move K J flick K(v ∧ (K r ∨ K ∼r))
Analogous reasoning allows us to see that the same holds for M0, w2 and, thus, we obtain</p>
      <p>M0 ⊨ K J move K J flick K(v ∧ (K r ∨ K ∼r))
By definition, these two facts imply</p>
      <p>M0 ⊨ K J move ; flick K(v ∧ (K r ∨ K ∼r))
(23)
In other words, the thief knows that, after executing actions move and flick, she will be inside
the vault and she will know where the diamond is. So she will be ready for the next step:
using her knowledge to decide what is the suitable action to continue the plan.</p>
      <p>Let us now continue with the thief’s reasoning process after the execution of the first two
actions.</p>
      <p>Example 12 (Example 11 continued). We will show now that
M2, w1 ⊨ J if K r then take_right else take_left ; move K(∼v ∧ d)
(24)
is satisfied. First note that
M2 ⊗ Γ9 = M3^1
R_{M2,Γ9} = {(w1, w1)}
M3^1 ⊗ Γ3 = M4^1
R_{M3^1,Γ3} = {(w1, w1)}
with Γ9 def= Γ_pink ∪ {take_right}, and that M4^1, w1 ⊨ ∼v ∧ d and, thus, we get
M3^1, w1 ⊨ J move K(∼v ∧ d). From this, we get M2, w1 ⊨ [Γ9]J move K(∼v ∧ d). Furthermore, this implies
M2, w1 ⊨ J take_right K J move K(∼v ∧ d), which in its turn implies
M2, w1 ⊨ K r → J take_right K J move K(∼v ∧ d)
Note that M2, w1 ⊨ K r and, thus,
M2, w1 ⊨ ¬K r → J take_left K J move K(∼v ∧ d)
also follows. As a result, we can see that (24) holds. Now we follow the reasoning from Example 11
to show that M0, w1 ⊨ J (22) K(∼v ∧ d). That is, (22) is a plan that achieves the goal of Example 1
in the case that the diamond is on the right pedestal. Analogously, Figure 11 shows the models
needed to prove M0, w2 ⊨ J (22) K(∼v ∧ d), that is, when the diamond is on the left. As a result,
we obtain M0 ⊨ K J (22) K(∼v ∧ d). In other words, the thief knows that after executing (22),
she will succeed in her goal: being outside of the vault with the diamond.</p>
      <p>To conclude this section, we formalise the concepts of planning task and planning solution.
Definition 19 (Planning task). Given the disjoint sets of actions A ⊆ P and fluents F ⊆ P, a
planning task is a triple ⟨Γ0, Γ, φ_g⟩ where Γ0 is a theory over P \ A defining the initial state,
Γ is a theory over P_bi defining the interpretation of actions, and φ_g is the goal formula over F.
Definition 20 (Planning solution). A plan π is a conditional solution for the planning task
⟨Γ0, Γ, φ_g⟩ iff mwv(Γ0, F) ⊨ J π K_Γ φ_g. A conditional solution without occurrences of the
“if-then-else” construct is called a conformant solution.</p>
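<p>A brute-force check of Definition 20 for conformant plans can be sketched as follows. The simulator below is a drastic simplification (complete states, deterministic actions, no epistemic component), so it only illustrates the quantification over the possible initial states, not the DEL[ASP] semantics.</p>

```python
# Naive conformant-solution check: the goal must hold after executing the
# plan from every state compatible with the initial knowledge.

def step(state, action):
    new = dict(state)
    if action == "move":
        new["v"] = not state["v"]            # rules (9)-(10)
    elif action == "flick":
        if not state["v"]:
            return None                      # precondition (8) violated
        new["l"] = True                      # postcondition (7)
    elif action == "take_right":
        if not (state["v"] and state["r"]):
            return None                      # preconditions (5)-(6)
        new["d"] = True                      # postcondition (4)
    new["s"] = new["v"] and new["l"]         # indirect effect (11)
    return new

def conformant(plan, initial_states, goal):
    for s in initial_states:
        for a in plan:
            s = step(s, a)
            if s is None:
                return False
        if not goal(s):
            return False
    return True

# Example 2: the initial state is fully known, the diamond is on the right
init = [{"v": False, "l": False, "r": True, "d": False, "s": False}]
goal = lambda st: (not st["v"]) and st["d"]
print(conformant(["move", "take_right", "move"], init, goal))
```

With the fully known initial state of Example 2, the conformant plan move ; take_right ; move succeeds, while executing take_right from outside the vault violates its precondition.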
      <p>
        In particular, Example 1 can now be formalised as the planning task ⟨Γ0, Γ_pink, φ_g⟩
where Γ0 is a singleton containing (
        <xref ref-type="bibr" rid="ref16">16</xref>
        ), describing the initial situation, and φ_g = ∼v ∧ d. Then,
we can see that (22) is a conditional solution for this planning task. We can also formalise
Example 2 as the planning task ⟨Γ0′, Γ_pink, φ_g⟩ where Γ0′ contains the single formula:
φ0′ = ∼v ∧ ∼l ∧ r ∧ ∼s ∧ ∼d
(25)
that describes the corresponding initial situation. It can be checked that (22) is also a conditional
solution for this task, though this example also has the simpler (conformant) solution:
move ; take_right ; move
(26)
Finally, Example 10 becomes the task ⟨Γ0, Γ′_pink, φ_g⟩ for which
move ; try_take_right ; try_take_left ; move
(27)
is a conformant solution.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>As discussed in [62], the traditional DEL[E] approach with event model updates is a semantic
approach, where states and actions are represented as semantic objects, epistemic and event
models respectively. On the other hand, DEL[ASP] is a syntactic approach, where states and
actions are represented as knowledge bases, that is, sets of formulas known to be true. Semantic
and syntactic approaches are mutually dual, with the semantic approach modelling ignorance
(the more ignorance, the bigger the state) and the syntactic approach modelling knowledge
(the more knowledge, the bigger the knowledge base). The generalisation of DEL for abstract
updating objects can easily accommodate both approaches: it suffices to allow both
event models and epistemic programs to occur in the dynamic operator, and to select the
corresponding updating evaluation.</p>
      <p>Another interesting observation is that both DEL[E] and ASP can be considered as
generalisations of the STRIPS planning language in orthogonal directions. On the one hand, DEL[E]
allows planning in domains where the world is not fully observable, the effects of actions are
not necessarily deterministic, and where sensing actions may allow the agent to gain knowledge about
the actual state of the world. On the other hand, ASP introduces high-level KR features like
the treatment of indirect effects, action qualifications, state constraints or recursive fluents (for
motivation about the need for such features we refer to [63]). The approach presented here,
DEL[ASP], combines the strengths of both generalisations so that it is possible to use high-level
KR features in non-fully observable or non-deterministic domains where observing the world
may be needed to achieve a valid plan.</p>
      <p>
        Similar to our approach, the action language mAL [64] also combines the treatment of
indirect effects and action qualifications with the possibility of defining sensing actions. The
main handicap of mAL with respect to DEL[ASP] is that the former only allows ramifications
on the ontic effects, but not on the epistemic ones, as we did for instance with rule (
        <xref ref-type="bibr" rid="ref15">15</xref>
        ). In mAL,
as in DEL[E], this indirect observation needs to be encoded as a direct effect of all actions that
may affect those fluents. On the other hand, an advantage of both DEL[E] and mAL is that
they can be applied to domains that involve several agents and in which those agents may even
hold false beliefs [22], while, so far, DEL[ASP] is only able to deal with domains involving a
single agent. Extending DEL[ASP] to cover these domains is a matter of future work. It will
also be interesting to study the relation between DEL[ASP] and Temporal ASP [57] and the
possibility of extending the latter with an epistemic modality to deal with non-fully observable
or non-deterministic domains.
      </p>
      <p>Regarding the computation of planning solutions in DEL[ASP], it is worth mentioning that
the algorithm based on planning trees described in [21] for DEL[E] is general enough and
does not really depend on the kind of updating object used. In this sense, we can apply the
same algorithm with the only variation of using the ASP updating evaluation when expanding
the tree. Solutions can then be retrieved from the planning tree in exactly the same way as
described there.</p>
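      <p>An informal sketch of this parametric planning-tree expansion follows. It is not the algorithm of [21] itself; the helpers <monospace>goal_holds</monospace>, <monospace>update</monospace> and <monospace>applicable</monospace> are hypothetical, with the updating evaluation passed in as a function so that the same expansion serves DEL[E] (product update) and DEL[ASP] (ASP updating evaluation).

```python
def expand(state, goal_holds, update, applicable, depth):
    """Depth-bounded planning-tree expansion, parametric in `update`.
    A node is solved if the goal holds there, or if some applicable
    action leads only to solved children (a conditional-plan branch)."""
    node = {"state": state, "solved": goal_holds(state), "children": {}}
    if node["solved"] or depth == 0:
        return node
    for action in applicable(state):
        # `update` yields the possible successor states of `action`
        children = [expand(s, goal_holds, update, applicable, depth - 1)
                    for s in update(state, action)]
        node["children"][action] = children
        if children and all(c["solved"] for c in children):
            node["solved"] = True
    return node
```
      </p>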
      <p>Acknowledgements This work has been partially funded by the Centre International de
Mathématiques et d’Informatique de Toulouse through contract ANR-11-LABEX-0040-CIMI
within the program ANR-11-IDEX-0002-02, by grant 2016-2019 ED431G/01 CITIC Center (Xunta
de Galicia, Spain), and by grant TIN 2017-84453-P (MINECO, Spain).</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[17] J. Hoffmann, R. I. Brafman, Conformant planning via heuristic forward search: A new
approach, Artificial Intelligence 170 (2006) 507–541.
[18] B. Löwe, E. Pacuit, A. Witzel, Planning based on dynamic epistemic logic, Technical Report,</p>
      <p>Citeseer, 2010.
[19] T. Bolander, M. B. Andersen, Epistemic planning for single and multi-agent systems,</p>
      <p>Journal of Applied Non-Classical Logics 21 (2011) 9–34.
[20] E. Pontelli, T. C. Son, C. Baral, G. Gelfond, Answer set programming and planning with
knowledge and world-altering actions in multiple agent domains, in: Correct Reasoning,
Springer, 2012, pp. 509–526.
[21] M. B. Andersen, T. Bolander, M. H. Jensen, Conditional epistemic planning, in: JELIA,
volume 7519 of Lecture Notes in Computer Science, Springer, 2012, pp. 94–106.
[22] T. Bolander, Seeing is believing: Formalising false-belief tasks in dynamic epistemic logic,
in: ECSI, volume 1283 of CEUR Workshop Proceedings, CEUR-WS.org, 2014, pp. 87–107.
[23] F. Kominis, H. Geffner, Beliefs in multiagent planning: From one agent to many, in: R. I.</p>
      <p>Brafman, C. Domshlak, P. Haslum, S. Zilberstein (Eds.), Proceedings of the Twenty-Fifth
International Conference on Automated Planning and Scheduling, ICAPS 2015, Jerusalem,
Israel, June 7-11, 2015., AAAI Press, 2015, pp. 147–155.
[24] F. Kominis, H. Geffner, Multiagent online planning with nested beliefs and dialogue, in:
L. Barbulescu, J. Frank, Mausam, S. F. Smith (Eds.), Proceedings of the Twenty-Seventh
International Conference on Automated Planning and Scheduling, ICAPS 2017, Pittsburgh,
Pennsylvania, USA, June 18-23, 2017., AAAI Press, 2017, pp. 186–194.
[25] M. C. Cooper, A. Herzig, F. Maffre, F. Maris, P. Régnier, Simple epistemic planning:
Generalised gossiping, in: G. A. Kaminka, M. Fox, P. Bouquet, E. Hüllermeier, V. Dignum,
F. Dignum, F. van Harmelen (Eds.), ECAI 2016 - 22nd European Conference on Artificial
Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, volume 285 of
Frontiers in Artificial Intelligence and Applications, IOS Press, 2016, pp. 1563–1564.
[26] M. C. Cooper, A. Herzig, F. Maffre, F. Maris, P. Régnier, A simple account of multi-agent
epistemic planning, in: G. A. Kaminka, M. Fox, P. Bouquet, E. Hüllermeier, V. Dignum,
F. Dignum, F. van Harmelen (Eds.), ECAI 2016 - 22nd European Conference on Artificial
Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, volume 285 of
Frontiers in Artificial Intelligence and Applications, IOS Press, 2016, pp. 193–201.
[27] T. Bolander, T. Engesser, R. Mattmüller, B. Nebel, Better eager than lazy? how agent types
impact the successfulness of implicit coordination, in: M. Thielscher, F. Toni, F. Wolter
(Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth
International Conference, KR 2018, Tempe, Arizona, 30 October - 2 November 2018., AAAI
Press, 2018, pp. 445–453.
[28] R. C. Moore, A formal theory of knowledge and action, Technical Report, SRI International,</p>
      <p>Menlo Park, CA, Arti cial Intelligence Center, 1984.
[29] M. Thielscher, Representing the knowledge of a robot, in: KR, 2000, pp. 109–120.
[30] T. C. Son, C. Baral, Formalizing sensing actions—a transition function based approach,</p>
      <p>Artificial Intelligence 125 (2001) 19–91.
[31] R. B. Scherl, H. J. Levesque, Knowledge, action, and the frame problem, Artificial
Intelligence 144 (2003) 1–39.
[32] H. Van Ditmarsch, W. van Der Hoek, B. Kooi, Dynamic epistemic logic, volume 337,</p>
      <p>Springer Science &amp; Business Media, 2007.
[33] H. van Ditmarsch, B. Kooi, Semantic results for ontic and epistemic change, Logic and the
foundations of game and decision theory (LOFT 7) 3 (2008) 87–117.
[34] V. Pratt, Semantical considerations on Floyd-Hoare logic, in: Proceedings of the Seventeenth
Annual Symposium on Foundations of Computer Science (SFCS’76), IEEE Computer Society
Press, 1976, pp. 109–121.
[35] H. P. van Ditmarsch, W. van der Hoek, B. P. Kooi, Dynamic epistemic logic with assignment,
in: Proceedings of the fourth international joint conference on Autonomous agents and
multiagent systems, ACM, 2005, pp. 141–148.
[36] A. Baltag, L. S. Moss, S. Solecki, The logic of public announcements and common knowledge
and private suspicions, in: TARK, Morgan Kaufmann, 1998, pp. 43–56.
[37] A. Baltag, L. S. Moss, Logics for epistemic programs, Synthese 139 (2004) 165–224.
[38] R. E. Fikes, N. J. Nilsson, STRIPS: A new approach to the application of theorem proving
to problem solving, Artificial Intelligence 2 (1971) 189–208.
[39] V. W. Marek, M. Truszczyński, Stable models and an alternative logic programming
paradigm, in: K. R. Apt, V. W. Marek, M. Truszczyński, D. Warren (Eds.), The Logic
Programming Paradigm, Artificial Intelligence, Springer Berlin Heidelberg, 1999, pp. 375–
398.
[40] I. Niemelä, Logic programs with stable model semantics as a constraint programming
paradigm, Annals of Mathematics and Artificial Intelligence 25 (1999) 241–273.
[41] C. Baral, Knowledge Representation, Reasoning and Declarative Problem Solving,
Cambridge University Press, 2010.
[42] M. Gelfond, V. Lifschitz, The stable model semantics for logic programming, in: Proc. of
the 5th Intl. Conference on Logic Programming (ICLP’88), 1988, pp. 1070–1080.
[43] M. Gelfond, V. Lifschitz, Classical negation in logic programs and disjunctive databases,</p>
      <p>New Generation Comput. 9 (1991) 365–386.
[44] V. Lifschitz, Answer set planning, in: D. de Schreye (Ed.), Proceedings of the International</p>
      <p>Conference on Logic Programming (ICLP’99), MIT Press, 1999, pp. 23–37.
[45] V. Lifschitz, Answer set programming and plan generation, Artificial Intelligence 138
(2002) 39 – 54. Knowledge Representation and Logic Programming.
[46] J. Lee, V. Lifschitz, F. Yang, Action language BC preliminary report, in: F. Rossi (Ed.), IJCAI
2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence,
Beijing, China, August 3-9, 2013, IJCAI/AAAI, 2013, pp. 983–989.
[47] P. Cabalar, R. Kaminski, T. Schaub, A. Schuhmann, Temporal answer set programming on
finite traces, Theory and Practice of Logic Programming 18 (2018) 406–420.
[48] E. Erdem, M. Gelfond, N. Leone, Applications of ASP, AI Magazine 37 (2016) 53–68.
[49] M. Gelfond, Strong introspection, in: T. L. Dean, K. McKeown (Eds.), Proceedings of the</p>
      <p>AAAI Conference, volume 1, AAAI Press/The MIT Press, 1991, pp. 386–391.
[50] A. P. Leclerc, P. T. Kahl, A survey of advances in epistemic logic program solvers, CoRR
abs/1809.07141 (2018). URL: http://arxiv.org/abs/1809.07141. arXiv:1809.07141.
[51] P. Cabalar, J. Fandinno, L. Fariñas del Cerro, Founded world views with autoepistemic
equilibrium logic, Under consideration for 15th International Conference on Logic
Programming and Nonmonotonic Reasoning (LPNMR 2019), CoRR abs/1902.07741 (2019). URL:
https://arxiv.org/abs/1902.07741.
[52] D. Pearce, A new logical characterisation of stable models and answer sets, in: J. Dix,
L. M. Pereira, T. C. Przymusinski (Eds.), Non-Monotonic Extensions of Logic Programming,
NMELP 1996, Bad Honnef, Germany, September 5-6, 1996, Selected Papers, volume 1216
of Lecture Notes in Computer Science, Springer, 1996, pp. 57–70.
[53] R. C. Moore, Semantical considerations on nonmonotonic logic, Artificial Intelligence 25
(1985) 75–94.
[54] G. Aucher, An internal version of epistemic logic, Studia Logica 94 (2010) 1–22.
[55] R. Fagin, Reasoning about knowledge, MIT Press, 1995.
[56] J. Y. Halpern, Reasoning about knowledge: an overview, in: Theoretical aspects of reasoning
about knowledge, Elsevier, 1986, pp. 1–17.
[57] P. Cabalar, R. Kaminski, T. Schaub, A. Schuhmann, Temporal answer set programming on
finite traces, Theory and Practice of Logic Programming 18 (2018) 406–420.
[58] H. Kautz, The logic of persistence, in: Proceedings of the 5th National Conference on</p>
      <p>Artificial Intelligence, 1986, pp. 401–405.
[59] D. Nelson, Constructible falsity, J. Symbolic Logic 14 (1949) 16–26.
[60] V. Lifschitz, L. R. Tang, H. Turner, Nested expressions in logic programs, Ann. Math. Artif.</p>
      <p>Intell. 25 (1999) 369–389.
[61] A. Heyting, Die formalen Regeln der intuitionistischen Logik, in: Sitzungsberichte der
Preussischen Akademie der Wissenschaften, Deutsche Akademie der Wissenschaften zu
Berlin, 1930, p. 42–56. Reprint in Logik-Texte: Kommentierte Auswahl zur Geschichte der
Modernen Logik, Akademie-Verlag, 1986.
[62] T. Bolander, A gentle introduction to epistemic planning: The DEL approach, in:</p>
      <p>M4M@ICLA, volume 243 of EPTCS, 2017, pp. 1–22.
[63] V. Lifschitz, Answer set programming and plan generation, Artificial Intelligence 138
(2002) 39–54.
[64] C. Baral, G. Gelfond, E. Pontelli, T. C. Son, Reasoning about the beliefs of agents in
multi-agent domains in the presence of state constraints: The action language mAL, in:
J. Leite, T. C. Son, P. Torroni, L. van der Torre, S. Woltran (Eds.), Computational Logic in
Multi-Agent Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 290–306.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghallab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Nau</surname>
          </string-name>
          , P. Traverso,
          <source>Automated planning - theory and practice</source>
          , Elsevier,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baral</surname>
          </string-name>
          ,
          <article-title>Reasoning and planning with sensing actions, incomplete information, and static causal laws using answer set programming</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>7</volume>
          (
          <year>2007</year>
          )
          <fpage>377</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Peot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Conditional nonlinear planning</article-title>
          ,
          <source>in: Artificial Intelligence Planning Systems</source>
          , Elsevier,
          <year>1992</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Golden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Representing sensing actions: The middle ground revisited</article-title>
          , in: KR, Morgan Kaufmann,
          <year>1996</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pryor</surname>
          </string-name>
          , G. Collins,
          <article-title>Planning for contingencies: A decision-based approach</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>4</volume>
          (
          <year>1996</year>
          )
          <fpage>287</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lobo</surname>
          </string-name>
          , G. Mendez,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Adding knowledge to the action description language A</article-title>
          , in:
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuipers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Webber</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI 97, IAAI 97, July 27-31</source>
          ,
          <year>1997</year>
          , Providence, Rhode Island, USA., AAAI Press / The MIT Press,
          <year>1997</year>
          , pp.
          <fpage>454</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Furst</surname>
          </string-name>
          ,
          <article-title>Fast planning through planning graph analysis</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>90</volume>
          (
          <year>1997</year>
          )
          <fpage>281</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Conformant graphplan</article-title>
          ,
          <source>in: AAAI/IAAI</source>
          ,
          <year>1998</year>
          , pp.
          <fpage>889</fpage>
          -
          <lpage>896</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Golden</surname>
          </string-name>
          ,
          <article-title>Leap before you look: Information gathering in the PUCCINI planner</article-title>
          , in: AIPS, AAAI,
          <year>1998</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Extending graphplan to handle uncertainty &amp; sensing actions</article-title>
          , in: AAAI/IAAI,
          <year>1998</year>
          , pp.
          <fpage>897</fpage>
          -
          <lpage>904</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rintanen</surname>
          </string-name>
          ,
          <article-title>Constructing conditional plans by a theorem-prover</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>10</volume>
          (
          <year>1999</year>
          )
          <fpage>323</fpage>
          -
          <lpage>352</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <article-title>Planning with incomplete information as heuristic search in belief space</article-title>
          ,
          <source>in: Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems</source>
          , AAAI Press,
          <year>2000</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Castellini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tacchella</surname>
          </string-name>
          ,
          <article-title>Sat-based planning in complex domains: Concurrency, constraints and nondeterminism</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>147</volume>
          (
          <year>2003</year>
          )
          <fpage>85</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Faber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pfeifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <article-title>A logic programming approach to knowledge-state planning, II: The DLVK system</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>144</volume>
          (
          <year>2003</year>
          )
          <fpage>157</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bertoli</surname>
          </string-name>
          ,
          <article-title>Conformant planning via symbolic model checking and heuristic search</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>159</volume>
          (
          <year>2004</year>
          )
          <fpage>127</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bryce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kambhampati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Planning graph heuristics for belief space search</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>26</volume>
          (
          <year>2006</year>
          )
          <fpage>35</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>