Belief State Estimation for Planning via Approximate Logical Filtering and Smoothing

Brent Mombourquette†, Christian Muise∗, and Sheila A. McIlraith†
† Department of Computer Science, University of Toronto
∗ CSAIL, Massachusetts Institute of Technology
† {bgmomb,sheila}@cs.toronto.edu, ∗ {cjmuise}@mit.edu

Abstract

State estimation is the task of estimating the state of a partially observable dynamical system given a sequence of executed actions and observations. In logical settings, state estimation can be realized via logical filtering. Unfortunately such filtering, though exact, can be intractable. To this end, we propose logical smoothing, a form of backwards reasoning that works in concert with logical filtering to refine past beliefs in light of new observations. We characterize the notion of logical smoothing together with an algorithm for backwards-forwards state estimation. We prove properties of our algorithms, and experimentally demonstrate their behaviour. Smoothing together with backwards-forwards reasoning are important techniques for reasoning about partially observable dynamical systems, introducing the logical analogue of effective techniques from control theory and dynamic programming.

1 Introduction

Many applications of artificial intelligence, from automated planning and diagnosis to activity recognition, require reasoning about dynamical systems that are only partially observable. A necessary component of such systems is state estimation – the task of estimating the state of the system given a sequence of executed actions and observations. State estimation is well studied in control systems, where transition systems are stochastic and the belief state is typically represented as a probability distribution. State estimation is commonly realized via filtering, of which Kalman filtering [Kalman, 1960] is a well-known example.

In logical settings, an analogous form of logical filtering was proposed by Amir and Russell [2003] in which an agent's belief state – the set of possible world states – can be compactly represented as a formula, and filtering is a form of belief update. While logical filtering is intractable in the general case [Eiter and Gottlob, 1992], there are tractable subclasses, often involving restricted transition systems or compact encodings of the belief state (e.g., [Shahaf and Amir, 2007; Shirazi and Amir, 2011]). Unfortunately, typical belief state representations often require further inference to ascertain beliefs about individual fluents – a frequent and time-critical component of many decision-making systems.

Our concern is with logical state estimation in service of tasks such as planning, execution monitoring, diagnosis, and activity recognition. We are particularly concerned with systems that include a rich characterization of how the actions of an agent indirectly affect their environment. These are typically captured by causal or ramification constraints (e.g., a causal constraint might say that if the battery and radio are ok and the radio is on, then sound is emitted). We assume that such constraints are compiled into the transition system as additional effects of actions following, e.g., [Pinto, 1999; Strass and Thielscher, 2013; Baier et al., 2014]. In planning, such constraints tend to create problems with large conformant width [Palacios and Geffner, 2009].

We exploit the observation that for planning, only a subset of the state is necessary to track. Planning systems need to know (1) when actions are applicable, and (2) when the goal is reached [Bonet and Geffner, 2014]. Execution monitoring systems need only track the conditions under which a plan remains valid (e.g., [Fikes et al., 1972]). Diagnosis systems track the confirmation and refutation of candidate diagnoses.

These observations motivate the development of state estimation techniques tailored to the task of tracking the truth of (conjunctions of) fluent literals. In Section 2 we formalize state estimation as semantic logical filtering and propose a sound under-approximation that is computationally appealing. Motivated by the technique of smoothing for stochastic systems (e.g., [Einicke, 2012]), in Section 3 we introduce the notion of logical smoothing, which allows for the updating of beliefs about the past in light of observations about the present. In Section 4, we propose an algorithm for backwards-forwards reasoning that combines smoothing and filtering in order to perform state estimation. The application of (approximate) logical smoothing mitigates the incompleteness of approximate logical filtering, while preserving many of its computational advantages. We evaluate the behaviour of our approach. This is followed by a discussion of related work and concluding remarks.

2 The Problem: State Estimation

State estimation is a core task in reasoning about dynamical systems with partial observability. Consider the simplified action-observation sequence in support of diagnosing a car. You turn the key in the ignition, turn_ignition, resulting in ignition_turned. If ignition_turned, battery_ok and gas_ok hold, then so will car_started. You observe that the car did not start (car_started = False), and so, under the assumption that the characterization of the vehicle functioning is complete, you can infer ¬battery_ok ∨ ¬gas_ok. You turn on the radio, turn_on_radio, causing radio_on as well as sound if battery_ok ∧ radio_ok. You observe sound (sound = True). Under completeness and frame assumptions, you are now able to infer radio_ok, battery_ok and ¬gas_ok. So following the action-observation sequence (turn_ignition, ¬car_started, turn_on_radio, sound), your estimated belief state comprises just one state, here represented by the set of fluents {ignition_turned, radio_on, ¬car_started, battery_ok, radio_ok, ¬gas_ok, sound}.

Informally, the state estimation task we address is: Given a dynamical system, a belief state, and a sequence of executed actions and observations, infer the resulting belief state of the system. For logical theories, state estimation is captured by logical filtering [Amir and Russell, 2003].

To relate our work to planning, execution monitoring and diagnosis, we appeal to standard finite-domain planning language syntax. A dynamical system is a tuple Σ = ⟨F, S, A, R, I⟩, where F is a finite set of propositional fluent symbols such that if p ∈ F, then p and ¬p are fluent literals, S = Pow(F) is the set of possible world states, A is a set of actions including sensing actions, R ⊆ S × A × S is the transition relation (equivalently we use the notation (s, a, s′) ∈ R or s′ = R(s, a)), and I is a set of clauses over F that defines a set of possible initial states – collectively, the initial belief state. As noted in the introduction, we assume that causal constraints are compiled into our transition system as extra effects of actions (e.g., following [Pinto, 1999]). For the purposes of this paper, non-sensing actions a ∈ A are assumed to be deterministic and are defined by a precondition prec(a), which is a conjunction of fluent literals, and eff(a), a set of conditional effects of the form C → L, where C is a conjunction of fluent literals and L is a fluent literal. We write the unconditional effect true → L as simply L, and use true to denote an empty precondition. Each sensing action, on the other hand, is defined by its precondition prec(a), which is a conjunction of fluent literals, and obs(a), which is the fluent literal that is observed by the sensing action. We assume no exogenous actions and that the transition relation is complete, characterizing all and only the conditions under which the truth value of a fluent changes.

Throughout this paper we take the viewpoint that the state of Σ represents the belief state of the agent. The semantics of logical filtering is defined by considering the belief state to be a set of possible world states ρ ⊆ S. For our purposes, we will often represent the belief state (henceforth also the state of the system) as a state formula φ. Later we will be restricting φ to a conjunction of fluent literals, sometimes denoted as a set of fluent literals and referred to as a conjunctive state formula. Following Amir and Russell (2003):

Definition 1 (Logical Filtering Semantics)
Given belief state ρ ⊆ S of dynamical system Σ, the filtering of a sequence of actions and observations ⟨a1, o1, ..., at, ot⟩ with respect to ρ is defined as:
1. Filter[⟨⟩](ρ) = ρ
2. Filter[a](ρ) = {s′ | s′ = R(s, a), s ∈ ρ}
3. Filter[o](ρ) = {s | s ∈ ρ and o is true in s}
4. Filter[⟨ai, oi, ..., at, ot⟩](ρ) = Filter[⟨ai+1, oi+1, ..., at, ot⟩](Filter[oi](Filter[ai](ρ)))

We call step 2 progression with a and step 3 filtering with o. When action a is filtered, every state s ∈ ρ is updated with respect to the transition system R(s, a). When an observation o is filtered, every state inconsistent with the observation is eliminated. Logical filtering results in an interleaving of action progression and observation filtering (e.g., [Vassos and Levesque, 2013]). Logical filtering naively is hard: there are 2^(2^|F|) belief states. As such, algorithms for logical filtering typically represent the belief state ρ compactly as a logical formula φ, called a belief-state formula or state formula.

While logical filtering realizes state estimation, the resulting belief state representation can cause fundamental inference operations to be intractable, such as inferring beliefs about individual fluents [Shahaf and Amir, 2007]. A core component of automated planning is determining the applicability of actions during the search for a plan, as well as determining whether the goal condition has been achieved. Similarly, dynamical diagnosis requires determination of the refutation or confirmation of candidate diagnoses as the result of (treatment) actions and sensing (e.g., [McIlraith and Reiter, 1992]). Diagnoses, action preconditions, and planning goals are typically represented as conjunctions of fluents, motivating the following definition of approximate logical filtering:

Definition 2 (Approximate Logical Filtering)
Given belief-state formula φ of dynamical system Σ, the approximate filtering of a sequence of actions and observations ⟨a1, o1, ..., at, ot⟩ with respect to φ is defined as:
1. Filter_a[⟨⟩](φ) = φ;
2. Filter_a[a](φ) = ⋀{L | (C → L) ∈ eff(a) ∧ φ |= C} ∧ ⋀{L | φ |= L ∧ ∀(C → ¬L) ∈ eff(a), φ |= ¬C};
3. Filter_a[o](φ) = φ ∧ o;
4. Filter_a[⟨ai, oi, ..., at, ot⟩](φ) = Filter_a[⟨ai+1, oi+1, ..., at, ot⟩](Filter_a[oi](Filter_a[ai](φ)))

Approximate logical filtering describes the belief state only in terms of a subset of the fluent literals entailed by the belief state. One can see from line 2 that progressing a state φ through an action a produces a state consisting of what was known to be caused (directly or indirectly) by a and what was known to persist through a.

Returning to our car example, we can see how this is a weak approximation. After the observations car_started = False and sound = True, no further inferences could be made, yielding the belief state {ignition_turned, radio_on, ¬car_started, sound}.

Theorem 1 (Sound Under-Approximation) Given dynamical system Σ, belief-state formula φ representing possible belief state ρ ⊆ S, and action-observation sequence ⟨a1, o1, ..., at, ot⟩, Filter[⟨a1, o1, ..., at, ot⟩](ρ) |= Filter_a[⟨a1, o1, ..., at, ot⟩](φ).

This follows from Definitions 1 and 2.
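To make the set-based reading of Definition 2 concrete, here is a minimal Python sketch of approximate filtering over conjunctive state formulae represented as sets of fluent literals. The encoding of a literal as a string with a "-" prefix for negation, and the helper names (neg, progress, filter_obs), are our own illustrative choices, not notation from the paper.

```python
def neg(lit):
    """Complement of a fluent literal: 'f' <-> '-f'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def progress(phi, effects):
    """Filter_a[a](phi), line 2 of Definition 2.

    phi: conjunctive state formula, as a set of fluent literals.
    effects: conditional effects C -> L of the action, as (C, L) pairs,
             with C a set of fluent literals (empty set = unconditional).
    """
    # Literals known to be caused by the action: phi |= C.
    caused = {L for C, L in effects if C <= phi}
    # Literals known to persist: every effect that could flip L has a
    # condition already known to be false (phi |= ¬C, i.e. some literal
    # of C is contradicted by phi).
    persists = {L for L in phi
                if all(any(neg(c) in phi for c in C)
                       for C, L2 in effects if L2 == neg(L))}
    return caused | persists

def filter_obs(phi, obs):
    """Filter_a[o](phi) = phi ∧ o, for a conjunctive observation."""
    return phi | obs

# The car example, reduced to set operations (ramifications compiled
# into the actions as conditional effects):
turn_ignition = [(set(), "ignition_turned"),
                 ({"battery_ok", "gas_ok"}, "car_started")]
turn_on_radio = [(set(), "radio_on"),
                 ({"battery_ok", "radio_ok"}, "sound")]

phi = set()                                 # nothing known initially
phi = progress(phi, turn_ignition)
phi = filter_obs(phi, {"-car_started"})     # car did not start
phi = progress(phi, turn_on_radio)
phi = filter_obs(phi, {"sound"})
# phi is now exactly the weak belief state from the text:
# {ignition_turned, radio_on, -car_started, sound}
```

As the final value of phi shows, the approximation never recovers ¬battery_ok ∨ ¬gas_ok or radio_ok: it tracks only fluent literals, which is precisely the weakness that logical smoothing addresses in the next section.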
Approximate Logical Filtering is not complete with respect to the logical filtering semantics.

Proposition 1 (Conjunctive State Formula Preservation) Given dynamical system Σ, conjunctive state formula φ, and action-observation sequence ⟨a1, o1, ..., at, ot⟩, where each oi is a conjunctive formula, Filter_a[⟨a1, o1, ..., at, ot⟩](φ) is a conjunctive formula.

The above proposition follows naturally from lines 2 and 3 of Definition 2 and is key to our complexity results and tractable approximate representations.

Theorem 2 (Complexity) Given dynamical system Σ, conjunctive state formula φ, and action-observation sequence ⟨a1, o1, ..., at, ot⟩, where each oi is a conjunctive formula, Filter_a[⟨a1, o1, ..., at, ot⟩](φ) is computable in time O(t · c · |F|), where c is the maximum number of conditional effects over actions in the sequence.

This follows from fluent entailment of a conjunctive formula and the limit of |F| fluents per consistent conditional effect.

For automated planning and similar problems where the general assumption is that observations are conjunctive formulae, approximate logical filtering enables state estimation to reduce to set operations. While computationally appealing, the approximation is weak and thus of limited use on its own for AI planning, dynamical diagnosis, and similar tasks. In the next section, we show how to combine approximate logical filtering with a new approach for reasoning over the past – logical smoothing.

3 Logical Smoothing

Filtering with a stochastic transition system involves estimating the marginal posterior probability distribution of the system conditioned on some data that was received prior to the current system state. There is an analogous concept for estimating not the current state but instead a previous state. This is called smoothing – refining a previous state estimation (e.g., [Einicke, 2012]). For stochastic models, this amounts to a re-computation of the marginal posterior probability distribution. We can carry this idea into the logical setting and show how observations can be used to refine estimates of past states. The notion of logical smoothing is only of interest if the belief state is approximated in some way. For example, in later sections, we show how logical smoothing can be used to improve weak approximations of logical filtering and thus produce a refined estimate of the current state.

We begin our treatment of logical smoothing by defining the semantics of logical smoothing with respect to a belief state. Given a representation of the sequence of actions, observations, and intermediate state estimates, logical smoothing recursively refines previous state estimates through a backwards update procedure. To this end, we store previous state estimates, coupled with the actions executed to construct successor states, in a so-called history. Note that while we use a set of possible worlds to represent a belief state, histories can be defined with any sort of state representation (logical formulae, sets of fluents understood as a logical conjunction, etc.), which we exploit later in this section.

Definition 3 (Belief State History) Given dynamical system Σ, a belief state history over Σ is a sequence of tuples (ρ0, a0), (ρ1, a1), ..., (ρn, an) such that each ρi is a belief state, a set of possible world states ρi ⊆ S, and each ai is an action of Σ.

The intent of a history is to capture the evolution of the system starting from some designated initial belief state ρ0. The observations are not modeled as separate entities. Rather, they reside in the intermediate belief states, presumably as part of an original filtering process.

Definition 4 (Logical Smoothing Semantics)
Given belief state ρ ⊆ S of dynamical system Σ, the smoothing of a belief state history (ρ0, a0), (ρ1, a1), ..., (ρn, an) with respect to ρ is defined as:
1. Smooth[⟨⟩](ρ) = ρ
2. Smooth[o](ρ) = Filter[o](ρ) = {s | s ∈ ρ and o is true in s}
3. Smooth[(ρ′, a′)](ρ) = ρ′ ∩ {s | s ∈ PreImage(s′, a′), s′ ∈ ρ}
4. Smooth[⟨(ρ0, a0), (ρ1, a1), ..., (ρn, an)⟩](ρ) = Smooth[⟨(ρ0, a0), ..., (ρn−1, an−1)⟩](Smooth[(ρn, an)](ρ))
where PreImage(s′, a) = {s | s′ = R(s, a)}.

Logical smoothing works by propagating acquired information (typically an observation or the resultant filtered state-action pair) back through a given history and updating its constituent state estimates. Note that this is more akin to belief updating than to belief revision. Each smoothing step refines previous state estimates in that the smoothed state's set of possible world states is always a (non-strict) subset of the state's original set of possible world states.

With the semantic account of logical smoothing in hand, we now define the notion of logical smoothing with respect to a more compact representation of the belief state in terms of a belief-state formula.

Definition 5 (History) Given dynamical system Σ, a history H over Σ is a sequence of tuples (σ0, a0), (σ1, a1), ..., (σn, an) such that each σi is a belief-state formula over F and each ai is an action of Σ.

Definition 6 provides a formal characterization of logical smoothing with respect to a belief-state formula representation of the history. It utilizes regression rewriting [Reiter, 2001] to propagate updates back through the history. Regression, denoted as R[φ, a], takes a logical formula φ and action a and rewrites the formula in terms of the weakest conditions necessary for φ to hold following the execution of a. We appeal to Reiter's definition of regression in the situation calculus [Reiter, 2001] with syntactic notation suitably modified. When the history comprises a single state-action pair, formula φ is simply conjoined to the state. Otherwise, the formula is regressed step by step through the history, and any new information garnered from the regression, φNEW, is conjoined to the associated state. PI(φ) refers to the prime implicates of formula φ.

Definition 6 (Logical Smoothing) Given dynamical system Σ, history H = (σ0, a0), ..., (σn, an) and formula φ, the logical smoothing of H with respect to φ is defined as:
1. Smooth[(σ, a)](φ) = (σ ∧ φ, a)
2. Smooth[(σ0, a0), ..., (σn, an)](φ) = Smooth[(σ0, a0), ..., (σn−1, an−1)](φNEW), Smooth[(σn, an)](φ)
where φNEW = ⋀{ϕ ∈ PI(R[φ, an−1]) | σn−1 ⊭ ϕ}

The soundness and completeness of Logical Smoothing (Definition 6) relative to the semantic account in Definition 4 follow straightforwardly from the correspondence between PreImage and regression.

Returning to our car example, consider logical smoothing with respect to the observation car_started = False. For ease of presentation, assume approximate logical filtering is used for action progression. Progressing the initial state, σ0, through the action turn_ignition results in state σ1, which is equivalent to σ0 since the effect of turn_ignition is predicated on fluents whose truth value is unknown. When the observation action obs(car_started) is subsequently performed and car_started = False observed, the smoothed history is given by Smooth[h0, h1](¬car_started), where h0 = (σ0, turn_ignition) and h1 = (σ1, obs(car_started)). Following point 2 of Definition 6, this is equivalent to Smooth[(h0)](φNEW), Smooth[(h1)](¬car_started). Rule 1 of the logical smoothing semantics smooths σ1 of h1 as σ1 ∧ ¬car_started. The action obs(car_started) has no effects and therefore the observation car_started = False must hold in the prior state. Returning to φNEW, this formula hinges on R[¬car_started, turn_ignition]. By the regression rewriting and frame axioms, this gives φ = ¬car_started ∧ (¬battery_ok ∨ ¬gas_ok). As σ0 |= ¬car_started, we are left with φNEW = ¬battery_ok ∨ ¬gas_ok after restricting to prime implicates, giving the intuitive refinement of the initial state given the observation.

Algorithm 1: LSmooth(H, φ, i) – Perform logical smoothing on history H given that φ holds at σi in H. Returns updated H and the index of termination.
1  σi = σi ∧ φ
2  if i > 0 then
3    ψ = R[φ, ai−1]
4    ψNEW = ⋀{ϕ ∈ PI(ψ) | σi−1 ⊭ ϕ}
5    if ψNEW is empty then
6      return (H, i)
7    return LSmooth(H, ψNEW, i − 1)
8  return (H, i)

Algorithm 1 realizes Logical Smoothing together with an optimization to support early termination of the regression. Included in its input is a state index parameter, identifying the location within history H where φ is to be integrated. Logical smoothing can thus acquire information about a past state and smooth the preceding state estimates in light of it, as well as smoothing from the most recent state. The heart of the smoothing procedure lies within R[φ, ai−1] (line 3), as explained above. Line 4 identifies those aspects of ψ that are new to state σi−1 by identifying prime implicates of ψ not entailed by σi−1. This is an optimization, as it allows for early termination (line 5) when further smoothing would be unnecessary, and it minimizes the subsequent formula to be regressed in the next iteration. The process then repeats on the newly computed ψNEW formula at index i − 1. Finally, LSmooth returns a tuple of the refined history and the index of termination, term-idx. The purpose of returning term-idx will become apparent in Section 4 when we leverage logical smoothing in a state estimation algorithm.

Proposition 2 (Early Termination Completeness) Given a dynamical system Σ, a history H over Σ with at least n state-action tuples, and a formula φ over F, the updated history returned by LSmooth(H, φ, n) is the updated history returned by LSmooth(H, φ, n) with the early termination condition of line 5 removed.

It follows straightforwardly from the close correspondence between Definition 6 and Algorithm 1, and Proposition 2, that the history computed in Algorithm 1 is equivalent to the specification in Definition 6.

We say that H′ = (σ′0, a′0), ..., (σ′n, a′n) is a sound and complete refinement of H = (σ0, a0), ..., (σn, an) with respect to formula φ that holds at index i of H if for all 0 ≤ j ≤ n, (1) a′j = aj; (2) σ′j |= σj; and (3) if j < i, Filter[⟨a′j, σ′j+1, ..., a′i, σ′i⟩](σ′j) |= φ. In other words, a sound and complete refinement captures all of what must be known based on the existing history and necessary conditions for φ. With this definition in hand we have:

Theorem 3 (Soundness and Completeness) Given a dynamical system Σ, history H over Σ, and formula φ over F that must be true at σi of H, LSmooth(H, φ, i) produces a sound and complete refinement of H.

Logical smoothing provides a sound, complete, and principled approach to smoothing previous state estimations in a logical system in light of additional observations or information which must hold in the associated state. As this is merely the general algorithmic structure of logical smoothing, further optimizations are possible but omitted for clarity of exposition of the core concepts.

3.1 Approximate Logical Smoothing

Unfortunately, like logical filtering, the querying of belief states resulting from logical smoothing may not be tractable. To make matters worse, while a single regression step results in a linear increase in formula size with respect to the input, recursive applications of regression result in an exponential blow-up [Fritz, 2009]. To remedy these issues, we define, as we did with logical filtering, an analogous procedure of approximate logical smoothing. We do so by one minor adjustment to the logical smoothing outlined in Algorithm 1:

Line 4: ψNEW = ⋀{f | ψ ∧ σi−1 |= f and σi−1 ⊭ f}   (1)

where f is restricted to fluent literals. This approximation limits the updating of the history to fluent literals entailed by these regressed additions. Again we consider only the entailments not already captured by a state in the history. We denote the resulting algorithm obtained by modifying LSmooth as per (1) as LSmootha. It should be stressed that this is but one way to deal with the formula size increase due to repeated applications of regression. Alternatives which do not sacrifice completeness include adding new fluents in place of certain sub-formulae of the regressed formula [van Ditmarsch et al., 2007] or representing the formula as a circuit [Rintanen, 2008].

Following Definition 6, logical smoothing produced the correct refinement of the initial state with ¬battery_ok ∨ ¬gas_ok. However, with approximate logical smoothing, ψNEW is limited to fluent literals, resulting in an empty formula and therefore no refinement. Some information is lost, but other inferences can still be made. Continuing with the sequence of actions in our example, after obs(car_started) the actions turn_on_radio and obs(sound) are executed, obtaining sound = True. By smoothing with respect to this observation, the regression R[sound, turn_on_radio] gives ψ = sound ∨ (battery_ok ∧ radio_ok). Following the approximation rule (1), we get ψNEW = battery_ok ∧ radio_ok due to the previous state in the history entailing ¬sound. Even with the approximation, we refine the estimate of the prior state with the knowledge battery_ok ∧ radio_ok. Continuing this example, approximate logical smoothing would smooth each state in the history, including the initial state, to include battery_ok ∧ radio_ok.

Proposition 3 (Soundness) Let Σ be a dynamical system, H be a history over Σ of n state-action tuples, and φ be a formula over the language of F that must be true at the state at index i of H. If LSmooth(H, φ, i) = (H′, j) and LSmootha(H, φ, i) = (H′′, k), then for all 0 ≤ m ≤ n, if σ′m is the m-th state of H′ and σ′′m is the m-th state of H′′, then σ′m |= σ′′m.

By Proposition 3, LSmootha is an under-approximation of LSmooth. It amounts to a version that smooths only with respect to conjunctive formulae. Moreover, analogously to approximate logical filtering (Proposition 1), approximate logical smoothing is conjunctive state formula preserving. We also sidestep the exponential formula size blow-up from repeated regression operations by always regressing conjunctive formulae except for, potentially, in the first step.

Theorem 4 (Complexity) Given a history H over a dynamical system Σ of n state-action tuples such that all states are conjunctive formulae, a conjunctive formula φ over the language of F, and an index i of H, LSmootha(H, φ, i) can be computed in time O(n · 2^|F|) with propositional entailment or O(n · |F|^2) with unit propagation entailment.

4 Backwards-Forwards Algorithm

Logical smoothing refines previous state estimates by reasoning backwards (regressing) over the history of actions and states with respect to some acquired information. In many cases, this process removes some uncertainty about the past, particularly in dynamical systems where actions have causal ramifications. This information about the past can then be propagated forwards (progressed) through the state-action history to potentially produce further refinements.

As we last left the car example in Section 3.1, each state in the history, including the initial state, was smoothed with battery_ok ∧ radio_ok after observing sound = True. While it is obvious that this should be propagated forward to the current state, as it stands it is not so obvious how and why, in general, this should be done. Consider the case where the action turn_ignition actually had an additional effect battery_charging if battery_ok. Propagating battery_ok forward from the initial state given the actions and intermediate states in the history further refines the post-turn_ignition state estimates (including the current state) with battery_charging.

We can realize the forward phase outlined above via filtering. Note that when operating on a history, simulating the forwards reasoning phase by filtering works on the corresponding sequence of actions with the observation formulae being the state formulae. For notational convenience, we define subseq(H, i) of a history H = (σ0, a0), (σ1, a1), ..., (σn, an) and index 0 ≤ i ≤ n as the sub-sequence ⟨ai, σi+1, ai+1, ..., σn, an⟩. Algorithm 2 outlines our backwards-forwards state estimation algorithm.

Algorithm 2: BF(H, φ, i) – Perform backwards-forwards state estimation on history H given that formula φ holds at σi in H.
1  (H′, term-idx) = LSmootha(H, φ, i)
2  return Filter_a[subseq(H′, term-idx)](σterm-idx)

The BF algorithm uses approximate logical smoothing and approximate logical filtering. BF maintains conjunctive state formulae while computing sound state estimates. Logical smoothing allows us to encode complex information about any state of the system into the history, keeping each individual state estimate in a compact and computationally manageable form that is maintained through a simple logical filtering procedure. Towards this goal of tractability, BF leverages approximate logical smoothing with unit propagation entailment. When combined, the result is an intuitive reasoning mechanism for dynamical states with partial observability.
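The backwards phase (Definition 4) and the forwards re-filtering that BF performs can be sketched at the level of explicit sets of possible worlds. This is the semantic account only, not the formula-based LSmooth/BF the paper develops; the state encoding (frozensets of true fluents), the effect representation, and the helper names are our own illustrative choices.

```python
def holds(state, cond):
    """cond = (pos, neg): fluents required true / required false in state."""
    pos, neg = cond
    return pos <= state and not (neg & state)

def apply_action(state, effects):
    """Deterministic transition R(s, a): fire every conditional effect
    (cond, fluent, value) whose condition holds in s."""
    add = {f for cond, f, v in effects if v and holds(state, cond)}
    delete = {f for cond, f, v in effects if not v and holds(state, cond)}
    return frozenset((state - delete) | add)

def smooth(history, rho):
    """Definition 4: refine the belief states of
    history = [(rho_0, a_0), ..., (rho_n, a_n)] given that the system ended
    in some state of rho. PreImage is computed by forward simulation from
    the candidate states of each rho_i. Returns refined states, oldest first."""
    refined = []
    for rho_i, effects_i in reversed(history):
        rho = {s for s in rho_i if apply_action(s, effects_i) in rho}
        refined.append(rho)
    return refined[::-1]

def forward(rho, actions):
    """The forwards phase: progress a refined belief state through actions."""
    for effects in actions:
        rho = {apply_action(s, effects) for s in rho}
    return rho

# Car example: battery_ok and gas_ok are initially unknown (4 worlds).
turn_ignition = [((set(), set()), "ignition_turned", True),
                 (({"battery_ok", "gas_ok"}, set()), "car_started", True)]
rho0 = {frozenset(s) for s in [set(), {"battery_ok"}, {"gas_ok"},
                               {"battery_ok", "gas_ok"}]}
rho1 = {apply_action(s, turn_ignition) for s in rho0}
rho1 = {s for s in rho1 if "car_started" not in s}   # observe ¬car_started
refined0 = smooth([(rho0, turn_ignition)], rho1)[0]
# The world where both battery_ok and gas_ok hold is ruled out of rho0:
# the semantic counterpart of the ¬battery_ok ∨ ¬gas_ok refinement above.
```

Because belief states here are explicit state sets, each smoothing step is a (non-strict) subset of the original estimate, and progressing the refined initial state forward reproduces the filtered current belief, mirroring the backwards-forwards loop that BF realizes over conjunctive formulae.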
While the worst-case complexity is exponential, in practice this is not the case. Actions typically trigger few indirect system effects (ramifications) in comparison to the size of the propositional domain. This results in a compact regressed formula where all unit entailments are computable through unit propagation in most cases. Note that since we place no restrictions on the syntactic form of regressed formulae, such as restricting to Horn clauses, unit propagation may not produce all entailments, resulting in a sound under-approximation.

Although it may appear excessive to filter with respect to state formulae, the states are always conjunctive formulae, and approximate filtering operates by conjoining the observation formula with the state formula resulting from progressing the previous action. Since all refinements are sound, this reduces to simple set operations over the fluents.

Proposition 4 (Complexity) Given a history H over a dynamical system Σ of n state-action tuples such that all states are conjunctive formulae, a conjunctive formula φ over the language of F, and an index i of H, BF(H, φ, i) can be computed in time O(n · c · 2^|F|) with propositional entailment, or O(n · c · |F|^2) with unit-propagation entailment, where c is the maximum number of conditional effects over the actions of Σ.

This follows from the previous complexity results.

4.1 Space Optimization
Here we outline two optimizations to alleviate the space requirements of logical smoothing.

State Relevance Minimization. The smoothing process relies on only a subset of the state. Given a history H with tuple (σ, a), it is sufficient for σ to include only the fluents f that are involved in the conditional effects of a. This is due to the regression formula being purely over these fluents, plus the fluents of φ that have no positive or negative effects with respect to a. Such an optimization of the state fluents would greatly reduce the memory overhead, as actions typically involve and affect a small fraction of the domain. The downside is that ψNEW may contain old inferences, and thus the early termination becomes less robust.

Sliding Window History. A second optimization is to smooth over a fixed window size, called fixed-lag smoothing in stochastic systems [Einicke, 2012]. This greatly improves the memory footprint, at the potential expense of the quality of the estimation.

5 Experimental Evaluation
We investigate three key questions: (1) how effective is our approach in capturing the necessary fluent literals to determine action preconditions and the goal; (2) how does our state relevance minimization impact the system; and (3) how does our approach perform in light of different manifestations of a domain's dynamics. Ideally, we would compare our approach empirically with logical filtering. However, the original authors confirmed in private correspondence that the code is unavailable. Instead, we look to the field of automated planning that deals with partial observability and sensing. We use a set of standard benchmark problems and two newly created domains.

We ran the BF algorithm on valid plans, and here we present statistics on the proportion of action traces (plan branches) for which all action preconditions and the goal condition are known to hold. For comparison, these are also reported for Approximate Logical Filtering (ALF). Average history state sizes over the action traces are reported for the BF algorithm and with the state relevance minimization (BF+SRM) as outlined in Section 4.1. Problem statistics and running times are also reported.

The plans were produced by the planner POPRP [Muise et al., 2014], which leverages a compilation of partially observable planning problems that exhibit particular properties making them easier to solve (i.e., "simple" contingent problems [Bonet and Geffner, 2011]). For a description of the benchmark domains Wumpus, Doors, Colored Balls, and CTP (Canadian Traveler's Problem), see [Muise et al., 2014]. Full details and source code used for evaluation will be made available online.

Table 1 shows the results of evaluating the BF algorithm on plans for each of the benchmarks.

Problem     |F|   #-BR  max-BR  avg-BR   Cov-BF  Cov-ALF   Size-BF  Size-BF+SRM   Time-BF  Time-BF+SRM
Wumpus05    802     35      43    32.9      100      0        17.4     0.02          0.91       1.69
Doors05     100     25      26    16.0      100     64.0      43.5     0.3           0.01       0.17
Doors07     196    343      60    33.2      100     62.9      43.2     0.1           9.02      16.52
CTP05        76     32      10    10.0      100      3.1      43.4     0.6           0.06       0.10
CTP07       134    128      13    14.0      100      0.7      34.3     0.3           0.57       1.01
CTP09       205    512      18    18.0      100      0.2      28.3     0.2           4.34       7.73
Balls4-1    374     48      46    26.1      100     81.2      21.3     0.1           0.83       1.54
Balls4-2    396   2304      90    51.4      100     66.0      25.1     0.09        166.85     324.69

Table 1: BF Algorithm - Planning Benchmarks Performance. (#-BR) number of branches; (max-BR) max branch depth; (avg-BR) average branch depth; (Cov) Precondition / Goal Coverage, the percentage of runs where all preconditions and goal fluents are captured; (Size) average History State Size, the percentage of all fluents that must be tracked; (Time) Time to Solve, in seconds, for each technique to process and verify each branch of the plan.

First, consider the Precondition / Goal Coverage section. Even with a judicious under-approximation, the BF algorithm is capable of tracking every relevant fluent for every action trace of all plans for the above problems. This is due in part to the fact that these problems belong to the class of width-1 simple contingent problems, which has the property that once a fluent becomes known it stays known [Bonet and Geffner, 2011]. For problems like Colored Balls, ALF is capable of solving a significant portion of the action traces. This is to be expected; the domain has very little dynamics. In the case of four colors with one ball, the only unsolvable traces result from situations where the ball must be in the last location not observed and the plan is able to infer this without directly observing it. Other domains, like Wumpus and CTP, require heavy reasoning, causing ALF to fail in most, if not all, cases.

Table 1 also reports the average history state size as a percentage of the total number of fluents in the problem. The large reduction in state size from SRM is due to the fact that only sensing actions have relevant information: no action affects observable fluents. This is a byproduct of the problem structures that the planning community has chosen to focus on, and it highlights the orthogonality of this work to what is currently being researched.

Lastly, Table 1 reports how much time it takes to solve every action trace of the plan for the associated problem. There are two main take-aways here. First, the SRM optimization creates a space-vs-time trade-off. It significantly reduces the memory footprint of the history but has an impact on computation time. Second, as one would expect, the time to solve a given instance correlates with the number of branches (action traces) in the plan as well as their average length in terms of the number of actions. This is why large problems like Wumpus05 can be solved much more quickly than a smaller problem like Balls4-2.

To further expand the evaluation, we introduce two new domains in the class of width-n non-simple contingent problems that specifically involve system dynamics and hidden state information that must be inferred.

Password. n switches may be flipped by the agent. Each has an unobservable correct position. There are n + 1 lights such that the i-th light signifies that exactly i switches are in the wrong position. The goal is for the 0-th light to be on and for the agent to know the correct position of each switch.

Lights. n lights are connected in series and each may potentially be broken. A light may be off if it is broken or a light downstream is broken. The agent must fix lights that are known to be broken and reach a state where all lights are no longer broken. This domain has cascading effects, as fixing a single light may change the "lit" status of all lights upstream. Note that the contingent width (as defined by [Bonet and Geffner, 2014]) for these problems is precisely n.

These problems represent two orthogonal classes of dynamic domains, with their differences best summarized by Figure 1. First, consider the Password domain. As the problem size grows, the average number of conditional effects grows dramatically: any time a switch is flipped, all possible cases for the n + 1 lights changing must be covered. With the Lights domain, the actions of fixing a single light have more repercussions as lights are added, but the chain is only increased by one per new light. If we look at how the average number of conditions (fluents) of the conditional effects grows with respect to problem size, we see the reverse: the Lights domain grows much faster than Password. By the parametrization of the Password domain, each conditional effect of flipping a switch depends only on the correctness of the switch and the light that is currently on. For the Lights domain, as more lights are added, fixing any single light has a longer chain of potential ramifications given the status of the lights both upstream and downstream. Therefore, these problems allow us to compare how BF scales with respect to these two important domain characteristics.

[Figure 1: Differences in Password and Lights domains. Average number of conditional effects (solid lines) and average number of fluents per conditional effect (dotted lines) vs. problem size (# of lights/switches).]

We evaluate via a simulation that creates a randomized true state and produces a sequence of actions that should result in a state where the goal holds. Figure 2 shows how the BF algorithm performs as the problem size increases, averaged over 100 random action traces. As with the standard benchmarks, the BF algorithm correctly deduced all action preconditions and goal states over all generated action traces. The main point of comparison here is not pure performance but instead performance across problem types, as per the preceding discussion of Figure 1. The Password domain scales slightly better, partially due to its general domain growth with respect to problem size being lower than that of the Lights domain, as per Figure 3. As Figures 1 and 3 show, an increase in a simple notion of problem size can have a large impact on multiple facets of the problem representation. Regardless of how the dynamics of the system manifest when compiled into conditional effects of actions, Figure 2 shows that the BF algorithm scales equivalently.

[Figure 2: Performance of BF on Password and Lights domains. Average run-time (seconds) per simulation vs. problem size (# of lights/switches).]

[Figure 3: Password and Lights domain growth. Total number of fluents in the domain (solid lines) and average simulation length in actions (dotted lines) vs. problem size (# of lights/switches).]

6 Related Works
In this paper we propose an approach to logical state estimation for dynamical systems that is tailored to address the tracking of individual fluents in a computationally efficient manner. Similar to Amir and Russell's original work on logical filtering, we elected not to perform our analysis in the situation calculus, but rather to use a dynamical system model that is employed by those who develop efficient algorithms for AI automated planning and diagnosis. Nevertheless, there is a large body of work in the knowledge representation literature, much of it in the situation calculus, that is related to logical filtering, particularly work on belief update and on progression. Among these, Lin and Reiter (1997) provided a broad study of progression, spawning a number of other advances and culminating in Vassos and Levesque's (2013) treatment of first-order progression, which appears to subsume Shirazi and Amir's (2011) work on first-order logical filtering. Also relevant to approximate filtering, and in small degree smoothing, is the work by Liu and Levesque (2005) that studies progression in dynamical systems with respect to so-called Proper Knowledge Bases. This work shares some motivation with our conjunctive formulae restriction in attempting to avoid disjunction in favor of tractability. Further, the authors discuss a limited integration of sensing via regression to determine the context of actions to be performed, building on a similar idea for projection by De Giacomo and Levesque (1999). Finally, Ewin et al. (2014) study the problem of query projection for long-living agents by combining regression and progression in a manner that is similar in spirit to the work presented here.

Our work is inspired by work on smoothing and backwards-forwards reasoning in stochastic systems, building on previous work on logical filtering [Amir and Russell, 2003], the adaptation of classical filtering to a logical setting. Various works on (database) progression (e.g., [Vassos and Levesque, 2013]) are also closely related to this exact logical state estimation task. In 2007, Shahaf and Amir developed a version of logical filtering that represents the underlying belief state as a circuit. By solving the update problem completely, logical circuit filtering is able to compute and represent the belief state, exactly, in polynomial time. Unfortunately, querying the underlying data structure used in logical circuit filtering requires the use of a SAT solver, making it untenable for tasks such as AI planning, which could require excessive SAT calls to evaluate action executability during plan construction.

One of the first works on the approximation of logical belief states in dynamical systems was the 0-approximation semantics proposed by Baral and Son (1997). Like our approximation techniques, they represent beliefs in terms of conjunctions of fluents, but they do not exploit a notion of backwards-forwards reasoning to do state estimation.

The field of automated planning has spawned numerous systems which necessarily have a sub-component to perform state estimation. For example, the systems CNFct, DNFct, and PIct ([To et al., 2011b; 2011a]) explicitly maintain sets of the possible belief states, but in the worst case they have exponential space complexity. Palacios and Geffner (2009) realize a form of approximate state estimation via a compilation process that introduces additional fluents and actions corresponding to possible initial worlds. While this translation-based approach provides efficient state querying, the number of additional fluents required for completeness grows exponentially in the problem's contingent width.

The SDR Planner [Brafman and Shani, 2012] maintains a history of actions similar to our history of state-action pairs, but for a slightly different purpose. The planner samples a possible complete initial state, then assumes it is correct and plans appropriately. When information gained through sensing actions disproves the correctness of the initial state sample, re-planning is performed and a new state is sampled. To ensure action preconditions are correctly followed, precondition fluents are regressed through the history to the initial state to ensure satisfiability. This portion of the algorithm is similar, at a high level, to the fundamental idea of logical smoothing: that an understanding of the evolution of the past can produce new information about the present. More recently, Brafman and Shani (2014) also exploit regression for effective state estimation. A key difference, however, is that we use newly discovered information about past states to reduce the uncertainty of more recent states. In contrast, they only perform a backwards pass on the history of actions and observations, using full regression, while we approximate for efficiency.

Most recently, Bonet and Geffner (2014) developed algorithms for belief tracking for so-called simple planning problems, as noted previously. We were unable to perform an experimental comparison with their work because their implementation is domain-specific. Nevertheless, it is interesting to consider when one approach works well and the other does not. As width-n problems, both the Password and Lights domains would cause an exponential blowup for their technique. Conversely, there are width-1 problems where our approximation does not capture simple entailments, such as conformant-like conditions where case-based reasoning plays a role. The complementary nature of the two approaches makes their combination an obvious step for future research.

7 Concluding Remarks
Logical smoothing and backwards-forwards reasoning are important techniques in service of reasoning about partially observable dynamical systems. They characterize the logical analogue of commonly used techniques in control theory and other dynamic programming settings. This paper provides the formal foundations for a number of interesting theoretical and algorithmic tools that are of practical significance for automated planning, execution monitoring, diagnosis, and beyond. We demonstrated the effectiveness of our approach, which has flawless recall on state estimation for the preconditions and goal conditions in the plans for existing partially observable planning problems. Further, we witnessed a dramatic reduction in the number of fluent literals that must be monitored. In future work we plan to integrate our algorithms with a contingent planning system. While our work was presented in the context of deterministic actions, the account and algorithms extend trivially to non-deterministic actions, and can be extended to exogenous actions. We plan to elaborate on such cases and to further explore variants of logical smoothing and our BF algorithm as they relate to non-deterministic and probabilistic transition systems. Finally, we wish to further explore the theoretical relationship between our work and progression over Proper KBs, and the relationship to the various approximations defined by Palacios and Geffner [2009] and by Bonet and Geffner [2014].

Acknowledgements: The authors gratefully acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References
[Amir and Russell, 2003] Eyal Amir and Stuart J. Russell. Logical filtering.
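A minimal simulator makes the Password semantics concrete; the class, the greedy agent, and all names below are our own illustrative constructions, not the benchmark's encoding:

```python
import random

class Password:
    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.correct = [rng.choice([True, False]) for _ in range(n)]  # hidden from the agent
        self.position = [False] * n                                   # current switch settings

    def flip(self, i):
        self.position[i] = not self.position[i]

    def lights(self):
        """Observable: light i is on iff exactly i switches are in the wrong position."""
        wrong = sum(p != c for p, c in zip(self.position, self.correct))
        return [i == wrong for i in range(len(self.position) + 1)]

# A simple agent: keep any flip that lowers the index of the lit light.
p = Password(4, seed=7)
while not p.lights()[0]:
    before = p.lights().index(True)
    for i in range(4):
        p.flip(i)
        if p.lights().index(True) < before:
            break
        p.flip(i)  # undo an unhelpful flip
print(p.lights()[0])  # True
```

Since flipping any one switch can move the lit light to any of the n + 1 positions depending on the hidden state, a compiled flip action needs a conditional effect for each such case, which is the rapid growth in conditional effects discussed below.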
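The cascading ramifications of the Lights domain fit in a few lines; this is an illustrative model of the dynamics (we index lights so that higher indices are downstream), not the benchmark's encoding:

```python
def lit_status(broken):
    """Light i is lit iff neither it nor any light downstream of it is broken."""
    lit, ok_downstream = [False] * len(broken), True
    for i in range(len(broken) - 1, -1, -1):
        ok_downstream = ok_downstream and not broken[i]
        lit[i] = ok_downstream
    return lit

broken = [False, True, False, True]
print(lit_status(broken))  # [False, False, False, False]
broken[3] = False          # fix the most-downstream break...
print(lit_status(broken))  # [False, False, True, True]
```

Fixing one light re-lit every light upstream of the remaining break, so a single fix action must condition on the status of the rest of the chain: that is the growth in fluents per conditional effect noted for this domain.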
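The shape of this simulation loop can be sketched generically. The toy conjunctive estimator below stands in for the belief tracker purely to show the harness; it is our own illustrative stub, not the BF algorithm itself:

```python
class ConjunctiveEstimator:
    """Toy belief tracker: a set of literals known to be true (illustration only)."""
    def __init__(self, known):
        self.known = set(known)

    def entails(self, literals):
        return set(literals) <= self.known

    def update(self, effects, observation):
        # Forget literals over changed fluents, then add effects and observations.
        changed = {abs(l) for l in effects}
        self.known = {l for l in self.known if abs(l) not in changed}
        self.known |= set(effects) | set(observation)

def run_trace(trace, init_known):
    """Step through one action trace, counting preconditions certified
    by the estimator before each action executes."""
    est = ConjunctiveEstimator(init_known)
    certified = 0
    for pre, effects, obs in trace:
        if est.entails(pre):
            certified += 1
        est.update(effects, obs)
    return certified, len(trace)

# Trace: a sensing step that observes fluent 2, then an action requiring 1 and 2.
trace = [([1], [], [2]), ([1, 2], [3], [])]
print(run_trace(trace, init_known=[1]))  # (2, 2)
```

A full harness would additionally sample the hidden true state and generate the trace from a plan, as described above; the coverage statistic is then the fraction of traces where every precondition and the goal are certified.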
In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 75–82, 2003.
[Baier et al., 2014] Jorge A. Baier, Brent Mombourquette, and Sheila A. McIlraith. Diagnostic problem solving via planning with ontic and epistemic goals. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourteenth International Conference (KR 2014), 2014.
[Baral and Son, 1997] Chitta Baral and Tran Cao Son. Approximate reasoning about actions in presence of sensing and incomplete information. In Proceedings of the 1997 International Symposium on Logic Programming, pages 387–401, 1997.
[Bonet and Geffner, 2011] Blai Bonet and Hector Geffner. Planning under partial observability by classical replanning: Theory and experiments. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 1936–1941, 2011.
[Bonet and Geffner, 2014] Blai Bonet and Hector Geffner. Belief tracking for planning with sensing: Width, complexity and approximations. Journal of Artificial Intelligence Research, pages 923–970, 2014.
[Brafman and Shani, 2012] Ronen I. Brafman and Guy Shani. Replanning in domains with partial information and sensing actions. J. Artif. Intell. Res. (JAIR), 45:565–600, 2012.
[Brafman and Shani, 2014] Ronen I. Brafman and Guy Shani. On the properties of belief tracking for online contingent planning using regression. In Proc. of the 21st European Conference on Artificial Intelligence (ECAI), pages 147–152, 2014.
[De Giacomo and Levesque, 1999] Giuseppe De Giacomo and Hector J. Levesque. Projection using regression and sensors. In Proc. of the 16th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 160–165, 1999.
[Einicke, 2012] Garry A. Einicke. Smoothing, Filtering and Prediction - Estimating The Past, Present and Future. InTech, 2012.
[Eiter and Gottlob, 1992] Thomas Eiter and Georg Gottlob. On the complexity of propositional knowledge base revision, updates, and counterfactuals. Artificial Intelligence, 57(2-3):227–270, 1992.
[Ewin et al., 2014] Christopher James Ewin, Adrian R. Pearce, and Stavros Vassos. Transforming situation calculus action theories for optimised reasoning. In Proc. of the 14th Int'l Conference on the Principles of Knowledge Representation and Reasoning (KR), 2014.
[Fikes et al., 1972] Richard Fikes, Peter Hart, and Nils Nilsson. Learning and executing generalized robot plans. Artificial Intelligence, 3:251–288, 1972.
[Fritz, 2009] Christian Fritz. Monitoring the Generation and Execution of Optimal Plans. PhD thesis, University of Toronto, April 2009.
[Kalman, 1960] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Ser. D):35–45, 1960.
[Lin and Reiter, 1997] Fangzhen Lin and Raymond Reiter. How to progress a database. Artificial Intelligence, 92:131–167, 1997.
[Liu and Levesque, 2005] Yongmei Liu and Hector J. Levesque. Tractable reasoning with incomplete first-order knowledge in dynamic systems with context-dependent actions. In Proc. of the 19th Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 522–527, 2005.
[McIlraith and Reiter, 1992] S. McIlraith and R. Reiter. On tests for hypothetical reasoning. In W. Hamscher, L. Console, and J. de Kleer, editors, Readings in Model-Based Diagnosis, pages 89–95. Morgan Kaufmann, 1992.
[Muise et al., 2014] Christian Muise, Vaishak Belle, and Sheila A. McIlraith. Computing contingent plans via fully observable non-deterministic planning. In The 28th AAAI Conference on Artificial Intelligence, 2014.
[Palacios and Geffner, 2009] Héctor Palacios and Hector Geffner. Compiling uncertainty away in conformant planning problems with bounded width. Journal of Artificial Intelligence Research, 35:623–675, 2009.
[Pinto, 1999] Javier Pinto. Compiling ramification constraints into effect axioms. Computational Intelligence, 15:280–307, 1999.
[Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge, MA, 2001.
[Rintanen, 2008] Jussi Rintanen. Regression for classical and nondeterministic planning. In Proceedings of the 18th European Conference on Artificial Intelligence, pages 568–572, 2008.
[Shahaf and Amir, 2007] Dafna Shahaf and Eyal Amir. Logical circuit filtering. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2611–2618, 2007.
[Shirazi and Amir, 2011] Afsaneh Shirazi and Eyal Amir. First-order logical filtering. Artificial Intelligence, 175(1):193–219, 2011.
[Strass and Thielscher, 2013] Hannes Strass and Michael Thielscher. A general first-order solution to the ramification problem with cycles. Journal of Applied Logic, 11(3):289–308, 2013.
[To et al., 2011a] Son Thanh To, Enrico Pontelli, and Tran Cao Son. On the effectiveness of CNF and DNF representations in contingent planning. In Proc. of the 22nd Int'l Joint Conference on Artificial Intelligence (IJCAI), pages 2033–2038, 2011.
[To et al., 2011b] Son Thanh To, Tran Cao Son, and Enrico Pontelli. Contingent planning as AND/OR forward search with disjunctive representation. In Fahiem Bacchus, Carmel Domshlak, Stefan Edelkamp, and Malte Helmert, editors, Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011. AAAI, 2011.
[van Ditmarsch et al., 2007] Hans P. van Ditmarsch, Andreas Herzig, and Tiago De Lima. Optimal regression for reasoning about knowledge and actions. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pages 1070–1076, 2007.
[Vassos and Levesque, 2013] Stavros Vassos and Hector J. Levesque. How to progress a database III. Artificial Intelligence, 195:203–221, 2013.