Probabilistic Belief Update via Mixing Endogenous Actions and Exogenous Events

Gavin Rens and Thomas Meyer
University of Cape Town, Centre for Artificial Intelligence Research - CSIR Meraka, South Africa
{grens,tmeyer}@cs.uct.ac.za

Abstract. We investigate methods for accommodating both agent actions and events outside the control of an agent when updating an agent's belief state. Update via a mixture of actions and events seems not to have enjoyed much attention in the literature. The framework we consider assumes as given a probability distribution over the worlds thought possible, event likelihoods, state-transition likelihoods and observation likelihoods. We present three methods: (i) trading off update due to agent actions against update due to exogenous events via a measure of observation endogeny, (ii) a more direct mixture of expected action and event effects, and (iii) employing a causal network involving actions and events.

Keywords: Probability · Belief Update · Agent Actions · Exogenous Events

1 Introduction

When an agent performs an action, it expects its surroundings to change due to the effects of the action. An agent might also notice that its surroundings have changed, not due to one of its own actions, but due to some exogenous event. Or the surroundings might have changed due to a chain of (indirect) effects of some action the agent performed in the past. The agent could have an idea of the likelihood of particular actions and events happening in different situations. And the agent could have knowledge about the correctness of observations it makes in different situations.

Although agent actions and events from outside (including actions performed by other agents) frequently occur simultaneously, there is very little literature attempting to formalize how an agent should update its beliefs in such cases. There is more literature discussing belief update for purely agent actions (e.g. the literature on Markov decision processes) or purely exogenous events (e.g. the literature on Bayesian networks).

In this work, we assume that an agent has a probability distribution over the possible states it can be in. We then want to answer the question: what is the agent's new distribution over states, given that it executed a particular action while some (exogenous) events simultaneously occurred, and a particular observation is made shortly thereafter?

For instance, if monsters leave the forest when a fire is lit, and you can light a fire (agent action), someone else can light a fire (event) and the monsters can leave the forest (event), then what should you believe after lighting a fire, while knowing that there is a probability that someone else has also lit a fire, there is a probability that the monsters will leave the forest anyway, and you think you hear the monsters leaving (observation)?

In this paper, we present three methods to combine knowledge of actions and events for probabilistic belief update. All of the uncertainty in an agent's environment models may be represented by probabilities of the occurrence of actions, events, effects and observations in the different possible situations. The first method computes a measure of the degree to which an agent believes that the observation it received is the effect of its own action. The agent then updates its beliefs by trading off action update and event update using this measure.
The second method assumes that updating due to actions and updating due to events should not be separate, but more integrated. The third method deals more realistically with the causal relationships between all actions and events; the other two methods do not allow for the explicit modeling of causal relationships.

We start off by covering the necessary technical background, including the basic framework: the language and probabilistic models an agent is assumed to have. Given the formal framework, we are then ready to specify an example scenario to be used throughout the rest of the paper. In Section 4, the three methods of update are presented. Section 5 discusses related work and Section 6 summarizes the findings and looks to future work.

2 The Agent Framework

In this work, an agent has the following vocabulary.

– $F$, a finite set of Boolean world features
– $G$, a finite set of Boolean environment events
– $A$, a finite set of agent actions

Each world feature and environment event may be either true (present in the world, resp., occurred) or false (absent from the world, resp., did not occur). A valuation function assigns true (1) or false (0) to each feature or event. Let $W$ be the set of valuation functions (aka possible worlds) over the world features $F$, such that every function/possible world $w \in W$ assigns either true or false to each feature in $F$. And let $X$ be the set of valuation functions (aka atomic events) over the environment events $G$, such that every function/atomic event $e \in X$ assigns either true or false to each (primitive) event in $G$. Let $L$ be the classical propositional language induced from $F$ using conjunction ($\wedge$), disjunction ($\vee$) and negation ($\neg$). Let $\Omega \subseteq L$ be a set of observations the agent is interested in.

An agent has the following environment models. (Environment models will have to be learned by the agent or supplied by a knowledge engineer during the agent design process.)

– $E : X \times W \to [0, 1]$ s.t. $E(e, w)$ is the probability that event $e$ occurs in world $w$.
– $T : W \times (A \cup X) \times W \to [0, 1]$ s.t. $T(w, æ, w')$ is the probability of being in world $w'$ after $æ$ occurs in $w$.
– $O : \Omega \times (A \cup X) \times W \to [0, 1]$ s.t. $O(\alpha, æ, w)$ is the probability that observation $\alpha$ occurs in world $w$, given action or event $æ$, s.t. if $\alpha \equiv \beta$, then $O(\alpha, æ, w) = O(\beta, æ, w)$.

Finally, an agent maintains its beliefs using a belief state (probability distribution) $B : W \to [0, 1]$ over worlds, s.t. $\sum_{w \in W} B(w) = 1$.

We say that a sentence $\alpha \in L$ is satisfied by a world $w \in W$ (denoted $w \models \alpha$) iff it is true when evaluated under $w$ according to the classical laws of logic. The set of worlds satisfying $\alpha$ is denoted $Mod(\alpha)$. We define satisfaction of a primitive event $\epsilon \in G$ evaluated under an atomic event $e \in X$ similarly; when $e$ satisfies $\epsilon$, we write $e \models \epsilon$. We may write $\bar{a}$ instead of $\neg a$, and $ab$ instead of $a \wedge b$. A belief state $\{(w_1, p_1), (w_2, p_2), \ldots, (w_n, p_n)\}$ might be written more compactly as $\langle p_1, p_2, \ldots, p_n \rangle$, where for $F = \{q, r, s\}$, $w_1 \models q \wedge r \wedge s$, $w_2 \models q \wedge r \wedge \neg s$, ..., $w_8 \models \neg q \wedge \neg r \wedge \neg s$.
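To make the framework concrete, the following is a minimal Python sketch of one possible encoding of worlds and belief states; the tuple-of-truth-values representation and all names are our illustrative assumptions, not prescribed by the paper.

```python
from itertools import product

# Worlds as tuples of 0/1 truth values over the features F, listed in the
# canonical order used by the compact notation <p1, ..., pn>.
F = ("q", "r", "s")                        # Boolean world features
W = list(product((1, 0), repeat=len(F)))   # w1 |= q&r&s, ..., w8 |= ~q&~r&~s

def belief(pairs):
    """A belief state B: a probability distribution over W (absent -> 0)."""
    B = {w: 0.0 for w in W}
    B.update(pairs)
    assert abs(sum(B.values()) - 1.0) < 1e-9
    return B

# Example: the belief state <0.5, 0, 0, 0, 0, 0, 0, 0.5>.
B = belief({(1, 1, 1): 0.5, (0, 0, 0): 0.5})
```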
3 An Example Scenario

Consider the following scenario as a running example. There is a forest in which a wizard and a witch live. There are also orcs (monsters) living in the area. The orcs move in and out of the forest. Wherever they are, the orcs cause all sorts of mischief. When the orcs get close to the wizard, he sometimes lights a magical fire, at which time they usually leave the forest. The witch also has a magical fire for chasing the orcs away. When the orcs are chased out of the forest, they give an indignant cry. In this example, we are interested in the wizard's mindset.

To represent belief states more compactly, we use the form $(\bar{a}, b, \bar{c}, d, p) \in B$ instead of $(w, p) \in B$ such that $w \models \neg a \wedge b \wedge \neg c \wedge d$. Let the wizard's vocabulary be

– $F = \{f_{wiz}, f_{wch}, i_o, c_o\}$, where the features have the following respective meanings: the wizard's fire is lit, the witch's fire is lit, the orcs are in the forest, the orcs have just cried out.
– $G = \{\ell_{wch}, n, x\}$, with respective meanings: the witch lit her fire, the orcs entered the forest, the orcs exited the forest.
– $A = \{\ell_{wiz}\}$, meaning the wizard has the single action of lighting his fire.

Atomic events $\ell_{wch}\, n\, x$ and $\bar{\ell}_{wch}\, n\, x$ are not considered; the orcs cannot both enter and exit the forest. In the models below, the wizard distinguishes worlds only by the features $f_{wiz}$, $f_{wch}$ and $i_o$.

Let the wizard initially believe to a relatively high degree that no fires are lit and the orcs are in the forest, and to a lesser degree that no fires are lit and the orcs are outside the forest. The wizard believes only to a low degree that only the witch's fire is lit. This initial belief state $B$ can be written as

$(w, 0.75) \in B : w \models \neg f_{wiz} \wedge \neg f_{wch} \wedge i_o$
$(w, 0.15) \in B : w \models \neg f_{wiz} \wedge \neg f_{wch} \wedge \neg i_o$
$(w, 0.05) \in B : w \models \neg f_{wiz} \wedge f_{wch} \wedge i_o$
$(w, 0.05) \in B : w \models \neg f_{wiz} \wedge f_{wch} \wedge \neg i_o$

and $B(w) = 0$ for all other worlds $w \in W$. This implies that $B(f_{wiz}) = 0$, $B(f_{wch}) = 0.1$, $B(i_o) = 0.8$ and $B(c_o) = 0$.
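Continuing the sketch above (with worlds now read as triples over $(f_{wiz}, f_{wch}, i_o)$ and $c_o$ omitted, as in the tables that follow), the initial belief state and its marginals can be checked as follows.

```python
# Worlds are (f_wiz, f_wch, i_o) triples; c_o is omitted, as in the tables.
B0 = belief({
    (0, 0, 1): 0.75,   # no fires lit, orcs in the forest
    (0, 0, 0): 0.15,   # no fires lit, orcs outside the forest
    (0, 1, 1): 0.05,   # only the witch's fire lit, orcs in the forest
    (0, 1, 0): 0.05,   # only the witch's fire lit, orcs outside
})

def marginal(B, i):
    """B(feature i): the total mass on worlds where feature i is true."""
    return sum(p for w, p in B.items() if w[i] == 1)

assert abs(marginal(B0, 0) - 0.00) < 1e-9   # B(f_wiz) = 0
assert abs(marginal(B0, 1) - 0.10) < 1e-9   # B(f_wch) = 0.1
assert abs(marginal(B0, 2) - 0.80) < 1e-9   # B(i_o)  = 0.8
```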
Let the possible observations be $\Omega = \{c_o, \neg c_o\}$. All the environment models are defined in the following tables. Unfortunately, textual explanations are unlikely to clarify the definitions, because the functions were worked out separately for each combination of arguments. This was a knowledge engineering exercise, with probabilities assigned purely according to the first author's subjective idea of how such a scenario might behave, and according to the axioms of probability.

In all the tables, the columns list the worlds $w_1, \ldots, w_8$ in the canonical order ($w_1 \models f_{wiz}\, f_{wch}\, i_o$, $w_2 \models f_{wiz}\, f_{wch}\, \bar{i}_o$, ..., $w_8 \models \bar{f}_{wiz}\, \bar{f}_{wch}\, \bar{i}_o$), and $e_1, \ldots, e_6$ enumerate the six admissible atomic events in $X$.

The wizard's event likelihood function $E(e, w)$ is defined as follows (each column sums to 1).

        w1     w2     w3     w4     w5     w6     w7     w8
  e1    0      0      0      0      0      0      0      0.1
  e2    0      0      0.5    0      0      0      0.3    0
  e3    0      0      0.1    0.5    0.1    0      0.4    0.2
  e4    0      0      0      0.2    0      0.4    0      0.3
  e5    1      0      0.3    0      0.6    0      0.2    0
  e6    0      1      0.1    0.3    0.3    0.6    0.1    0.4

The wizard's transition function $T(w, æ, w')$ is defined by the following four tables, one for each of the four worlds $w$ with $B(w) > 0$. (We do not define it for $w$ when $B(w) = 0$, because it is not necessary for the examples.) Each row is a distribution over the destination worlds $w'_1, \ldots, w'_8$.

For the first such world $w$:

         w1     w2     w3     w4     w5     w6     w7     w8
  ℓ_wiz  0.05   0.45   0.05   0.45   0      0      0      0
  e1     0      0      0      0      0.25   0.25   0.25   0.25
  e2     0.5    0      0      0      0      0.5    0      0
  e3     0.5    0      0      0      0.5    0      0      0
  e4     0.25   0      0.25   0      0.25   0      0.25   0
  e5     0      0.25   0      0.25   0      0.25   0      0.25
  e6     0.25   0      0.25   0      0.25   0      0.25   0

For the second such world $w$:

         w1     w2     w3     w4     w5     w6     w7     w8
  ℓ_wiz  0.25   0.25   0.25   0.25   0      0      0      0
  e1     0.5    0      0      0      0.5    0      0      0
  e2     0      0.5    0      0      0      0.5    0      0
  e3     0      0.5    0      0      0      0.5    0      0
  e4     0.25   0      0.25   0      0.25   0      0.25   0
  e5     0      0.25   0      0.25   0      0.25   0      0.25
  e6     0      0.25   0      0.25   0      0.25   0      0.25

For the third such world $w$:

         w1     w2     w3     w4     w5     w6     w7     w8
  ℓ_wiz  0.25   0.25   0.25   0.25   0      0      0      0
  e1     0.5    0      0      0      0.5    0      0      0
  e2     0      0.5    0      0      0      0.5    0      0
  e3     0.5    0      0      0      0.5    0      0      0
  e4     0      0      0.5    0      0      0      0.5    0
  e5     0      0      0      0.5    0      0      0      0.5
  e6     0      0      0.5    0      0      0      0.5    0

For the fourth such world $w$:

         w1     w2     w3     w4     w5     w6     w7     w8
  ℓ_wiz  0.25   0.25   0.25   0.25   0      0      0      0
  e1     0.5    0      0      0      0.5    0      0      0
  e2     0      0.5    0      0      0      0.5    0      0
  e3     0      0.5    0      0      0      0.5    0      0
  e4     0      0      0.5    0      0      0      0.5    0
  e5     0      0      0      0.5    0      0      0      0.5
  e6     0      0      0      0.5    0      0      0      0.5

The wizard's observation function $O(c_o, æ, w)$ is defined as follows; $O(\neg c_o, æ, w) = 1 - O(c_o, æ, w)$.

         w1     w2     w3     w4     w5     w6     w7     w8
  ℓ_wiz  0.6    0.1    0.4    0.05   0.1    0.05   0      0
  e1     0.4    0      0      0      0.3    0      0      0
  e2     0      1      0      0      0      0.7    0      0
  e3     0.2    0.1    0      0      0.1    0      0      0
  e4     0.2    0      0      0      0.1    0      0      0
  e5     0      0.3    0      0.1    0      0.3    0      0
  e6     0.3    0.1    0.1    0      0.1    0      0      0
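A sketch of how such tabular models might be stored sparsely, continuing the Python encoding above; the particular row shown is the uniform $\ell_{wiz}$ row from the transition tables, attached to an arbitrary source world purely for illustration.

```python
T = {}   # T[(w, ae, w2)]: transition probability; missing keys mean 0.

def set_row(table, w, ae, probs):
    """Store one table row as a distribution over w1..w8 (canonical order)."""
    assert abs(sum(probs) - 1.0) < 1e-9
    for w2, p in zip(W, probs):
        if p:
            table[(w, ae, w2)] = p

def T_fn(w, ae, w2):
    """Functional view of the sparse table, used by the update operators."""
    return T.get((w, ae, w2), 0.0)

# Illustrative: a uniform l_wiz row attached to the world (0, 0, 1).
set_row(T, (0, 0, 1), "l_wiz", (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0))
```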
4 Combining Actions and Events for Update

4.1 Mixed Update via Strength of Endogeny

We say that an observation is exogenous (w.r.t. an agent) when it is caused by an event outside the control of the agent, and an observation is endogenous (w.r.t. an agent) when the observation is due to a recent agent action.

Suppose we know what observation to expect as the effect of an action just executed by an agent; call that observation $\beta$. And suppose that the agent actually perceives $\alpha$. Our reasoning is that the less similar $\alpha$ is to $\beta$, the more likely it is that $\alpha$ is the effect of some outside (exogenous) event, that is, not due to the agent's recent action. Intuitively, if the actual observation is completely unexpected, then the agent should update its beliefs as if some environmental event had just occurred. Also intuitively, if the actual observation completely matches what the agent expects due to its most recent action, then the observation should be fully associated with the action, and update should proceed as in traditional state estimation (due to a known action). We thus measure the degree of endogeny of observation $\alpha$ as the degree of similarity of $\alpha$ to the expected observation $\beta$.

In this work, observations are logical propositions. The question thus becomes: how can one compute the degree of similarity between two propositions $\alpha$ and $\beta$? We shall base the degree of similarity on the number of worlds on which the two propositions correspond, adjusted by the total number of worlds under consideration. For instance, assume that $Mod(\alpha)$ and $Mod(\beta)$ overlap in one world, and assume that $Mod(\alpha)$ and $Mod(\gamma)$ also overlap in one world. Now assume further that $|Mod(\alpha)| = 2$, $|Mod(\beta)| = 2$ and $|Mod(\gamma)| = 4$. Then we should consider $\beta$ more similar to $\alpha$ than $\gamma$. Let the similarity between two propositions be defined as follows.

Definition 1.
$$S(\alpha, \beta) := \frac{|Mod(\alpha) \cap Mod(\beta)|}{|Mod(\alpha) \cup Mod(\beta)|}.$$

Notice that $S$ is symmetric, that is, $S(\alpha, \beta) = S(\beta, \alpha)$ for all $\alpha, \beta \in \Omega$.

Another approach towards defining similarity explicitly counts non-corresponding worlds, and normalizes the final count to fall within 0 and 1. The count is the number of $\beta$-worlds in $Mod(\alpha)$ minus the number of $\beta$-worlds in $Mod(\neg\alpha)$. The general approach for this kind of normalization is to subtract the minimum possible count (which could be negative in this case) from the actual count, and then divide by the difference between the maximum and minimum counts.

Definition 2. Let $\beta^{\gamma}$ be the number of $\beta$-worlds in $Mod(\gamma)$. Then
$$S'(\alpha, \beta) := \frac{(\beta^{\alpha} - \beta^{\neg\alpha}) - (-\beta^{\neg\alpha})}{|Mod(\alpha)| - (-\beta^{\neg\alpha})}.$$
Note that the maximum count is $|Mod(\alpha)|$ and the minimum is $-\beta^{\neg\alpha}$.

Proposition 1. $S(\alpha, \beta) = S'(\alpha, \beta)$.

Proof. Note that $S'(\alpha, \beta) = \frac{(\beta^{\alpha} - \beta^{\neg\alpha}) - (-\beta^{\neg\alpha})}{|Mod(\alpha)| - (-\beta^{\neg\alpha})} = \frac{\beta^{\alpha}}{|Mod(\alpha)| + \beta^{\neg\alpha}}$. And note that $\beta^{\alpha} = |Mod(\alpha) \cap Mod(\beta)|$ and that $|Mod(\alpha)| + \beta^{\neg\alpha} = |Mod(\alpha)| + |Mod(\beta) \cap Mod(\neg\alpha)| = |Mod(\alpha) \cup Mod(\beta)|$.

It turns out that $\frac{|X \cap Y|}{|X \cup Y|}$, where $X$ and $Y$ are sets, was defined by Paul Jaccard in 1901 to measure the similarity of plants by the (normalized) number of common features they display [7]. (The measure is known by different names: Jaccard index, intersection over union and Jaccard similarity coefficient; see https://en.wikipedia.org/wiki/Jaccard_index.)

As an example, suppose the vocabulary is $\{f_{wiz}, f_{wch}, i_o\}$. Representing worlds by strings corresponding to their truth assignments, note that

– $Mod(f_{wiz} \wedge f_{wch}) = \{111, 110\}$,
– $Mod(f_{wch}) = \{111, 110, 011, 010\}$,
– $Mod(i_o) = \{111, 101, 011, 001\}$.

Thus, $S(f_{wiz} \wedge f_{wch}, f_{wch}) = 2/4 = 1/2$, $S(f_{wiz} \wedge f_{wch}, i_o) = 1/5$, $S(f_{wiz} \wedge f_{wch}, \neg f_{wch}) = 0/6 = 0$ and $S(f_{wiz} \wedge f_{wch}, f_{wiz} \wedge f_{wch}) = 2/2 = 1$.

In the context of this work, an agent does not expect one particular observation, given an executed action in a belief state. Rather, the agent has different degrees of belief for each of the observations it could possibly receive. The probability of observation $\beta$ is
$$Pr(\beta \mid a, B) = \sum_{w' \in W} \sum_{w \in W} B(w)\, T(w, a, w')\, O(\beta, a, w').$$

We are now ready to define the expected endogeny of an observation $\alpha$ received after executing action $a$ in belief state $B$.

Definition 3.
$$EEndo(\alpha, a, B) := \sum_{\beta \in \Omega} Pr(\beta \mid a, B)\, S(\alpha, \beta).$$

The expected exogeny of observation $\alpha$ received after executing action $a$ in belief state $B$ is then defined as $1 - EEndo(\alpha, a, B)$.

Definition 4. Let $B$ be a belief state, $a$ an action in $A$ and $\alpha$ a sentence in $L$. The endogenous update operator is defined as
$$B^{endo}_{a,\alpha} := \Big\{ (w', p) \mid w' \in W,\ p = \gamma\, O(\alpha, a, w') \sum_{w \in W} B(w)\, T(w, a, w') \Big\},$$
where
$$\gamma := 1 \Big/ \sum_{w' \in W} O(\alpha, a, w') \sum_{w \in W} B(w)\, T(w, a, w').$$
Here and in the rest of the paper, $\gamma$ denotes a normalizing factor.

Definition 5. Let $B$ be a belief state and $\alpha$ a sentence in $L$. The exogenous update operator is defined as
$$B^{exo}_{\alpha} := \Big\{ (w', p) \mid w' \in W,\ p = \gamma \sum_{e \in X} O(\alpha, e, w') \sum_{w \in W} B(w)\, E(e, w)\, T(w, e, w') \Big\}.$$
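A sketch of Definitions 1 and 3-5 in the running Python encoding; the models are passed as functions (e.g. wrappers such as T_fn over the sparse tables), observations are predicates over worlds, and the assertion reproduces $S(f_{wiz} \wedge f_{wch}, f_{wch}) = 1/2$ from the worked example. All of this is our illustrative encoding, not the paper's.

```python
def mod(phi):
    """Mod(phi): the set of worlds satisfying the predicate phi."""
    return {w for w in W if phi(w)}

def S(alpha, beta):
    """Jaccard similarity |Mod(a) & Mod(b)| / |Mod(a) | Mod(b)| (Def. 1)."""
    A, C = mod(alpha), mod(beta)
    return len(A & C) / len(A | C) if (A | C) else 1.0

assert S(lambda w: w[0] and w[1], lambda w: w[1]) == 0.5   # worked example

def pr_obs(beta, a, B, T, O):
    """Pr(beta | a, B): probability of observing beta after a in B."""
    return sum(B[w] * T(w, a, w2) * O(beta, a, w2) for w2 in W for w in W)

def eendo(alpha, a, B, Omega, T, O):
    """EEndo(alpha, a, B) (Def. 3): expected endogeny of alpha."""
    return sum(pr_obs(beta, a, B, T, O) * S(alpha, beta) for beta in Omega)

def normalize(p):
    """Rescale a non-negative table so that it sums to 1 (the factor gamma)."""
    z = sum(p.values())
    return {w: v / z for w, v in p.items()}

def endo_update(B, a, alpha, T, O):
    """B^endo_{a,alpha} (Def. 4): action-only state estimation."""
    return normalize({w2: O(alpha, a, w2) * sum(B[w] * T(w, a, w2) for w in W)
                      for w2 in W})

def exo_update(B, alpha, X, E, T, O):
    """B^exo_alpha (Def. 5): update by exogenous events only."""
    return normalize({w2: sum(O(alpha, e, w2) *
                              sum(B[w] * E(e, w) * T(w, e, w2) for w in W)
                              for e in X)
                      for w2 in W})
```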
An agent uses $B^{exo}_{\alpha}$ when it knows that observation $\alpha$ is not (directly) due to the effects of one of its (recent) actions. The agent assumes that $\alpha$ is due to some event in the environment, but because of the uncertainty about which events occur in which worlds, it maintains a probability distribution $E$ over event occurrences, given a world (for all worlds).

Given Definition 3, we can generalize (endogenous) state estimation to incorporate the effects of (exogenous) events occurring in nature. We define $B_{a,\alpha}$ as the mixture of endogenous update and exogenous update, using expected endogeny as the trade-off factor.

Definition 6.
$$B_{a,\alpha} := \big\{ (w', p) \mid w' \in W,\ p = EEndo(\alpha, a, B) \cdot B^{endo}_{a,\alpha}(w') + (1 - EEndo(\alpha, a, B)) \cdot B^{exo}_{\alpha}(w') \big\}.$$

We now compute $B_{\ell_{wiz}, c_o}$ and $B_{\ell_{wiz}, \neg c_o}$. It turns out that

– $EEndo(c_o, \ell_{wiz}, B) = 0.28$,
– $B^{endo}_{\ell_{wiz}, c_o} = \langle 0.52, 0.09, 0.34, 0.05, 0.00, 0.00, 0.00, 0.00 \rangle$,
– $B^{exo}_{c_o} = \langle 0.14, 0.44, 0.02, 0.03, 0.07, 0.31, 0.00, 0.00 \rangle$,
– $EEndo(\neg c_o, \ell_{wiz}, B) = 0.72$,
– $B^{endo}_{\ell_{wiz}, \neg c_o} = \langle 0.13, 0.32, 0.20, 0.34, 0.00, 0.00, 0.00, 0.00 \rangle$,
– $B^{exo}_{\neg c_o} = \langle 0.20, 0.02, 0.10, 0.14, 0.22, 0.07, 0.10, 0.16 \rangle$.

Therefore, we get
$$B_{\ell_{wiz}, c_o} = \langle 0.24, 0.34, 0.11, 0.04, 0.05, 0.22, 0.00, 0.00 \rangle$$
and
$$B_{\ell_{wiz}, \neg c_o} = \langle 0.15, 0.24, 0.17, 0.29, 0.06, 0.02, 0.03, 0.04 \rangle.$$

4.2 Hybrid Endogenous-Exogenous Update

Another approach is to think of actions and events as truly occurring simultaneously. Let $T(w, a, e, w')$ be the probability of a transition from world $w$ to world $w'$, given that action $a$ is performed and event $e$ occurs. And let $O(\alpha, a, e, w')$ be the probability of perceiving $\alpha$ in $w'$, given that action $a$ is performed and event $e$ occurs. Update can then be defined as follows.

Definition 7.
$$B^{H}_{a,\alpha} := \Big\{ (w', p) \mid w' \in W,\ p = \gamma \sum_{e \in X} O(\alpha, a, e, w') \sum_{w \in W} B(w)\, E(e, w)\, T(w, a, e, w') \Big\}.$$

If one assumes that actions and events are independent, then $T(w, a, e, w')$ can be replaced with $T(w, a, w')\, T(w, e, w')$, and $O(\alpha, a, e, w')$ can be replaced with $O(\alpha, a, w')\, O(\alpha, e, w')$, with the normalizing factor $\gamma$ adjusted accordingly. (Note that if $B$ and $C$ are independent, both unconditionally and conditionally on $A$, then $Pr(A \mid B, C) = Pr(A \mid B)\, Pr(A \mid C) / Pr(A)$.)

With this method, using the crying orcs scenario, we get
$$B^{H}_{\ell_{wiz}, c_o} = \langle 0.60, 0.34, 0.05, 0.01, 0.00, 0.00, 0.00, 0.00 \rangle$$
and
$$B^{H}_{\ell_{wiz}, \neg c_o} = \langle 0.26, 0.07, 0.19, 0.48, 0.00, 0.00, 0.00, 0.00 \rangle.$$
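Definitions 6 and 7 combine the pieces above; a sketch follows, with the hybrid version assuming action/event independence so that $T$ and $O$ factorize as just described. The final lines check the mixture arithmetic against the numbers reported for the running example (agreement is up to the rounding in the reported values).

```python
def mixed_update(B, a, alpha, Omega, X, E, T, O):
    """B_{a,alpha} (Def. 6): trade off endo and exo update by EEndo."""
    lam = eendo(alpha, a, B, Omega, T, O)
    be = endo_update(B, a, alpha, T, O)
    bx = exo_update(B, alpha, X, E, T, O)
    return {w: lam * be[w] + (1 - lam) * bx[w] for w in W}

def hybrid_update(B, a, alpha, X, E, T, O):
    """B^H_{a,alpha} (Def. 7), assuming the action and event are independent."""
    return normalize({
        w2: sum(O(alpha, a, w2) * O(alpha, e, w2) *
                sum(B[w] * E(e, w) * T(w, a, w2) * T(w, e, w2) for w in W)
                for e in X)
        for w2 in W})

# Mixture check against the reported components of B_{l_wiz, c_o}:
lam = 0.28                                        # EEndo(c_o, l_wiz, B)
assert abs(lam * 0.52 + (1 - lam) * 0.14 - 0.24) < 0.01   # first entry
assert abs(lam * 0.09 + (1 - lam) * 0.44 - 0.34) < 0.01   # second entry
```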
4.3 Using Causal Networks

The reader might have felt uncomfortable with the difference between the chain of effects of the wizard's fire and that of the witch's fire. The wizard's fire causes the orcs to leave the forest, which in turn causes them to cry out, but there is no 'exit forest' event ($x$) explicitly mentioned. On the other hand, the witch's 'fire lighting' ($\ell_{wch}$) is an explicit event and so is 'exit forest', but without a causal relationship being modeled. One would expect that fire lighting causes forest exiting (at least stochastically).

To resolve these inconsistencies, we propose to use a network of actions and primitive events (elements of $G$) as nodes, with edges indicating causal relationships. (See, for instance, [13] on causal modeling.) More precisely, we propose a directed acyclic graph with three properties:

1. action nodes may not have parent nodes (no causes),
2. there is a special end node representing the final world reached after all chains of events have occurred, and
3. there is a path from every node to the end node.

The crying orcs scenario could be modeled as in Figure 1(a) or 1(b). The arcs show the causal relationships. Figure 1(b) is a made-up network capturing more elaborate causal relationships. Note that some actions (e.g. $a_1$) do not cause any events, and some events (e.g. $\epsilon_1$) are not caused by an action.

[Fig. 1. Causal networks including actions and events: (a) one causal net for the crying orcs scenario; (b) another causal net for the crying orcs scenario; (c) an example causal net.]

Such causal nets are also more compact representations than modeling $E(e, w)$ for every $e \in X$. There is likely an opportunity to represent such networks even more compactly by reasoning/conditioning over features ($F$) instead of worlds ($W$), which is left for future work.

The causal net belief update operation finds the probability of each world $w'$ in the new belief state by employing the given causal net to calculate the probability of the end node, given the action performed in the current belief state and the subsequent observation perceived in $w'$.

Definition 8.
$$B^{C}_{a,\alpha} := \{ (w', p) \mid w' \in W,\ p = \gamma \cdot \mathit{CausalNet}(a, \alpha, B, End, w') \}.$$

We define $\mathit{CausalNet}(a, \alpha, B, \epsilon', w')$ with Algorithm 1. It requires that event likelihoods, transition probabilities and observation probabilities are defined in terms of primitive events $\epsilon \in G$. Hence, we define

– $E(\epsilon, w) := \sum_{e \in X,\, e \models \epsilon} E(e, w)$,
– $T(w, \epsilon, w') := \sum_{e \in X,\, e \models \epsilon} T(w, e, w')$,
– $O(\alpha, \epsilon, w) := \sum_{e \in X,\, e \models \epsilon} O(\alpha, e, w)$.

These functions involving primitive events could be specified directly and compactly using factored representations [4, 3]. Compact specification is, however, not our focus in this paper. To keep the algorithm concise, we define $E(a, w) = 1$ for every action $a$ and world $w$. Using this method requires no more information than the causal network and the probabilistic models used in the other two methods. Note that conditional probability distributions/tables are not used as in Bayesian networks; probabilistic information is taken from the agent's environment models.

Algorithm 1: CausalNet
Input: $a$, $\alpha$, $B$, $\epsilon'$, $w'$
1  sum ← 0
2  for $\epsilon$ in Parents($\epsilon'$) do
3    for $w$ in $W$ do
4      if $\epsilon$ is an orphan or $\epsilon = a$ then
5        sum ← sum + $B(w)\, E(\epsilon, w)\, T(w, \epsilon, w')$
6      else
7        sum ← sum + $\mathit{CausalNet}(a, \alpha, B, \epsilon, w) \cdot T(w, \epsilon, w')$
8  if $\epsilon'$ = End then
9    return sum $\cdot \prod_{\epsilon \in Parents(End)} O(\alpha, \epsilon, w')$
10 else
11   return sum

Our biggest concern with this procedure is that weighting the final probability (sum) by the observation probabilities (line 9) may not be the best approach. Our intuition is that $\alpha$ is an aggregate of all the effect signals caused by action $a$ and all primitive events (to the degree that they occurred). One would thus like to weight the term being added to sum (lines 5 & 7) by the probability of some observation $\alpha'$ which contributed to the final observation $\alpha$. But what are the $\alpha'$?

When using the causal net of Figure 1(a), we get
$$B^{C_a}_{\ell_{wiz}, c_o} = \langle 0.00, 0.55, 0.00, 0.02, 0.00, 0.43, 0.00, 0.00 \rangle$$
and
$$B^{C_a}_{\ell_{wiz}, \neg c_o} = \langle 0.00, 0.29, 0.00, 0.19, 0.00, 0.32, 0.00, 0.20 \rangle.$$
When using the causal net of Figure 1(b), we get
$$B^{C_b}_{\ell_{wiz}, c_o} = \langle 0.00, 0.55, 0.00, 0.02, 0.00, 0.42, 0.00, 0.00 \rangle$$
and
$$B^{C_b}_{\ell_{wiz}, \neg c_o} = \langle 0.00, 0.27, 0.00, 0.22, 0.00, 0.30, 0.00, 0.22 \rangle.$$
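For concreteness, a direct transcription of Algorithm 1 into Python, under the same assumptions as the earlier sketches. The `parents` dict is a hypothetical encoding of the causal net (node -> list of parent nodes), with the action as a parentless node and "END" as the end node; $E$, $T$, $O$ are the lifted primitive-event functions, with $E(a, w) = 1$ for the action.

```python
def causal_net(a, alpha, B, node, w2, parents, E, T, O):
    """Transcription of Algorithm 1; `node` plays eps' and `w2` plays w'."""
    total = 0.0
    for eps in parents[node]:
        for w in W:
            if not parents.get(eps) or eps == a:   # orphan, or the action itself
                total += B[w] * E(eps, w) * T(w, eps, w2)
            else:                                  # recurse into the causal net
                total += causal_net(a, alpha, B, eps, w,
                                    parents, E, T, O) * T(w, eps, w2)
    if node == "END":
        for eps in parents["END"]:                 # weight by observation probs
            total *= O(alpha, eps, w2)
    return total

def causal_update(B, a, alpha, parents, E, T, O):
    """B^C_{a,alpha} (Def. 8): normalize CausalNet over destination worlds."""
    return normalize({w2: causal_net(a, alpha, B, "END", w2, parents, E, T, O)
                      for w2 in W})
```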
5 Related Work

Without it being labeled as endogenous, the state estimation function has been the basic belief state update function in POMDP theory since the 1960s [1, 12, 11]. The $B^{endo}$ operator is essentially the update operation used in partially observable Markov decision process (POMDP) theory, except that it takes propositional sentences as observations instead of atomic elements from an observation set $\Omega$, and POMDPs operate over primitive states, not logical worlds.

The first well-known formal definition of classical (qualitative/non-probabilistic) belief update was proposed by Katsuno and Mendelzon [8]. They propose a family of preorders $\{\preceq_w \mid w \in W\}$ for their operator, where each $\preceq_w$ is a reflexive, transitive relation over $W$. Each such relation is interpreted as follows: if $u \preceq_w v$, then $u$ is at least as plausible a change relative to $w$ as $v$ is; that is, situation $w$ would more readily evolve into $u$ than into $v$. They proposed a set of rationality postulates and proved that their update operator satisfies the postulates.

Shapiro and Pagnucco [17] propose an approach based on the situation calculus, where exogenous actions, which may influence what is sensed, are presumed present. They assume sensors to be exact and actions to be deterministic. Our framework assumes that every action performed by the agent is immediately followed by an observation. Their framework assumes that an (endogenous) action is either a sensing action (with only epistemic and no ontic effects) or a non-sensing action (with only ontic and no epistemic effects). Agent actions (whether sensing or not) may be interspersed with exogenous actions. When what is sensed contradicts what the agent believes, the occurrence of exogenous actions is hypothesised to correct discrepancies between belief and sensing. Whereas our framework assumes that endogenous and exogenous actions (and their corresponding effects/observations) occur more or less simultaneously (hence the uncertainty in the source of the subsequent observation), in the framework of Shapiro and Pagnucco [17] actions always occur separately, in sequences. In their framework, update involves only the ontic effects of actions, and revision occurs when an observation due to sensing is unexpected. So they use revision, even for belief change due to ontic (exogenous) actions, whereas we categorize any belief change due to ontic actions as update. Here we see again (as is still often the case) that the difference between revision and update is unclear. Finally, the main difference between the two frameworks is that ours has a model for the likelihood of exogenous actions (which we call events) occurring. Our framework assumes that actions are stochastic and observations are noisy, whereas their actions are deterministic and observations exact; this makes a comparison of the two frameworks difficult.

Van Ditmarsch and Kooi [5] present a modal logic for reasoning about multiple agents performing epistemic and ontic actions. To quote them, "In the literature update models are also called action models. Here we follow [...] and call them update models, since no agency seems to be involved." Agency of agents is not modeled in their logic; that is, events (including ontic events) may or may not be associated with the actions of agents. In a sense, in their framework, all events are exogenous, with some uncertainty about which event actually occurred (each agent has a different uncertainty model). It might be possible to simulate the mixing of exogenous and endogenous events/actions in our framework with their multi-pointed updates (update given a set of events). However, besides the fact that Van Ditmarsch and Kooi [5] work with Kripke models to represent uncertainty (whereas we use probabilities), it is hard to see whether their logic could essentially represent the uncertainty about the source of ontic effects. In any case, they seem not to be interested in, or aware of, this possibility. A comparison is left for future work.

There are many more logics, formalisms and frameworks which contain both notions of ontic actions and epistemic observation.
It seems that in most of those, there is the potential to confuse or conflate the idea that an observation is epistemic 51 in nature or has a physical source. But, as far as we know, none of them attempts to model the mixture of the source of an observation from physical events (exogenous or endogenous). Rens [15] combines probabilistic belief (belief state) update and revision by trading off between the two via a measure of ontic strength (the agent’s confidence that the observation is due to a physical happening). In that work, Rens assumes that the agent is passive, that is, that the agent does not perform actions, it only receives observations. His update operator is defined as . ¶ XX © B ⇧ = (w0 , p) | w0 2 W, p = O( , w0 ) T (w, e, w0 )E(e, w)B(w) , w2W e2X which is inspired by Boutilier [2] P whoP mentioned that event-driven update can be de- fined as (w0 , p) | w0 2 W, p = w2W e2X T (w, e, w0 )E(e, w)B(w) . Note that Rens’s observation function excludes an action/event argument. And Rens and Boutilier take the set of events to be atomic objects, not derived from a set of primitive events like G. Rens [16] defined an operator very similar to Ba, for use in a stochastic belief change framework: √ . ¶ endo © Ba, = (w0 , p) | w0 2 W, p = Eng( , w0 , a)Ba, (w0 ) + (1 Eng( , w0 , a))B ⇧ (w0 ) , where 2 L is a propositional sentence and Eng( , w0 , a) is the agent’s confidence that perceived in w0 was caused by action a. In other words, Eng has the same meaning as EEndo but is not derived, but assumed given by the framework designer. Dynamic Bayesian networks (DBNs) model physical change by specifying how a BN changes due to an action. A DBN is a pair hB0 , B! i, where B0 is a BN representing an initial belief state and B! is a BN representing how probabilities of events change from one time-slice to the next [9]. We suspect that our causal networks will be more compact than DBNs, at least, graphically. A formal comparison of the two graphical models is left for the future. Although Lang’s work [10] is not directly applicable to ours in terms of ‘mixing’ endogenous and exogenous update, it does unpack and highlight several important characteristics of update. Lang also discusses the relationship between update and revision. His insights are on the fundamental nature of update and might guide our future efforts in this area. The work of Halpern [6] and Pearl [14] should also be consulted when the causal net approach is developed. 6 Conclusion In this paper, we have made a few suggestions on how to take into account both an agent’s action and some other events that might occur simultaneously to update the agent’s belief state. With respect to our crying orcs scenario, the update operation results in very different belief states of the wizard agent. Table 6 places the updated 52 Table 1. Comparison of belief states updated after doing `wiz and perceiving co . f wiz f wch io f wiz f wch io f wiz f wch io f wiz f wch io f wiz f wch io f wiz f wch io f wiz f wch io f wiz f wch io B`wiz ,co 0.24 0.34 0.11 0.04 0.05 0.22 0.00 0.00 B`Hwiz ,co 0.26 0.07 0.19 0.48 0.00 0.00 0.00 0.00 Ca B`wiz ,co 0.00 0.55 0.00 0.02 0.00 0.43 0.00 0.00 C B`wiz b ,co 0.00 0.55 0.00 0.02 0.00 0.42 0.00 0.00 belief state distributions next to each other - updated due to the wizard’s action of lighting his fire and perceiving the orcs’ crying. Questions we may ask are – Do the three methods simply reflect different but reasonable update strategies? – Are some of the methods fundamentally flawed? 
– Is it possible, or even worthwhile, to combine ideas used in the different methods?

Developing the method employing causal networks seems especially promising. However, a major question with that method is where and how to employ the observation weight (probability) in the calculations.

Note that most terms in the methods defined in this paper take the form
$$B(w)\, E(æ, w)\, T(w, æ, w')\, O(\alpha, æ, w'). \quad (1)$$
If any one of the factors is zero, the whole term is zero. Knowing this, there are two things to watch out for. One, for an update operator to be well behaved, there must be at least one world $w'$ such that none of the factors in (1) evaluates to zero. Two, if either $E(æ, w)$ or $T(w, æ, w')$ is known to be zero at $w$, then the other function value need not be known/specified at $w$; and if either $T(w, æ, w')$ or $O(\alpha, æ, w')$ is known to be zero at $w'$, then the other function value need not be known/specified at $w'$.

Rationality postulates for qualitative belief update have been proposed [8, e.g.], but we have found it difficult to translate them into probabilistic versions. We would like to tackle this in future.

Acknowledgements. Gavin Rens was supported by a Claude Leon Foundation postdoctoral fellowship while conducting this research. This work is based on research supported in part by the National Research Foundation of South Africa (Grant number UID 98019). Thomas Meyer has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 690974.

References

1. Åström, K.: Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications 10, 174–205 (1965)
2. Boutilier, C.: A unified model of qualitative belief change: A dynamical systems perspective. Artificial Intelligence 98(1–2), 281–316 (1998)
3. Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research (JAIR) 11, 1–94 (1999)
4. Boutilier, C., Poole, D.: Computing optimal policies for partially observable decision processes using compact representations. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96). pp. 1168–1175. AAAI Press, Menlo Park, CA (1996)
5. van Ditmarsch, H., Kooi, B.: Semantic results for ontic and epistemic change. In: Bonanno, G., van der Hoek, W., Wooldridge, M. (eds.) Logic and the Foundations of Game and Decision Theory (LOFT 7), Texts in Logic and Games, vol. 3, pp. 87–117. Amsterdam University Press, Amsterdam, Netherlands (2008)
6. Halpern, J.: Actual Causality. MIT Press, Cambridge, MA (2016)
7. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 547–579 (1901)
8. Katsuno, H., Mendelzon, A.: On the difference between updating a knowledge base and revising it. In: Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning. pp. 387–394 (1991)
9. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge, MA and London, England (2009)
10. Lang, J.: Belief update revisited. In: de Mantaras, R.L. (ed.) Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07). pp. 2517–2522. AAAI Press, Menlo Park, CA (2007)
11. Lovejoy, W.: A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research 28, 47–66 (1991)
12. Monahan, G.: A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science 28(1), 1–16 (1982)
13. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, USA (2000)
14. Pearl, J., Mackenzie, D.: The Book of Why: The New Science of Cause and Effect. Basic Books, New York, USA (2018)
15. Rens, G.: On stochastic belief revision and update and their combination. In: Kern-Isberner, G., Wassermann, R. (eds.) Proceedings of the Sixteenth International Workshop on Non-Monotonic Reasoning (NMR). pp. 123–132. Technical University of Dortmund (2016)
16. Rens, G.: A stochastic belief change framework with an observation stream and defaults as expired observations. In: Booth, R., Casini, G., Klarman, S., Richard, G., Varzinczak, I. (eds.) Proceedings of the Third International Workshop on Defeasible and Ampliative Reasoning (DARe-2016). CEUR Workshop Proceedings, ceur-ws.org (2016)
17. Shapiro, S., Pagnucco, M.: Iterated belief change and exogenous actions in the situation calculus. In: Proceedings of the Sixteenth European Conference on Artificial Intelligence (ECAI-04). pp. 878–882. IOS Press, Amsterdam (2004)