<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Belief Update via Mixing Endogenous Actions and Exogenous Events</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gavin Rens</string-name>
          <email>grens@cs.uct.ac.za</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Meyer</string-name>
          <email>tmeyer@cs.uct.ac.za</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Cape Town, Centre for Artificial Intelligence Research - CSIR Meraka</institution>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <fpage>41</fpage>
      <lpage>54</lpage>
      <abstract>
<p>We investigate methods for accommodating both agent actions and events outside the control of an agent when updating an agent's belief state. Update via a mixture of actions and events seems not to have enjoyed much attention in the literature. The framework we consider assumes as given a probability distribution over the worlds thought possible, together with event likelihoods, state-transition likelihoods and observation likelihoods. We present three methods: (i) trading off update due to agent actions against update due to exogenous events via a measure of observation endogeny, (ii) a more direct mixture of expected action and event effects and (iii) employing a causal network involving actions and events.</p>
      </abstract>
      <kwd-group>
        <kwd>Probability</kwd>
        <kwd>Belief Update</kwd>
        <kwd>Agent Actions</kwd>
        <kwd>Exogenous Events</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>When an agent performs an action, it expects its surroundings to change due to the
effects of the action. An agent might also notice that its surroundings have changed,
but not due to one of its own actions, but due to some exogenous event. Or the
surroundings might have changed due to a chain of (indirect) effects of some action
the agent performed in the past. The agent could have an idea as to the likelihood
of particular actions and events happening in different situations. And the agent
could have knowledge about the correctness of observations it makes in different
situations.</p>
      <p>Although agent-actions and events-from-outside (including actions performed
by other agents) frequently occur simultaneously, there is very little literature
attempting to formalize how an agent should update its beliefs in such cases. There is
more literature discussing belief update for purely agent actions (e.g. the literature
on Markov decision processes) or purely exogenous events (e.g. the literature on
Bayesian networks).</p>
      <p>In this work, we assume that an agent has a probability distribution over the
possible states it can be in. We then want to answer the question, what is the agent’s
new distribution over states, given that it executed a particular action while some
(exogenous) events simultaneously occurred and a particular observation is made
shortly thereafter?</p>
      <p>For instance, if monsters leave the forest when a fire is lit, and you can light a fire
(agent action) and someone else can light a fire (event) and the monsters can leave
the forest (event), then what should you believe after lighting a fire while knowing
that there is a probability that someone else has also lit a fire, there is a probability
that the monsters will leave the forest anyway and you think you hear the monsters
leaving (observation)?</p>
      <p>In this paper, we present three methods to combine knowledge of actions and
events for probabilistic belief update. All of the uncertainties about an agent’s
environment models may be represented by probabilities of the occurrence of actions,
events, effects and observations in the different possible situations. The first method
computes a measure of the degree to which an agent believes that the observation it
received is the effect of its own action. The agent then updates its beliefs by trading
off action update and event update using this measure. The second method assumes
that updating due to actions and updating due to events should not be separate but
should be more integrated. The third method deals more realistically with the causal
relationships between all actions and events. The other two methods do not allow
for the explicit modeling of causal relationships.</p>
      <p>We start off by covering the necessary technical background, including the basic
framework: the language and probabilistic models an agent is assumed to have.
Given the formal framework, we are then ready to specify an example scenario to be
used throughout the rest of the paper. In Section 4, the three methods of update are
presented. Section 5 discusses related work and Section 6 summarizes the findings
and looks to future work.</p>
    </sec>
    <sec id="sec-2">
<title>2 The Agent Framework</title>
      <sec id="sec-2-1">
<title>Vocabulary and Environment Models</title>
        <p>In this work, an agent has the following vocabulary.
– F, a finite set of Boolean world features
– G, a finite set of Boolean environment events
– A, a finite set of agent actions</p>
        <p>
          Each world feature and environment event may be either true (present in the world, resp., occurred) or false (absent from the world, resp., did not occur). A valuation function assigns true (1) or false (0) to each feature or event. Let W be the set of valuation functions (aka possible worlds) over world features F, such that for every function/possible world w ∈ W, w assigns either true or false to each feature in F. And let X be the set of valuation functions (aka atomic events) over environment events G, such that for every function/atomic event e ∈ X, e assigns either true or false to each (primitive) event in G. Let L be the classical propositional language induced from F using conjunction (∧), disjunction (∨) and negation (¬). Let Ω ⊆ L be a set of observations the agent is interested in. An agent has the following environment models.1
1 Environment models will have to be learned by the agent or supplied by a knowledge engineer during the agent design process.
– E : X × W → [0, 1] s.t. E(e, w) is the probability that event e occurs in world w.
– T : W × (A ∪ X) × W → [0, 1] s.t. T(w, ae, w′) is the probability of being in world w′ after ae occurs.
– O : Ω × (A ∪ X) × W → [0, 1] s.t. O(φ, ae, w) is the probability that observation φ occurs in world w, given action or event ae, s.t. if φ ≡ φ′, then O(φ, ae, w) = O(φ′, ae, w).
        </p>
        <p>
          Finally, an agent maintains its beliefs using a belief state (probability distribution) B : W → [0, 1] over worlds, s.t. Σw∈W B(w) = 1.
        </p>
        <p>We say that a sentence φ ∈ L is satisfied by a world w ∈ W (denoted w ⊨ φ) iff it is true when evaluated under w according to the classical laws of logic. The set of worlds satisfying φ is denoted Mod(φ). We define satisfaction of a primitive event ε ∈ G evaluated under an atomic event e ∈ X similarly. When e satisfies ε, we denote it as e ⊨ ε. We may write ā instead of ¬a, and ab instead of a ∧ b. A belief state {(w1, p1), (w2, p2), . . . , (wn, pn)} might be written more compactly as ⟨p1, p2, . . . , pn⟩, where for F = {q, r, s}, w1 ⊨ q ∧ r ∧ s, w2 ⊨ q ∧ r ∧ ¬s, . . . , w8 ⊨ ¬q ∧ ¬r ∧ ¬s.</p>
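<p>The enumeration of worlds and the vector form of a belief state can be sketched as follows; the feature names are those of the example above, while the probabilities are illustrative and not taken from the paper.</p>
        <preformat>
```python
from itertools import product

# Worlds over F = {q, r, s} in the fixed order used in the text,
# so w1 satisfies q, r, s and w8 satisfies none of them.
F = ["q", "r", "s"]
worlds = [dict(zip(F, bits)) for bits in product([1, 0], repeat=len(F))]

# A belief state <p1, ..., p8> is a probability vector over these worlds
# (illustrative numbers only).
B = [0.75, 0.05, 0.15, 0.05, 0.0, 0.0, 0.0, 0.0]
assert abs(sum(B) - 1.0) < 1e-9  # a belief state must sum to 1

def marginal(feature):
    """B(f): total probability of the worlds satisfying feature f."""
    return sum(p for w, p in zip(worlds, B) if w[feature])
```
</preformat>
<p>For instance, marginal("q") sums the first four entries of the vector, since q holds exactly in w1 through w4.</p>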
      </sec>
    </sec>
    <sec id="sec-3">
<title>3 An Example Scenario</title>
      <p>Consider the following scenario as a running example.</p>
      <p>There is a forest in which a wizard and a witch live. There are also orcs (monsters)
living in the area. The orcs move in and out of the forest. Wherever they are, the
orcs cause all sorts of mischief. When the orcs get close to the wizard, he sometimes
lights a magical fire, at which time they usually leave the forest. The witch also has a
magical fire for chasing the orcs away. When the orcs are chased out of the forest,
they give an indignant cry. In this example, we are interested in the wizard’s mindset.</p>
      <p>To represent belief states more compactly, we use the form (ā, b, c̄, d, p) ∈ B instead of (w, p) ∈ B such that w ⊨ ¬a ∧ b ∧ ¬c ∧ d. Let the wizard’s vocabulary be
– F = { fwiz, fwch, io, co}, where the features have the following respective meanings: the wizard’s fire is lit, the witch’s fire is lit, the orcs are in the forest, the orcs have just cried out.
– G = {ℓwch, n, x}, with respective meanings: the witch lit her fire, the orcs entered the forest, the orcs exited the forest.</p>
      <p>– A = {ℓwiz}, meaning the wizard has the single action to light his fire.
Atomic events in which both n and x occur are not considered; orcs cannot both enter and exit the forest.</p>
      <p>The wizard considers only worlds over the features fwiz, fwch and io. Let the wizard initially believe to a relatively high degree that no fires are lit and the orcs are in the forest. Let the wizard believe to a lesser degree that no fires are lit and the orcs are outside the forest. The wizard believes only to a low degree that only the witch’s fire is lit. This initial belief state B can be written as
(w, 0.75) ∈ B : w ⊨ ¬fwiz ∧ ¬fwch ∧ io
(w, 0.05) ∈ B : w ⊨ ¬fwiz ∧ fwch ∧ io
(w, 0.15) ∈ B : w ⊨ ¬fwiz ∧ ¬fwch ∧ ¬io
(w, 0.05) ∈ B : w ⊨ ¬fwiz ∧ fwch ∧ ¬io
and for all other worlds w ∈ W, B(w) = 0. This implies that B(fwiz) = 0, B(fwch) = 0.1, B(io) = 0.8 and B(co) = 0. Let the possible observations be Ω = {co, ¬co}.</p>
      <p>All the environment models are defined in tables. Textual explanations are unlikely to clarify the definitions, because the functions were worked out separately for each combination of arguments. This was a knowledge engineering exercise: probabilities were assigned according to the first author’s subjective idea of how such a scenario might behave, subject to the axioms of probability.</p>
      <p>The wizard’s event likelihood function E(e, w) is defined by a table assigning, for each of the eight worlds over fwiz, fwch and io, a probability to each admissible atomic event over {ℓwch, n, x}. The transition function T(w, ae, w′) is defined by one table per class of worlds (one table for each valuation of fwiz, fwch and io), with a row for the action ℓwiz and a row for each admissible atomic event. The wizard’s observation function O(co, ae, w) is defined by a similar table.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Combining Actions and Events for Update</title>
      <sec id="sec-4-1">
        <title>4.1 Mixed Update via Strength of Endogeny</title>
        <p>We say that an observation is exogenous (w.r.t. an agent) when it is caused by an
event outside the control of the agent, and an observation is endogenous (w.r.t. an
agent) when the observation is due to a recent agent action.</p>
        <p>Suppose we know what observation to expect as an effect of an action just executed by an agent. Call that observation β. And suppose that the agent actually perceives α. Our reasoning is that the less similar α is to β, the more likely it is that α is an effect of some outside (exogenous) event, that is, not due to the agent’s recent action. Intuitively, if the actual observation is completely unexpected, then the agent should update its beliefs as if some environmental event had just occurred. Also intuitively, if the actual observation completely matches what the agent expects due to its most recent action, then the observation should be fully associated with the action and update should proceed as in traditional state estimation (due to a known action).</p>
        <p>We thus measure the degree of endogeny of observation α as the degree of similarity of α to the expected observation β. In this work, observations are logical propositions. The question thus becomes, how can one compute the degree of similarity between two propositions α and β?</p>
        <p>We shall base the degree of similarity on the number of worlds on which the two propositions correspond, adjusted by the total number of worlds under consideration. For instance, assume that Mod(α) and Mod(β) overlap in one world, and that Mod(α) and Mod(γ) also overlap in one world. Now assume further that |Mod(α)| = 2, |Mod(β)| = 2 and |Mod(γ)| = 4. Then we should consider β more similar to α than γ.</p>
        <p>Let the similarity between two propositions be defined as follows.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Definition 1.</title>
        <p>S(α, β) ≔ |Mod(α) ∩ Mod(β)| / |Mod(α) ∪ Mod(β)|.</p>
        <p>Notice that S is symmetrical, that is, S(α, β) = S(β, α) for all α, β ∈ Ω.</p>
        <p>Another approach towards defining similarity explicitly counts non-corresponding worlds, and normalizes the final count to fall between 0 and 1: the count is the number of β-worlds in Mod(α) minus the number of β-worlds in Mod(¬α). The general approach for this kind of normalization is to subtract the minimum possible count (which could be negative in this case) from the actual count, and then divide it by the difference between the maximum and minimum counts.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Definition 2.</title>
        <p>Let #β(γ) be the number of β-worlds in Mod(γ). Then
S′(α, β) ≔ ((#β(α) − #β(¬α)) − (−#β(¬α))) / (|Mod(α)| − (−#β(¬α))).</p>
        <p>Note that the maximum count is |Mod(α)| and the minimum is −#β(¬α).</p>
        <p>Proposition 1. S(α, β) = S′(α, β).</p>
        <p>
          Proof. Note that S′(α, β) = #β(α) / (|Mod(α)| + #β(¬α)). And note that #β(α) = |Mod(α) ∩ Mod(β)| and that |Mod(α)| + #β(¬α) = |Mod(α)| + |Mod(β) ∩ Mod(¬α)| = |Mod(α) ∪ Mod(β)|.
It turns out that the measure |X ∩ Y| / |X ∪ Y|, where X and Y are sets, was defined by Paul Jaccard in 1901 to measure the similarity of plants by the (normalized) number of common features they display [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].2
        </p>
        <p>As an example, suppose the vocabulary is { fwiz, fwch, io}. Note that3
– Mod( fwiz ∧ fwch) = {111, 110},
– Mod( fwch) = {111, 110, 011, 010},
– Mod(io) = {111, 101, 011, 001}.</p>
        <p>Thus, S( fwiz ∧ fwch, fwch) = 2/4 = 1/2 and S( fwiz ∧ fwch, io) = 1/5 and S( fwiz ∧ fwch, ¬fwch) = 0/6 = 0 and S( fwiz ∧ fwch, fwiz ∧ fwch) = 2/2 = 1.</p>
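<p>The similarity of Definition 1 over sets of models can be sketched directly; the predicates below encode the example propositions, with worlds written as truth-assignment strings as in the example.</p>
        <preformat>
```python
from itertools import product

# Worlds over (fwiz, fwch, io) as bit-strings, e.g. "110" means
# fwiz = 1, fwch = 1, io = 0.
W = ["".join(map(str, bits)) for bits in product([1, 0], repeat=3)]

def models(pred):
    """Mod(phi): the set of worlds satisfying a predicate over (fwiz, fwch, io)."""
    return {w for w in W if pred(*(int(b) for b in w))}

def S(mod_a, mod_b):
    """Jaccard similarity between two sets of models (Definition 1)."""
    return len(mod_a & mod_b) / len(mod_a | mod_b)

m_fwiz_fwch = models(lambda fwiz, fwch, io: fwiz and fwch)  # {111, 110}
m_fwch = models(lambda fwiz, fwch, io: fwch)                # {111, 110, 011, 010}
m_io = models(lambda fwiz, fwch, io: io)                    # {111, 101, 011, 001}
```
</preformat>
<p>This reproduces the example values: S(fwiz ∧ fwch, fwch) = 1/2, S(fwiz ∧ fwch, io) = 1/5 and S(fwiz ∧ fwch, fwiz ∧ fwch) = 1.</p>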
        <p>In the context of this work, an agent does not expect a particular observation, given an executed action in a belief state. Rather, an agent has different degrees of belief for each of the observations it could possibly receive. The probability of observation β is</p>
        <p>Pr(β | a, B) = Σw′∈W Σw∈W B(w)T(w, a, w′)O(β, a, w′).</p>
        <p>We are now ready to define the expected endogeny of observation α received after executing action a in belief state B.
2 The measure is apparently known by different names: Jaccard index, Intersection over Union and Jaccard similarity coefficient (https://en.wikipedia.org/wiki/Jaccard_index).
3 Here, worlds are represented by strings corresponding to their truth assignments.</p>
        <p>Definition 3. EEndo(α, a, B) ≔ Σβ∈Ω Pr(β | a, B)S(α, β).</p>
        <p>The expected exogeny of observation α received after executing action a in belief state B is then defined as 1 − EEndo(α, a, B).</p>
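<p>A minimal sketch of expected endogeny follows; the observation probabilities Pr(β | a, B) and the similarity values are illustrative stand-ins (assumed given here rather than computed from the models).</p>
        <preformat>
```python
# EEndo(alpha, a, B) = sum over beta in Omega of Pr(beta | a, B) * S(alpha, beta).
Omega = ["co", "not_co"]
Pr = {"co": 0.35, "not_co": 0.65}  # Pr(beta | a, B), assumed given for this sketch
Sim = {("co", "co"): 1.0, ("co", "not_co"): 0.0,
       ("not_co", "co"): 0.0, ("not_co", "not_co"): 1.0}

def eendo(alpha):
    """Expected endogeny of a received observation alpha (Definition 3)."""
    return sum(Pr[beta] * Sim[(alpha, beta)] for beta in Omega)
```
</preformat>
<p>With the disjoint observations co and ¬co, the Jaccard similarity is 1 on the diagonal and 0 off it, so here the expected endogenies of co and ¬co sum to 1.</p>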
        <p>Definition 4. Let B be a belief state, a an action in A and α a sentence in L. The endogenous update operator endo is defined as
B^endo_{a,α} ≔ {(w′, p) | w′ ∈ W, p = γ · O(α, a, w′) Σw∈W B(w)T(w, a, w′)},
where
γ ≔ 1 / Σw′∈W O(α, a, w′) Σw∈W B(w)T(w, a, w′).</p>
        <p>Here and in the rest of the paper, γ denotes a normalizing factor.</p>
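<p>The shape of the endogenous (POMDP-style) update can be sketched as follows; the three-world transition and observation models are made up for illustration, and only the structure of the computation mirrors Definition 4.</p>
        <preformat>
```python
# Endogenous update: p(w') proportional to O(phi, a, w') * sum_w B(w) T(w, a, w').
W = [0, 1, 2]
B = {0: 0.6, 1: 0.3, 2: 0.1}

def T_(w, a, w2):  # made-up transition model T(w, a, w')
    return 0.8 if w2 == min(w + 1, 2) else 0.1

def O_(phi, a, w2):  # made-up observation model O(phi, a, w')
    return 0.9 if w2 == 2 else 0.1

def endo_update(B, a, phi):
    unnorm = {w2: O_(phi, a, w2) * sum(B[w] * T_(w, a, w2) for w in W) for w2 in W}
    gamma = 1.0 / sum(unnorm.values())  # the normalizing factor
    return {w2: gamma * p for w2, p in unnorm.items()}
```
</preformat>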
        <p>Definition 5. Let B be a belief state and α a sentence in L. The exogenous update operator exo is defined as
B^exo_α ≔ {(w′, p) | w′ ∈ W, p = γ · Σe∈X O(α, e, w′) Σw∈W B(w)E(e, w)T(w, e, w′)}.</p>
        <p>An agent uses exo when it knows that observation α is not (directly) due to the effects of one of its (recent) actions. The agent assumes that α is due to some event in the environment, but because of the uncertainty about which events occur in which worlds, it maintains a probability distribution E over event occurrences, given a world (for all worlds).</p>
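<p>Exogenous update differs from endo only in the extra sum over atomic events, weighted by the event-likelihood model E. A sketch with two worlds and two made-up atomic events:</p>
        <preformat>
```python
# Exogenous update (Definition 5): sum over events e of
# O(phi, e, w') * sum_w B(w) E(e, w) T(w, e, w'), then normalize.
W = [0, 1]
X = ["e1", "e2"]                 # atomic events (illustrative)
B = {0: 0.7, 1: 0.3}
E_ = {("e1", 0): 0.6, ("e2", 0): 0.4,   # E(e, w): per-world event likelihoods
      ("e1", 1): 0.2, ("e2", 1): 0.8}

def T_(w, e, w2):   # e1 tends to move the world to 1, e2 to 0
    return 0.9 if (e == "e1") == (w2 == 1) else 0.1

def O_(phi, e, w2):  # uninformative observation model, for illustration
    return 0.5

def exo_update(B, phi):
    unnorm = {w2: sum(O_(phi, e, w2) * sum(B[w] * E_[(e, w)] * T_(w, e, w2)
                                           for w in W)
                      for e in X) for w2 in W}
    gamma = 1.0 / sum(unnorm.values())
    return {w2: gamma * p for w2, p in unnorm.items()}
```
</preformat>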
        <p>Given Definition 3, we can generalize (endogenous) state estimation to incorporate the effects of (exogenous) events occurring in nature. We define B_{a,α} as the mixture of endogenous update and exogenous update, using expected endogeny as the trade-off factor.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Definition 6.</title>
        <p>B_{a,α} ≔ {(w′, p) | w′ ∈ W,
p = EEndo(α, a, B) · B^endo_{a,α}(w′) + (1 − EEndo(α, a, B)) · B^exo_α(w′)}.</p>
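<p>Definition 6 is a convex mixture of two already-normalized distributions, so the result is itself a distribution. A sketch with a made-up endogeny weight and made-up endo/exo distributions (not the running example's values):</p>
        <preformat>
```python
# Mixture update: pointwise convex combination of the endogenous and
# exogenous belief states, weighted by the expected endogeny.
eendo = 0.28                                      # illustrative weight
B_endo = [0.5, 0.1, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0]  # illustrative distributions
B_exo  = [0.1, 0.4, 0.0, 0.0, 0.1, 0.4, 0.0, 0.0]

B_mix = [eendo * pe + (1 - eendo) * px for pe, px in zip(B_endo, B_exo)]
```
</preformat>
<p>Because both inputs sum to 1 and the weights are complementary, B_mix sums to 1 without renormalization.</p>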
        <p>We now compute B_{ℓwiz,co} and B_{ℓwiz,¬co}. It turns out that
– EEndo(co, ℓwiz, B) = 0.28,
– B^endo_{ℓwiz,co} = ⟨0.52, 0.09, 0.34, 0.05, 0.00, 0.00, 0.00, 0.00⟩,
– B^exo_co = ⟨0.14, 0.44, 0.02, 0.03, 0.07, 0.31, 0.00, 0.00⟩,
– EEndo(¬co, ℓwiz, B) = 0.72,
and B^endo_{ℓwiz,¬co} and B^exo_¬co are computed analogously.</p>
        <sec id="sec-4-4-1">
          <title>Therefore, we get</title>
          <p>B_{ℓwiz,co} = ⟨0.24, 0.34, 0.11, 0.04, 0.05, 0.22, 0.00, 0.00⟩ and
B_{ℓwiz,¬co} = ⟨0.15, 0.24, 0.17, 0.29, 0.06, 0.02, 0.03, 0.04⟩.</p>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>4.2 Hybrid Endogenous-Exogenous Update</title>
        <p>Another approach is to think of actions and events truly occurring simultaneously. Let T(w, a, e, w′) be the probability of a transition from world w to world w′, given that action a is performed and event e occurs. And let O(α, a, e, w′) be the probability of perceiving α in w′, given that action a is performed and event e occurs. Update could then be defined as follows.</p>
      </sec>
      <sec id="sec-4-6">
        <title>Definition 7.</title>
        <p>B^H_{a,α} ≔ {(w′, p) | w′ ∈ W, p = γ · Σe∈X O(α, a, e, w′) Σw∈W B(w)E(e, w)T(w, a, e, w′)}.</p>
        <p>If one assumes that actions and events are independent, then T(w, a, e, w′) can be replaced with T(w, a, w′)T(w, e, w′), and O(α, a, e, w′) can be replaced with O(α, a, w′)O(α, e, w′), and the normalizing factor γ is adjusted accordingly.4</p>
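<p>Under the independence assumption just mentioned, the hybrid update factorizes into separate action and event models. A sketch with made-up two-world models:</p>
        <preformat>
```python
# Hybrid update (Definition 7), factorized as T(w,a,e,w') ~ T(w,a,w')T(w,e,w')
# and O(phi,a,e,w') ~ O(phi,a,w')O(phi,e,w'), renormalized. All models are
# illustrative stand-ins.
W = [0, 1]
X = ["e1", "e2"]
B = {0: 0.7, 1: 0.3}
E_ = {("e1", 0): 0.5, ("e2", 0): 0.5, ("e1", 1): 0.5, ("e2", 1): 0.5}
Ta = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.9}  # T(w, a, w')

def Te(w, e, w2): return 0.5                               # T(w, e, w')
def Oa(phi, w2): return 0.6 if w2 == 1 else 0.4            # O(phi, a, w')
def Oe(phi, e, w2): return 0.5                             # O(phi, e, w')

def hybrid_update(B, phi):
    unnorm = {w2: sum(Oa(phi, w2) * Oe(phi, e, w2) *
                      sum(B[w] * E_[(e, w)] * Ta[(w, w2)] * Te(w, e, w2)
                          for w in W)
                      for e in X) for w2 in W}
    gamma = 1.0 / sum(unnorm.values())
    return {w2: gamma * p for w2, p in unnorm.items()}
```
</preformat>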
        <p>With this method, using the crying orcs scenario, we get
B^H_{ℓwiz,co} = ⟨0.60, 0.34, 0.05, 0.01, 0.00, 0.00, 0.00, 0.00⟩ and
B^H_{ℓwiz,¬co} = ⟨0.26, 0.07, 0.19, 0.48, 0.00, 0.00, 0.00, 0.00⟩.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.3 Using Causal Networks</title>
        <p>The reader might have felt uncomfortable with the difference between the chains of effects of the wizard’s fire and the witch’s fire. The wizard’s fire causes the orcs to leave the forest, which in turn causes them to cry out, but there is no ‘exit forest’ event (x) explicitly mentioned. On the other hand, the witch’s ‘fire lighting’ (ℓwch) is an explicit event and so is ‘exit forest’, but without a causal relationship being modeled. One would expect that fire lighting causes forest exiting (at least stochastically).</p>
        <p>
          To resolve these inconsistencies, we propose to use a network with actions and primitive events (elements of G) as nodes, and with edges indicating causal relationships. (See, for instance, [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] on causal modeling.) More precisely, we propose a directed acyclic graph with three properties:
1. action nodes may not have parent nodes (no causes),
2. there is a special end node representing the final world reached after all chains of events have occurred, and
3. there is a path from every node to the end node.
        </p>
        <p>The crying orcs scenario could be modeled as in Figure 1(a) or 1(b). The arcs show the causal relationships. Figure 1(b) is a made-up network capturing more elaborate causal relationships. Note that some actions (e.g. a1) do not cause any events, and some events (e.g. ε1) are not caused by an action.</p>
        <p>Such causal nets are also more compact representations than modeling E(e, w) for every e ∈ X. There is likely an opportunity to represent such networks even more compactly by reasoning/conditioning over features (F) instead of worlds (W), which is left for future work.
4 Note that if B and C are independent, then Pr(A | B, C) = Pr(A | B)Pr(A | C)/Pr(A).
[Figure 1: (a) One causal net for the crying orcs scenario. (b) Another causal net for the crying orcs scenario. (c) An example causal net.]</p>
        <p>The causal net belief update operation finds the probability of each world w′ in the new belief state by employing the given causal net to calculate the probability of the end node, given the action performed in the current belief state and the subsequent observation perceived in w′.</p>
      </sec>
      <sec id="sec-4-8">
        <title>Definition 8.</title>
        <p>B^C_{a,α} ≔ {(w′, p) | w′ ∈ W, p = γ · CausalNet(a, α, B, End, w′)}.</p>
        <p>We define CausalNet(a, α, B, ε0, w′) with Algorithm 1. It requires that event likelihoods, transition probabilities and observation probabilities are defined in terms of primitive events ε ∈ G. Hence, we define
– E(ε, w) ≔ Σe∈X, e⊨ε E(e, w),
– T(w, ε, w′) ≔ Σe∈X, e⊨ε T(w, e, w′),
– O(α, ε, w) ≔ Σe∈X, e⊨ε O(α, e, w).</p>
        <p>
          These functions involving primitive events could be specified directly and compactly
using factored representations [
          <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
          ]. Compact specification is, however, not our
focus in this paper. To keep the algorithm concise, we define E(a, w) = 1 for every
action a and world w.
        </p>
        <p>Using this method requires no more information than the causal network and the
probabilistic models used in the other two methods. Note that conditional probability
distributions/tables are not used as in Bayesian networks; probabilistic information
is taken from the agent’s environment models.</p>
        <p>Our biggest concern with this procedure is that weighting the final probability (sum) by the observation probabilities (line 9) may not be the best approach. Our intuition is that α is an aggregate of all effect signals caused by action a and all primitive events (to the degree that they occurred). One would thus like to weight</p>
      </sec>
      <sec id="sec-4-9">
        <title>Algorithm 1: CausalNet</title>
        <p>Input: a, α, B, ε0, w′
1 sum ← 0;
2 for ε in Parents(ε0) do
3   for w in W do
4     if ε is an orphan or ε = a then
5       sum ← sum + B(w)E(ε, w)T(w, ε, w′)</p>
        <p>Using causal nets (a) and (b) of Figure 1, we get
B^Ca_{ℓwiz,co} = ⟨0.00, 0.55, 0.00, 0.02, 0.00, 0.43, 0.00, 0.00⟩,
B^Ca_{ℓwiz,¬co} = ⟨0.00, 0.29, 0.00, 0.19, 0.00, 0.32, 0.00, 0.20⟩,
B^Cb_{ℓwiz,co} = ⟨0.00, 0.55, 0.00, 0.02, 0.00, 0.42, 0.00, 0.00⟩ and
B^Cb_{ℓwiz,¬co} = ⟨0.00, 0.27, 0.00, 0.22, 0.00, 0.30, 0.00, 0.22⟩.</p>
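<p>A speculative sketch of the CausalNet traversal follows. Only the listed lines of Algorithm 1 are implemented literally (orphan parents and the action node contribute B(w)E(ε, w)T(w, ε, w′)); how interior nodes are handled by recursing up the net is our assumption, not the paper's, and the net, models and numbers are all illustrative.</p>
        <preformat>
```python
# Hypothetical causal net: lwiz -> x -> End (wizard's fire causes forest exit).
W = [0, 1]
B = {0: 0.5, 1: 0.5}
Parents = {"End": ["x"], "x": ["lwiz"], "lwiz": []}
action = "lwiz"

def E_(eps, w):  # E(a, w) = 1 for actions, by the convention in the text
    return 1.0 if eps == action else 0.5

def T_(w, eps, w2):  # made-up lifted transition model
    return 0.5

def causal_net(a, B, eps0, w2):
    total = 0.0
    for eps in Parents[eps0]:
        for w in W:
            if not Parents[eps] or eps == a:       # orphan node or the action
                total += B[w] * E_(eps, w) * T_(w, eps, w2)
            else:                                  # assumed: recurse upward
                total += causal_net(a, B, eps, w2) * T_(w, eps, w2)
    return total
```
</preformat>
<p>The returned value is an unnormalized weight for w′; Definition 8 normalizes over all worlds with the factor γ.</p>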
      </sec>
    </sec>
    <sec id="sec-5">
<title>5 Related Work</title>
      <p>
        Without it being labeled as endogenous, the state estimation function has been the basic belief state update function in POMDP theory since the 1960s [
        <xref ref-type="bibr" rid="ref1 ref11 ref12">1, 12, 11</xref>
        ]. The endo operator is essentially the update operation used in partially observable Markov decision process (POMDP) theory, except that it takes propositional sentences as observations instead of atomic elements from an observation set, and POMDPs operate over primitive states, not logical worlds.
      </p>
      <p>
        The first well-known formal definition of classical (qualitative/non-probabilistic) belief update was proposed by Katsuno and Mendelzon [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. They propose a family of preorders {≤w | w ∈ W} for their operator, where each ≤w is a reflexive, transitive relation over W. Each such relation is interpreted as follows: if u ≤w v, then u is at least as plausible a change relative to w as is v; that is, situation w would more readily evolve into u than into v. They proposed a set of rationality postulates and proved that their update operator satisfies the postulates.
      </p>
      <p>
        Shapiro and Pagnucco [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] propose an approach based on the situation calculus,
where exogenous actions are presumed present, which may influence what is sensed.
They assume sensors to be exact and actions to be deterministic.
      </p>
      <p>Our framework assumes that every action performed by the agent is immediately
followed by an observation. Their framework assumes that an (endogenous) action
is either a sensing action (with only epistemic and no ontic effects) or a non-sensing
action (with only ontic and no epistemic effects).</p>
      <p>Agent actions (whether sensing or not) may be interspersed with exogenous
actions. When what is sensed contradicts what the agent believes, the occurrence
of exogenous actions is hypothesised to correct discrepancies between belief and
sensing.</p>
      <p>
        Whereas our framework assumes that endogenous and exogenous actions (and their corresponding effects/observations) occur more or less simultaneously (hence the uncertainty in the source of the subsequent observation), in the framework of Shapiro and Pagnucco [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], actions always occur separately, in sequences. In their framework, update involves only the ontic effects of actions, and revision occurs when observations due to sensing are unexpected. So they use revision, even for belief change due to ontic (exogenous) actions. We categorize any belief change due to ontic actions as update. Here, we see again (as is still often the case) that the difference between revision and update is unclear.
      </p>
      <p>Finally, the main difference between the two frameworks is that ours has a model for the likelihood of exogenous actions (which we call events) occurring. Moreover, our framework assumes that actions are stochastic and observations are noisy, whereas their actions are deterministic and observations exact, which makes a comparison of the two frameworks difficult.</p>
      <p>
        Van Ditmarsch and Kooi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] present a modal logic for reasoning about multiple agents performing epistemic and ontic actions. To quote them, “In the literature update models are also called action models. Here we follow [...] and call them update models, since no agency seems to be involved.”
      </p>
      <p>Agency of agents in their logic is not modeled; that is, events (including ontic events) may or may not be associated with the actions of agents. In a sense, in their framework, all events are exogenous, with some uncertainty about what event actually occurred (each agent has a different uncertainty model). It might be possible to simulate the mixing of exogenous and endogenous events/actions in our framework with their multi-pointed update (update given a set of events).</p>
      <p>
        However, besides the fact that Van Ditmarsch and Kooi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] work with Kripke models to represent uncertainty (whereas we use probabilities), it is hard to see whether their logic could essentially represent the uncertainty about the source of ontic effects. In any case, they seem not to be interested in, or aware of, the possibility. A comparison is left for future work.
      </p>
      <p>There are many more logics, formalisms and frameworks which contain both notions of ontic actions and epistemic observation. It seems that in most of those, there is the potential to confuse or conflate whether an observation is epistemic in nature or has a physical source. But, as far as we know, none of them attempts to model the source of an observation as a mixture of physical events (exogenous and endogenous).</p>
      <p>
        Rens [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] combines probabilistic belief (belief state) update and revision by
trading off between the two via a measure of ontic strength (the agent’s confidence
that the observation is due to a physical happening). In that work, Rens assumes that
the agent is passive, that is, that the agent does not perform actions, it only receives
observations. His update operator is defined as
      </p>
      <p>
        B⇧α ≔ {(w′, p) | w′ ∈ W, p = γ · O(α, w′) Σw∈W Σe∈X T(w, e, w′)E(e, w)B(w)},
which is inspired by Boutilier [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], who mentioned that event-driven update can be defined as {(w′, p) | w′ ∈ W, p = Σw∈W Σe∈X T(w, e, w′)E(e, w)B(w)}. Note that Rens’s observation function excludes an action/event argument. And Rens and Boutilier take the set of events to be atomic objects, not derived from a set of primitive events like G.
      </p>
      <p>
        Rens [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] defined an operator very similar to B_{a,α} for use in a stochastic belief change framework:
      </p>
      <p>B√_{a,α} ≔ {(w′, p) | w′ ∈ W, p = Eng(α, w′, a)B^endo_{a,α}(w′) + (1 − Eng(α, w′, a))B⇧α(w′)},
where α ∈ L is a propositional sentence and Eng(α, w′, a) is the agent’s confidence that α perceived in w′ was caused by action a. In other words, Eng has the same meaning as EEndo, but it is not derived; it is assumed given by the framework designer.</p>
      <p>
        Dynamic Bayesian networks (DBNs) model physical change by specifying how a BN changes due to an action. A DBN is a pair ⟨B0, B→⟩, where B0 is a BN representing an initial belief state and B→ is a BN representing how probabilities of events change from one time-slice to the next [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We suspect that our causal networks will be more compact than DBNs, at least graphically. A formal comparison of the two graphical models is left for the future.
      </p>
      <p>
        Although Lang’s work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is not directly applicable to ours in terms of ‘mixing’
endogenous and exogenous update, it does unpack and highlight several important
characteristics of update. Lang also discusses the relationship between update and
revision. His insights are on the fundamental nature of update and might guide our
future efforts in this area.
      </p>
      <p>
        The work of Halpern [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Pearl [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] should also be consulted when the causal
net approach is developed.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>In this paper, we have made a few suggestions on how to take into account both an
agent’s action and other events that might occur simultaneously when updating the
agent’s belief state. With respect to our crying orcs scenario, the three update methods
result in very different belief states for the wizard agent. Table 6 places the updated
belief states side by side, which raises several questions:
– Do the three methods simply reflect different but reasonable update strategies?
– Are some of the methods fundamentally flawed?
– Is it possible, or even worthwhile, to combine ideas used in the different methods?
Developing the method employing causal networks seems especially promising.
However, a major question with that method is where and how to employ the
observation weight (probability) in the calculations.</p>
      <p>Note that most terms in the methods defined in this paper take the form
B(w)E(æ, w)T(w, æ, w′)O(α, æ, w′).
(1)
If any one of the functions is zero, the whole term is zero. Knowing this, there are
two things to watch out for. First, for an update operator to be well behaved, there
must be at least one world w′ such that none of the functions in (1) evaluates to zero.
Second, if either E(æ, w) or T(w, æ, w′) is known to be zero at w, then the other function
value need not be known/specified at w, and if either T(w, æ, w′) or O(α, æ, w′) is
known to be zero at w′, then the other function value need not be known/specified
at w′.</p>
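Both observations about the product form (1) translate directly into how an implementation can short-circuit: skip a term as soon as one factor is zero (so the remaining factors never need to be specified), and flag the operator as ill behaved when no successor world survives. The functions and values below are illustrative stand-ins, not the paper's model:

```python
# Zero-propagation in terms of form B(w)*E(ae,w)*T(w,ae,w')*O(alpha,ae,w'):
# a zero in any factor kills the term; the update is only well behaved if
# some w' ends up with a nonzero total. All values are illustrative.

W = ["w0", "w1"]
B = {"w0": 0.5, "w1": 0.5}

def E(ae, w):      return 1.0                                   # always possible here
def T(w, ae, w2):  return 0.0 if (w, w2) == ("w0", "w1") else 0.5
def O(alpha, ae, w2): return 0.0 if w2 == "w1" else 1.0         # w1 never observed

def unnormalised_update(B, ae, alpha, W):
    out = {}
    for w2 in W:
        total = 0.0
        for w in W:
            b = B[w] * E(ae, w)
            if b == 0.0:        # B or E zero at w: T need not be specified there
                continue
            t = T(w, ae, w2)
            if t == 0.0:        # T zero: O need not be specified at w'
                continue
            total += b * t * O(alpha, ae, w2)
        out[w2] = total
    return out

u = unnormalised_update(B, "ae", "alpha", W)
well_behaved = any(p > 0.0 for p in u.values())
print(u, well_behaved)
```

Here w′ = w1 gets zero mass (its observation factor is zero), but w′ = w0 survives, so the operator is well behaved and the result can be normalised.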
      <p>Rationality postulates for qualitative belief update have been proposed [8, e.g.],
but we have found it difficult to translate them into probabilistic versions. We would
like to tackle this in future.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>Gavin Rens was supported by a Claude Leon Foundation postdoctoral fellowship while
conducting this research. This work is based on research supported in part by the
National Research Foundation of South Africa (Grant number UID 98019). Thomas
Meyer has received funding from the European Union’s Horizon 2020 research and
innovation programme under the Marie Sklodowska-Curie grant agr. No. 690974.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Åström</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Optimal control of Markov decision processes with incomplete state estimation</article-title>
          .
          <source>Journal of Mathematical Analysis and Applications</source>
          <volume>10</volume>
          ,
          <fpage>174</fpage>
          -
          <lpage>205</lpage>
          (
          <year>1965</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Boutilier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A unified model of qualitative belief change: A dynamical systems perspective</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>98</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>281</fpage>
          -
          <lpage>316</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Boutilier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanks</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Decision-theoretic planning: Structural assumptions and computational leverage</article-title>
          .
          <source>Journal of Artif. Intell. Research (JAIR) 11</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>94</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Boutilier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poole</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Computing optimal policies for partially observable decision processes using compact representations</article-title>
          .
          <source>In: Proceedings of the Thirteenth Natl. Conf. on Artif. Intell. (AAAI-96)</source>
          . pp.
          <fpage>1168</fpage>
          -
          <lpage>1175</lpage>
          . AAAI Press, Menlo Park, CA (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>van Ditmarsch</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kooi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Semantic results for ontic and epistemic change</article-title>
          . In: Bonanno, G., van der Hoek, W., Wooldridge, M. (eds.)
          <article-title>Logic and the Foundations of Game and Decision Theory (LOFT 7), Texts in Logic and Games</article-title>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>117</lpage>
          . Amsterdam University Press, Amsterdam, Netherlands (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Halpern</surname>
          </string-name>
          , J.: Actual Causality. MIT Press, Massachusetts/England (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jaccard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Étude comparative de la distribution florale dans une portion des Alpes et des Jura</article-title>
          .
          <source>Bulletin de la Société Vaudoise des Sciences Naturelles</source>
          <volume>37</volume>
          ,
          <fpage>547</fpage>
          -
          <lpage>579</lpage>
          (
          <year>1901</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Katsuno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendelzon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On the difference between updating a knowledge base and revising it</article-title>
          .
          <source>In: Proceedings of the Second Intl. Conf. on Principles of Knowledge Representation and Reasoning</source>
          . pp.
          <fpage>387</fpage>
          -
          <lpage>394</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
          </string-name>
          , N.:
          <article-title>Probabilistic Graphical Models: Principles and Techniques</article-title>
          . The MIT Press, Cambridge, MA and London, England (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lang</surname>
          </string-name>
          , J.:
          <article-title>Belief update revisited</article-title>
          . In: de Mantaras, R.L. (ed.)
          <source>Proceedings of the Twentieth Intl. Joint Conf. on Artif. Intell. (IJCAI-07)</source>
          . pp.
          <fpage>2517</fpage>
          -
          <lpage>2522</lpage>
          . AAAI Press, Menlo Park, CA (
          <year>January 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lovejoy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>A survey of algorithmic methods for partially observed Markov decision processes</article-title>
          .
          <source>Annals of Operations Research</source>
          <volume>28</volume>
          ,
          <fpage>47</fpage>
          -
          <lpage>66</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Monahan</surname>
          </string-name>
          , G.:
          <article-title>A survey of partially observable Markov decision processes: Theory, models, and algorithms</article-title>
          .
          <source>Management Science</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Causality: Models, Reasoning, and Inference</source>
          . Cambridge University Press, New York, NY, USA (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackenzie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Book of Why: The New Science of Cause and Effect</article-title>
          . Basic Books, New York, USA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rens</surname>
          </string-name>
          , G.:
          <article-title>On stochastic belief revision and update and their combination</article-title>
          . In: Kern-Isberner, G., Wassermann, R. (eds.)
          <source>Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic Reasoning (NMR)</source>
          . pp.
          <fpage>123</fpage>
          -
          <lpage>132</lpage>
          . Technical University of Dortmund (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rens</surname>
          </string-name>
          , G.:
          <article-title>A stochastic belief change framework with an observation stream and defaults as expired observations</article-title>
          . In: Booth, R., Casini, G., Klarman, S., Richard, G., Varzinczak, I. (eds.)
          <source>Proceedings of the Third Intl. Workshop on Defeasible and Ampliative Reasoning (DARe-2016). CEUR Workshop Proceedings</source>
          , ceur-ws.org (August
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Shapiro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pagnucco</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Iterated belief change and exogenous actions in the situation calculus</article-title>
          .
          <source>In: Proceedings of the Sixteenth European Conf. on Artif. Intell. (ECAI-04)</source>
          . pp.
          <fpage>878</fpage>
          -
          <lpage>882</lpage>
          . IOS Press, Amsterdam (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>