                         A Stochastic Belief Change Framework
                              with an Observation Stream
                         and Defaults as Expired Observations
Gavin Rens 1

1 Centre for Artificial Intelligence Research, University of KwaZulu-Natal, School of Mathematics, Statistics and Computer Science and CSIR Meraka, South Africa, email: gavinrens@gmail.com


Abstract. A framework for an agent to change its probabilistic beliefs after a stream of noisy observations is received is proposed. Observations which are no longer relevant become default assumptions until overridden by newer, more prevalent observations. A distinction is made between background and foreground beliefs. Agent actions and environment events are distinguishable and form part of the agent model. It is left up to the agent designer to provide an environment model, a submodel of the agent model. An example of an environment model is provided in the paper, and an example scenario is based on it. Given the particular form of the agent model, several 'patterns of cognition' can be identified. An argument is made for four particular patterns.
1     MOTIVATION
My intention with this research is to design a framework with which an agent can deal with uncertainty about its observations and actions, and refresh its beliefs in a relatively sophisticated way.

Partially observable Markov decision processes (POMDPs) [1, 25, 23] are adequate for many stochastic domains, and they have the supporting theory to update agents' belief states due to a changing world. But POMDPs are lacking in two aspects with respect to intelligent agents, namely, (i) the ability to maintain and reason with background knowledge (besides the models inherent in POMDP structures) and (ii) the theory to revise beliefs due to information acquisition. Traditionally, belief update consists of bringing an agent's knowledge base up to date when the world described by the knowledge base changes, that is, it is a 'change-recording' operation, whereas belief revision is used when an agent obtains new information about a static world, that is, it is a 'knowledge-changing' operation [19]. I shall use the generic term belief change to include belief update and belief revision.

A different perspective on the proposed framework is from the area of classical belief change [12, 13, 17]. The belief change community has not given much attention to dealing with uncertainty, especially not stochastic uncertainty. Hence, integrating POMDP theory into belief change methods could be beneficial. And besides bringing probability theory, POMDPs also bring decision theory, that is, the theory for reasoning about actions and their utility.

However, I go one or two steps farther than a straightforward integration of POMDPs and belief change theory. The framework also includes the notion of a stream of observations, the modeling of the decay of the truthfulness of individual observations, and how to integrate 'expired' and 'prevalent' observations into the agent's beliefs. More precisely, observations which are no longer immediately relevant become default assumptions until overridden by newer, more prevalent observations. Consider the following three scenarios.

Scenario 1. An acquaintance (Bianca) shows you the new, expensive iSung-8 smartphone she bought, with a very particular cover design. A year later, you visit Bianca at her home, where only her teenage son lives in addition. You see two phones on the kitchen counter, an iSung-8 with the particular cover design you remember from a year ago and an iSung-9. Is the likelihood greater that the iSung-8 or the iSung-9 is Bianca's?

Scenario 2. You are at the airport in a city away from home and you expect to land in your home city (Cape Town) in three hours' time. You hear someone waiting for the same flight say 'It is raining in Cape Town'. Is the likelihood less that it will still be raining if your flight is delayed by three hours than if the flight was not delayed?

Scenario 3. Your neighbour tells you he needs to visit the dentist urgently. You know that he uses the dentist at the Wonder-mall. A month later, you see your neighbour at the Wonder-mall. Is he there to see the dentist?

What these three scenarios have in common is that the answers to the questions make use of the persistence of truth of certain pieces of information. After a period has elapsed, the veracity of some kinds of information dissipates. For instance, in Scenario 1, one might attach an 'expiry date' to the information that the particular iSung-8 phone is Bianca's. So, by the time you visit her, the truth of that information is much weaker; in fact, it has become defeasible by then. Hence, we may easily argue that Bianca gave her old phone to her son and bought the iSung-9 for herself. However, if you had visited her one month after she showed you her new iSung-8 and you saw it together with the newer iSung-9 on the counter, you would probably rather assume that the iSung-9 was Bianca's son's. In Scenario 2, you could expect it to be raining in Cape Town when you get there in three hours (because spells of rain usually last for four hours in Cape Town), but if your flight is delayed, there will be no rain or only drizzle when you land. The information 'It is raining in Cape Town' has a lifespan of four hours. With respect to Scenario 3, one would expect a person who says they must visit the dentist urgently to visit the dentist within approximately seven days. So your neighbour is probably not at Wonder-mall to see the dentist.
Hence 'Neighbour must visit dentist' should be true for no longer than seven days, after which the statement becomes defeasibly true.

In this paper, I attempt to formalise some of these ideas. Several simplifications are made; two main simplifications are (i) all information packets (evidence/observations) have a meaningful period for which they can be thought of as certainly true and (ii) the transition of a piece of information from certainly true to defeasibly true is immediate. Consider the following pieces of information.

1. Bianca is an acquaintance of mine.
2. It will rain in Cape Town this week.
3. My dentist is Dr. Oosthuizen.

The first statement is problematic because, for instance, Bianca might gradually become a friend. With respect to the second statement, it is easy to set the 'expiry period' to coincide with the end of the week. One might feel that it is difficult to assign a truth period to 'My dentist is Dr. Oosthuizen' due to lack of information. A person typically does not have the same dentist life-long, but one can usually not predict accurately when one will get a new dentist. On the other hand, if, for instance, one knows exactly when one is moving to a new city, then one can give a meaningful truth period, and the transition of the piece of information from certainly true to defeasibly true is immediate.

Many of these issues are studied in temporal logics [10, 11]. The focus of the present work, however, is more on belief change with a simple temporal aspect (and the integration of POMDP theory). One may do well in future research to attempt combining the results of the present work with established work on temporal logics. One paper particularly relevant to the present work presents a probabilistic temporal logic capable of modeling reasoning about evidence [7].

Expired observations are continually aggregated into the agent's set of default assumptions. Prevalent (unexpired) observations remain in the agent's 'current memory stream'. Whenever the agent wants to perform some reasoning task, it combines the prevalent observations with its fixed beliefs, then modifies its changed beliefs with respect to its default assumptions, and reasons with respect to this final set of beliefs. However, the agent always reverts back to the original fixed beliefs (hence, "fixed"). The default assumptions keep changing as memories fade, that is, as observations expire.

The rest of this paper unfolds as follows. In the next section, I review the three formalisms on which the framework is mainly based, namely, partial probability theory, POMDPs and the hybrid stochastic belief change framework. Section 3 presents the formal definition of the framework and Section 4 explains how the framework components interact and change when used for reasoning. A discussion about the possible patterns of cognition within the framework is presented in Section 5. Then Section 6 provides an extensive example, showing some of the computations which would be required in practice. The paper ends with some final remarks and pointers to related work.
2    FORMAL FOUNDATIONS

Let L be a finite classical propositional language. A world is a logical model which evaluates every propositional variable to true or false, and by extension, evaluates every propositional sentence in L to true or false. Given n propositional atoms, there are 2^n conceivable worlds. Let W be a set of possible worlds – a subset of the conceivable worlds. The fact that w ∈ W satisfies ψ ∈ L (or is a model for ψ) is denoted by w ⊨ ψ.

Let a belief state b be a probability distribution over all the worlds in W. That is, b : W → [0, 1], such that $\sum_{w \in W} b(w) = 1$. For all ψ ∈ L, $b(\psi) := \sum_{w \in W,\, w \models \psi} b(w)$.
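To make the finite setting above concrete, here is a minimal Python sketch of belief states as distributions over worlds. It is an illustration only; the representation of worlds as assignments of truth values and the helper name prob are my own assumptions, not constructs from the paper.

    from itertools import product

    atoms = ('e', 'o', 'l')  # an example vocabulary of propositional atoms

    # The 2^n conceivable worlds: each world assigns True/False to every atom.
    worlds = [dict(zip(atoms, values))
              for values in product((True, False), repeat=len(atoms))]

    # A belief state maps each world (here: its index) to a probability summing to 1.
    b = {i: 1.0 / len(worlds) for i in range(len(worlds))}  # the uniform belief state

    def prob(b, phi):
        # b(phi): total probability of the worlds satisfying sentence phi,
        # where phi is encoded as a predicate on worlds.
        return sum(p for i, p in b.items() if phi(worlds[i]))

    # Example query: the belief that e and o hold together.
    print(prob(b, lambda w: w['e'] and w['o']))  # 0.25 for the uniform state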
Let L_pc be some probability constraint language which has atoms constraining the probability of some propositional sentence being true, and contains all formulae which can be formed with the atoms in combination with logical connectives. If C ⊂ L_pc is a set of formulae, then b satisfies C (denoted b ⊨ C) iff ∀β ∈ C, b satisfies the constraints posed by β. I denote Π as the set of all belief states over the set of possible worlds W. Let Π^C be the set of belief states which satisfy the probability constraints in C. That is, Π^C := {c ∈ Π | c ⊨ C}. Later, the notion of the theory of a set of belief states will be useful:

    $Th(\Pi^C) := \{\psi \in L_{pc} \mid \forall b \in \Pi^C,\ b \models \psi\}.$

I build on Voorbraak's [36] partial probability theory (PPT), which allows probability assignments to be partially determined, and where there is a distinction between probabilistic information based on (i) hard background evidence and (ii) some assumptions. An epistemic state in PPT is defined as the quadruple ⟨Ω, B, A, C⟩, where Ω is a sample space, B ⊂ L_pc is a set of probability constraints, A ⊂ L_pc is a set of assumptions and C ⊆ W "represents specific information concerning the case at hand" (an observation or evidence).

Voorbraak mentions that he will only consider conditioning where the evidence does not contradict the current beliefs. He defines the set of belief states corresponding to the conditionalized PPT epistemic state as {b(· | C) ∈ Π | b ∈ Π^{B∪A}, b(C) > 0}.

Voorbraak proposes constraining as an alternative to conditioning: Let ψ ∈ L_pc be a probability constraint. Then, constraining Π^B on ψ produces Π^{B∪{ψ}}. Note that expanding a belief set reduces the number of models (worlds) and expanding a PPT epistemic state with extra constraints also reduces the number of models (belief states / probability functions).

    In the context of belief sets, it is possible to obtain any [... epistemic] state from the ignorant [... epistemic] state by a series of expansions. In PPT, constraining, but not conditioning, has the analogous property. This is one of the main reasons we prefer constraining and not conditioning to be the probabilistic version of expansion. [36, p. 4]

Voorbraak provides the following example [36].

Example 1. Consider a robot which has to recharge its battery. This can be done in two rooms, let us call them room 1 and room 2. The rooms are equally far away from the robot. An example of generic information might be: "the door of room 1 is at least 40% of the time open". Suppose there is no other information available, and let p_i denote the probability of door i being open. Then B = {p_1 ≥ 0.4}. Since doors are typically sometimes open and sometimes closed, it might be reasonable to include 0 < p_i < 1 in A. However, such additional assumptions should be invoked only when they are necessary, for example, in case no reasonable decision can be made without assumptions. A good example of specific evidence is information about the state of the doors obtained by the sensors of the robot.

The word assumption is not very informative: an assumption may be highly entrenched (indefeasible), for instance, about the laws of physics, or an assumption may be very tentative (defeasible), like hearing gossip about some character trait of a new staff member. However, implicit in the word default is the notion of defeasibility; default information is information which holds until stronger evidence defeats it. I shall refer to the set B as background knowledge and assume it to be indefeasible, and the set F as foreground knowledge and assume it to be defeasible.
Hence, referring to Voorbraak's Example 1, p_1 ≥ 0.4 would be in F and 0 < p_i < 1 would be in B. Indeed, in PPT, "it is intended to be understood that the conclusions warranted by [A ∪ B] depend on the assumptions represented in A" [37]. I interpret this to mean that background knowledge (A) dominates foreground knowledge (B).

It is, however, conceivable that background knowledge should be defeasible and that new (foreground) evidence should weigh stronger due to its recency and applicability. In the proposed framework, a compromise is attempted: background knowledge is 'dominated' by new evidence at the time of reasoning, but after reasoning, the new evidence is 'forgotten'. However, evidence is not completely forgotten: as it becomes less applicable after some time, it gets assimilated into the foreground knowledge as default information.
I also use elements of partially observable Markov decision process (POMDP) theory [1, 25, 23]. In a POMDP, the agent can only predict with a likelihood in which state it will end up after performing an action. And due to imperfect sensors, an agent must maintain a probability distribution over the set of possible states.

Formally, a POMDP is a tuple ⟨S, A, T, R, Ω, O⟩ with a finite set of states S = {s_1, s_2, ..., s_n}; a finite set of actions A = {a_1, a_2, ..., a_k}; the state-transition function, where T(s, a, s') is the probability of being in s' after performing action a in state s; the reward function, where R(a, s) is the reward gained for executing a while in state s; a finite set of observations Ω = {z_1, z_2, ..., z_m}; and the observation function, where O(a, z, s') is the probability of observing z in state s' resulting from performing action a in some other state. An initial belief state b_0 over all states in S is assumed given.
To update the agent's beliefs about the world, a state estimation function SE(b, a, z) = b^SE_{a,z} is defined as

    $b^{SE}_{a,z}(s') = \frac{O(a, z, s') \sum_{s \in S} T(s, a, s')\, b(s)}{Pr(z \mid a, b)},$

where a is an action performed in the 'current' belief-state b, z is the resultant observation and $b^{SE}_{a,z}(s')$ denotes the probability of the agent being in state s' in the 'new' belief-state $b^{SE}_{a,z}$. Note that Pr(z | a, b) is a normalizing constant.
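The state estimation function translates directly into code. The following Python sketch assumes states are hashable values and that T and O are supplied as plain functions with the signatures used above; the function name is illustrative, not a prescribed API.

    def state_estimation(b, a, z, states, T, O):
        # SE(b, a, z): POMDP belief update.
        # b: dict mapping state -> probability; T(s, a, s2): transition probability;
        # O(a, z, s2): probability of observing z in the resulting state s2.
        unnormalized = {
            s2: O(a, z, s2) * sum(T(s, a, s2) * b[s] for s in states)
            for s2 in states
        }
        pr_z = sum(unnormalized.values())  # Pr(z | a, b), the normalizing constant
        if pr_z == 0:
            raise ValueError('observation z has zero probability given a and b')
        return {s2: p / pr_z for s2, p in unnormalized.items()}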
Let the planning horizon h (also called the look-ahead depth) be the number of future steps the agent plans ahead each time it selects its next action. V*(b, h) is the optimal value of future courses of actions the agent can take with respect to a finite horizon h starting in belief-state b. This function assumes that at each step, the action which will maximize the state's value will be selected. V*(b, h) is defined as

    $V^*(b, h) = \max_{a \in A}\Big[\rho(a, b) + \gamma \sum_{z \in \Omega} Pr(z \mid a, b)\, V^*(SE(b, a, z), h - 1)\Big],$

where ρ(a, b) is defined as $\sum_{s \in S} R(a, s)\, b(s)$, 0 < γ ≤ 1 is a factor to discount the value of future rewards and Pr(z | a, b) denotes the probability of reaching belief-state $b^{SE}_{a,z} = SE(b, a, z)$. While V* denotes the optimal state value, the function Q* denotes the optimal action value: $Q^*(a, b, h) = \rho(a, b) + \gamma \sum_{z \in \Omega} Pr(z \mid a, b)\, V^*(SE(b, a, z), h - 1)$ is the value of executing a in the current belief-state, plus the total expected value of belief-states reached thereafter.
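Because V* and Q* are mutually recursive, a direct rendering in Python is short. This is a naive exhaustive sketch (no caching or pruning), assuming the functions rho, pr_z and SE and the collections actions and observations are provided; all of these names are placeholders of mine.

    def q_value(a, b, h, model, gamma=0.95):
        # Q*(a, b, h): immediate expected reward plus discounted future value.
        _, observations, rho, pr_z, SE = model
        future = sum(
            pr_z(z, a, b) * v_value(SE(b, a, z), h - 1, model, gamma)
            for z in observations if pr_z(z, a, b) > 0
        )
        return rho(a, b) + gamma * future

    def v_value(b, h, model, gamma=0.95):
        # V*(b, h): the value of the best action in belief state b at horizon h.
        if h == 0:
            return 0.0
        actions = model[0]
        return max(q_value(a, b, h, model, gamma) for a in actions)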
I also build on my stochastic belief change model [29], which is a structure ⟨W, Evt, T, E, O, Str⟩, with a set of possible worlds W, a set of events Evt, an event-transition function where T(w, e, w') models the probability of a transition to world w', given the occurrence of event e in world w, an event likelihood function where E(e, w) = P(e | w) is the probability of the occurrence of event e in w, an observation function where O(z, w) models the probability of observing z in w, and where Str(z, w) is the agent's ontic strength for z perceived in w.

I proposed a way of trading off probabilistic update and probabilistic revision, using the notion of ontic strength. The argument is that an agent could reason with a range of degrees for information being ontic (the effect of a physical action or occurrence) or epistemic (purely informative). It is assumed that the higher the information's degree of being ontic, the lower the epistemic status of that information. "An agent has a certain sense of the degree to which a piece of received information is due to a physical action or event in the world. This sense may come about due to a combination of sensor readings and reasoning. If the agent performs an action and a change in the local environment matches the expected effect of the action, it can be quite certain that the effect is ontic information" [29, p. 129].

The hybrid stochastic change of belief state b due to new information z with ontic strength (denoted b ◦ z) is defined as [29]

    $b \circ z := \Big\{(w, p) \;\Big|\; w \in W,\ p = \frac{1}{\gamma}\big[(1 - Str(z, w))\, b^*_z(w) + Str(z, w)\, b^\diamond_z(w)\big]\Big\},$

where * is some probabilistic belief revision operator, ⋄ is some probabilistic belief update operator and γ is a normalizing factor so that $\sum_{w \in W} (b \circ z)(w) = 1$.
The principle of maximum entropy [18, 27, 34, 26, 20] says that it is reasonable to represent a set of belief states Π^C by the member of Π^C which is most entropic or least biased (w.r.t. information theory) of all the members of Π^C:

    $ME(\Pi^C) := \arg\max_{c \in \Pi^C} H(c), \quad \text{where } H(c) := -\sum_{w \in W} c(w) \ln c(w).$

The principle of minimum cross-entropy [22, 5] is used to select the belief state c ∈ Π^Y 'most similar' to a given belief state b ∈ Π^X; the principle minimizes the directed divergence between b and c with respect to entropy. Directed divergence is defined as

    $R(c, b) := \sum_{w \in W} c(w) \ln \frac{c(w)}{b(w)}.$

R(c, b) is undefined when b(w) = 0 while c(w) > 0; when c(w) = 0, the corresponding term is taken to be 0, because $\lim_{x \to 0} x \ln x = 0$.
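Both principles reduce to small numeric routines. A sketch, under the assumption that belief states are dicts over a common set of worlds and that a finite candidate set stands in for the full constrained optimization over Π^C; all names are mine.

    import math

    def entropy(c):
        # H(c) = -sum_w c(w) ln c(w), with the convention 0 ln 0 = 0.
        return -sum(p * math.log(p) for p in c.values() if p > 0)

    def max_entropy(candidates):
        # ME over a finite set of candidate belief states.
        return max(candidates, key=entropy)

    def directed_divergence(c, b):
        # R(c, b) = sum_w c(w) ln(c(w)/b(w)); terms with c(w) = 0 contribute 0.
        total = 0.0
        for w, cw in c.items():
            if cw == 0:
                continue
            if b[w] == 0:
                raise ValueError('R(c, b) is undefined: b(w) = 0 while c(w) > 0')
            total += cw * math.log(cw / b[w])
        return total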
The knowledge and reasoning structure proposed in this paper is presented next. It builds on the work of Voorbraak [36] and Rens [29] and includes elements of POMDP theory.

3    FRAMEWORK DEFINITION

Let L_prob be a probabilistic language over L defined as L_prob := {φ[ℓ, u] | φ ∈ L, ℓ, u ∈ [0, 1], ℓ ≤ u}. A sentence of the form φ[ℓ, u] means the likelihood of proposition φ is greater than or equal to ℓ and less than or equal to u. Let N = {0, 1, 2, . . .}. b satisfies formula φ[ℓ, u] (denoted b ⊨ φ[ℓ, u]) iff ℓ ≤ b(φ) ≤ u.

Definition 1. An agent maintains a structure ⟨W, B, F, A, Evt, Z, Prs, Eng, M⟩, where

• W is a set of possible worlds;
• B ⊂ L_prob is a background belief base of fixed assumptions;
• F ⊂ L_prob is a foreground belief base of default assumptions;
• A is a set of (agent) actions, including a special action null;
• Evt is a set of (environment) events;
• Z is the observation stream, a set of observation triples: Z := {(a_1, t_1, z_1), (a_2, t_2, z_2), . . . , (a_k, t_k, z_k)}, where a_i ∈ A, t_i ∈ N, z_i ∈ L, and such that i = j iff t_i = t_j (i.e., no more than one action and observation occur at a time-point);
• Prs : L × W → N is a persistence function, where Prs(z, w) indicates how long z is expected to be true from the time it is received, given the 'context' of w; it is a total function over L × W;
• Eng : L × W × A → [0, 1], where Eng(z, w, a) is the agent's confidence that z perceived in w was caused by action a (i.e., that z has an endogenous source);
• M is a model of the environment, and any auxiliary information required by the definition of the particular belief change operation (◦).
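Definition 1 maps naturally onto a record type. Below is a Python sketch of the agent structure; the container choices (lists, sets, callables) and field names are my own assumptions for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        worlds: list                  # W
        background: set               # B: fixed probability constraints
        foreground: set               # F: default assumptions; changes over time
        actions: set                  # A, including the special action 'null'
        events: set                   # Evt
        stream: list = field(default_factory=list)  # Z: (action, time, sentence) triples
        Prs: object = None            # persistence function over (sentence, world)
        Eng: object = None            # endogeny confidence over (sentence, world, action)
        M: object = None              # environment model and auxiliary information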
Definition 2. The expected persistence of z perceived in belief state b is

    $ExpPrs(z, b) := \sum_{w \in W} Prs(z, w) \times b(w).$

But the agent will reason with respect to a set of belief states Π^C, hence, ExpPrs(z, Π^C) must be defined. One such definition employs the principle of maximum entropy:

Definition 3.

    $ExpPrs_{ME}(z, \Pi^C) := \sum_{w \in W} Prs(z, w) \times b_{ME}(w),$

where b_ME = ME(Π^C).

Definition 4. An observation triple (a, i, z) ∈ Z has expired at point s if ExpPrs(z, b) < s − i.
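A sketch of the two computations just defined, under the assumption that Prs is supplied as a function over (sentence, world) pairs; the names are mine.

    def expected_persistence(z, b, Prs, worlds):
        # Definition 2: ExpPrs(z, b) = sum_w Prs(z, w) * b(w).
        return sum(Prs(z, w) * b[w] for w in worlds)

    def has_expired(triple, s, b, Prs, worlds):
        # Definition 4: (a, i, z) has expired at point s if ExpPrs(z, b) < s - i.
        a, i, z = triple
        return expected_persistence(z, b, Prs, worlds) < s - i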
Let $b^\circ_{a,z}$ be the change of belief state b by a and z. My intention is that "change" is a neutral term, not necessarily indicating revision or update. Next, I propose one instantiation of $b^\circ_{a,z}$. Let the environment model M = ⟨E, T, O⟩, where

• E : Evt × W → [0, 1] is the event function. E(e, w) = P(e | w), the probability of the occurrence of event e in w;
• T : W × (A ∪ Evt) × W → [0, 1] is a transition function such that for every æ ∈ A ∪ Evt and w ∈ W, $\sum_{w' \in W} T(w, æ, w') = 1$, where T(w, æ, w') models the probability of a transition to world w', given the execution of action / occurrence of event æ in world w;
• O : W × W × A → [0, 1] is an observation function such that for every w ∈ W and a ∈ A, $\sum_{w^z \in W} O(w^z, w, a) = 1$, where O(w^z, w, a) models the probability of observing φ^z (a complete theory for w^z) in w, and where $O(z, w, a) := \sum_{w^z \in W,\, w^z \models z} O(w^z, w, a)$, for all z ∈ L.

Observation z may be due to an exogenous event (originating and produced outside the agent) or an endogenous action (originating and produced within the agent). It is up to the agent designer to decide, for each observation, whether it is exogenous or endogenous, given the action and world.
The hybrid stochastic belief change (HSBC) formalism of Rens [29] defines the (exogenous) update of b with z as

    $b^\diamond_z := \Big\{(w, p) \;\Big|\; w \in W,\ p = \frac{1}{\gamma}\, O(z, w) \sum_{w' \in W} \sum_{e \in Evt} E(e, w')\, T(w', e, w)\, b(w')\Big\},$

where γ is a normalizing factor and O(z, w) can be interpreted as O(z, w, null). But HSBC does not involve agent actions; HSBC assumes that agents passively receive information. Hence, when an agent is assumed to act and the actions are known, the POMDP state estimation function can be employed for belief state update.

Only for the purpose of illustrating how the framework can be used, the following belief change procedure is defined:

    $b^\circ_{a,z} := \{(w, p) \mid w \in W,\ p = Exg(z, w, a)\, b^\diamond_z(w) + Eng(z, w, a)\, b^{SE}_{a,z}(w)\}, \qquad (1)$

where Exg(z, w, a) := 1 − Eng(z, w, a) is the confidence that z is exogenous in w, given a was executed.
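Read operationally, (1) mixes the exogenous update with the POMDP-style update, world by world. A Python sketch under the same assumptions as before (E, T, O and Eng supplied as functions; worlds and events finite; all names illustrative):

    def exogenous_update(b, z, worlds, events, E, T, O):
        # b_z with the diamond: events may have occurred, then z is observed.
        unnormalized = {
            w: O(z, w) * sum(E(e, w1) * T(w1, e, w) * b[w1]
                             for w1 in worlds for e in events)
            for w in worlds
        }
        gamma = sum(unnormalized.values())  # the normalizing factor
        if gamma == 0:
            raise ValueError('observation z has zero probability under all events')
        return {w: p / gamma for w, p in unnormalized.items()}

    def hybrid_change(b_diamond, b_se, z, a, worlds, Eng):
        # Equation (1): per-world mixture of the exogenous update b_diamond and
        # the state-estimation update b_se, weighted by endogeny confidence.
        return {
            w: (1 - Eng(z, w, a)) * b_diamond[w] + Eng(z, w, a) * b_se[w]
            for w in worlds
        }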
4    OPERATIONAL SEMANTICS

The agent is always living at time-point N. Time units remain the same and must be chosen to suit the domain of interest, for instance, milliseconds, minutes, hours, etc. The first point in time is N = 0. N indicates the end of the N-th time unit; in other words, at point N, exactly N × u time has passed, where u is the time unit employed.

It will be assumed that no observation can be made at time-point 0. If the agent designer feels that it is unreasonable for the agent to have to wait until N = 1 before the first observation may be made, then the time unit chosen for the particular domain is too large.

Expired observations are continually aggregated into the agent's set of default assumptions F (foreground beliefs). Prevalent (unexpired) observations remain in the agent's 'current memory stream' Z. Whenever the agent wants to perform some reasoning task, (i) it combines the prevalent observations with its fixed beliefs B, then modifies its changed beliefs with respect to its default assumptions, or (ii) it modifies its fixed beliefs B with respect to its default assumptions, then combines the prevalent observations with its changed beliefs – and then reasons with respect to this final set of beliefs. However, the agent always reverts back to the original fixed beliefs (hence, "fixed"). And the default assumptions keep changing as memories fade, that is, as observations expire.

Because any initial conditions specified are, generally, not expected to hold after the initial time-point, it does not make sense to place them in B. The only other option is to place them in F, which is allowed to change. The following guiding principle is thus provided to agent designers.

    The agent's initial beliefs (i.e., the system conditions at point N = 0) must be specified in F.

Let C ⊂ L_prob be a belief base. At this early stage of research, I shall suggest only three definitions of ◦ on a set of belief states. The first is the most naïve approach (denoted ◦_NV). It is suitable for theoretical investigations.

    $\Pi^C \circ_{NV} \langle a, z\rangle := \{b^\circ_{a,z} \mid b \in \Pi^C\}.$

A more practical approach is to reduce Π^C to a representative belief state, employing the principle of maximum entropy (denoted ◦_ME).

    $\Pi^C \circ_{ME} \langle a, z\rangle := \{b^\circ_{a,z} \mid b = ME(\Pi^C)\}.$

A third and final approach which I shall mention here is, in a sense, a compromise between the naïve and maximum entropy approaches. It is actually a family of methods which will thus not be defined precisely. It is the approach (denoted ◦_FS) which finds a finite (preferably, relatively small) proper subset Π^FS of Π^C which is somehow representative of Π^C, and then applies ◦ to the individual belief states in Π^FS:

    $\Pi^C \circ_{FS} \langle a, z\rangle := \{b^\circ_{a,z} \mid b \in \Pi^{FS} \subset \Pi^C\}.$

Only ◦_ME and ◦_FS will be considered in the sequel, when it comes to belief change over a set. Let ◦_set denote one of these two operators. Then Π^C ◦_set σ can be defined, where σ is any stream of observation triples, where σ = {(a_1, t_1, z_1), (a_2, t_2, z_2), . . . , (a_k, t_k, z_k)} and t_1 < t_2 < · · · < t_k:

    $\Pi^C \leftarrow \Pi^C \circ_{set} \langle a_1, z_1\rangle,\ \text{and then}\ \Pi^C \leftarrow \Pi^C \circ_{set} \langle a_2, z_2\rangle,\ \ldots,\ \text{and then}\ \Pi^C \leftarrow \Pi^C \circ_{set} \langle a_k, z_k\rangle.$

Let Expired be derived from the triples in Z which have just expired (at point N). That is,

    $Expired := \{(a, i, z) \in Z \mid ExpPrs(z, \Pi^{N-1}) < N - i\},$

where Π^N is defined below. At each time-point, F is refreshed with all the expired observation triples in the order they appear in Z. In other words, at each point in time, F ← Th(Π^F ◦_set Expired). As soon as the foreground has been refreshed, the expired triples are removed from the observation stream: Z ← Z \ Expired. I shall use the notation Π^F_set to clarify which operation was used to arrive at the current Π^F.
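The per-time-point bookkeeping can be summarized in one routine. The sketch below treats the change operators abstractly through a supplied fold function and an exp_prs function; all names are my own, not the paper's.

    def refresh(N, Z, F, pi_prev, exp_prs, fold):
        # Triples that have just expired at point N: ExpPrs(z, Pi^(N-1)) < N - i.
        expired = [(a, i, z) for (a, i, z) in Z if exp_prs(z, pi_prev) < N - i]
        for a, i, z in expired:      # refresh F with expired triples, in stream order
            F = fold(F, a, z)        # stands in for F <- Th(Pi^F o_set Expired)
        Z = [t for t in Z if t not in expired]   # Z <- Z \ Expired
        return Z, F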
A function which selects a belief state in one set which is in some sense closest to another set of belief states will shortly be required. The following definition suggests three such functions.

Definition 5.

    $Closest_{absolute}(\Pi^X, \Pi^Y) := \arg\min_{b:\, b \in \Pi^X,\, c \in \Pi^Y} \sum_{w \in W} (b(w) - c(w))^2,$

or

    $Closest_{ME}(\Pi^X, \Pi^Y) := \arg\min_{b \in \Pi^X} \sum_{w \in W} (b(w) - ME(\Pi^Y)(w))^2,$

or

    $Closest_{MCE}(\Pi^X, \Pi^Y) := \arg\min_{b:\, b \in \Pi^X,\, c \in \Pi^Y} R(c, b),$

where X, Y ⊂ L_prob, ME(Π^Y) is the most entropic/least biased belief state in Π^Y and R(c, b) is the directed divergence of c with respect to b.

Closest_absolute simply chooses the belief state b ∈ Π^X which minimizes the sum of squared differences between the probabilities of the worlds of b and of some belief state c ∈ Π^Y (considering all c ∈ Π^Y).

Closest_ME picks the belief state c_ME ∈ Π^Y with maximum entropy (i.e., least biased w.r.t. information in Π^Y), and then chooses the belief state b ∈ Π^X which minimizes the sum of squared differences between the probabilities of the worlds of b and of c_ME.

Closest_MCE chooses the belief state b ∈ Π^X for which the directed divergence from belief state c ∈ Π^Y with respect to entropy is least (considering all c ∈ Π^Y). This is an instance of minimum cross-entropy inference.

From here onwards, I shall not analyse their definitions; Closest(·) will be used to refer to the abstract function.
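For finite approximations of the two sets, the first of these functions is a few lines of Python; a sketch with illustrative names, using squared differences over worlds:

    def closest_absolute(pi_x, pi_y, worlds):
        # Choose b in pi_x minimizing sum_w (b(w) - c(w))^2 over all c in pi_y.
        # pi_x, pi_y: finite lists of belief states (dicts world -> probability).
        def distance(b, c):
            return sum((b[w] - c[w]) ** 2 for w in worlds)
        return min(pi_x, key=lambda b: min(distance(b, c) for c in pi_y))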
Reasoning is done with respect to the set of belief states Π^N. I look at two patterns of cognition to determine Π^N.

Definition 6.

    $(\Pi^N)_1 := \begin{cases} (\Pi^B \circ_{set} Z) \cap \Pi^F_{set} & \text{if } (\Pi^B \circ_{set} Z) \cap \Pi^F_{set} \neq \emptyset \\ \{Closest(\Pi^B \circ_{set} Z, \Pi^F_{set})\} & \text{otherwise,} \end{cases}$

or

    $(\Pi^N)_2 := \begin{cases} (\Pi^B \cap \Pi^F_{set}) \circ_{set} Z & \text{if } \Pi^B \cap \Pi^F_{set} \neq \emptyset \\ \{Closest(\Pi^B, \Pi^F_{set})\} \circ_{set} Z & \text{otherwise.} \end{cases}$

The idea behind the definition of (Π^N)_1 is that (pertinent) observations in the stream (Z) dominate background beliefs (B), but Z-modified beliefs (Π^B ◦_set Z) dominate foreground beliefs (F). The idea behind the definition of (Π^N)_2 is that foreground beliefs (F) have slightly higher status than in (Π^N)_1 because they modify background beliefs (when consistent with B) before (pertinent) observations in the stream (Z), but the stream finally dominates.

5    PATTERNS OF COGNITION

In this section, I shall explore the properties of 'patterns of cognition' based on (Π^N)_1 and (Π^N)_2. I shall argue that there are four reasonable candidates. First, a few axioms:

1. {b} ∩ Π^C = {b} or ∅.
2. {b} ∩ {c} ≠ ∅ ⟺ b = c.
3. Π^C ◦_ME Z always results in a singleton set.
4. Closest({b}, Π^C) = b.

And one guiding principle:

    In a given pattern of cognition, ◦_set is instantiated to the same operator when it appears in the same relative position.

For instance, in (Π^N)_1,

    $\begin{cases} (\Pi^B \circ_{ME} Z) \cap \Pi^F_{FS} & \text{if } (\Pi^B \circ_{ME} Z) \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B \circ_{ME} Z, \Pi^F_{FS})\} & \text{otherwise,} \end{cases}$

is allowed, but not

    $\begin{cases} (\Pi^B \circ_{ME} Z) \cap \Pi^F_{FS} & \text{if } (\Pi^B \circ_{ME} Z) \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B \circ_{FS} Z, \Pi^F_{FS})\} & \text{otherwise.} \end{cases}$

Or in (Π^N)_2,

    $\begin{cases} (\Pi^B \cap \Pi^F_{FS}) \circ_{FS} Z & \text{if } \Pi^B \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B, \Pi^F_{FS})\} \circ_{FS} Z & \text{otherwise,} \end{cases}$

is allowed, but not

    $\begin{cases} (\Pi^B \cap \Pi^F_{ME}) \circ_{FS} Z & \text{if } \Pi^B \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B, \Pi^F_{FS})\} \circ_{FS} Z & \text{otherwise.} \end{cases}$

Given these axioms and the guiding principle, four reasonable candidates are

    $(\Pi^N)_{11} := \begin{cases} (\Pi^B \circ_{FS} Z) \cap \Pi^F_{FS} & \text{if } (\Pi^B \circ_{FS} Z) \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B \circ_{FS} Z, \Pi^F_{FS})\} & \text{otherwise,} \end{cases}$

    $(\Pi^N)_{12} := \{Closest(\Pi^B \circ_{FS} Z, \Pi^F_{set})\},$

    $(\Pi^N)_{21} := \begin{cases} (\Pi^B \cap \Pi^F_{FS}) \circ_{set} Z & \text{if } \Pi^B \cap \Pi^F_{FS} \neq \emptyset \\ \{Closest(\Pi^B, \Pi^F_{FS})\} \circ_{set} Z & \text{otherwise,} \end{cases}$

    $(\Pi^N)_{22} := \{Closest(\Pi^B, \Pi^F_{set}) \circ Z\}.$

Note that in the definitions of (Π^N)_{12}, (Π^N)_{21} and (Π^N)_{22}, ◦_set may be instantiated as either of the two operators, and in (Π^N)_{22}, ◦ is actually the 'plain' belief change operator.

I now justify each of the four patterns of cognition.

11 If (Π^B ◦_FS Z) were (Π^B ◦_ME Z), by axiom 3, the result is a singleton set, and by axioms 1 and 4, the information in Π^F is ignored, including initial conditions or their later effects. If Π^F_FS were Π^F_ME, by axiom 1, the information in (Π^B ◦_set Z) is ignored, or worse, the intersection is empty.

12 With this pattern, the issues of intersection (axioms 1 and 2) need not be dealt with. If Π^B ◦_FS Z were Π^B ◦_ME Z, by axiom 3, the result is a singleton set, and by axiom 4, the information in Π^F is ignored. Whether Π^F_set is instantiated as Π^F_ME or Π^F_FS, the set still has an influence on Π^B ◦_FS Z due to the latter typically not being a singleton.

21 If (Π^B ∩ Π^F_FS) were (Π^B ∩ Π^F_ME), by axiom 1, the information in Π^B is ignored. Whether ◦_set Z is instantiated as ◦_ME Z or ◦_FS Z, the result will typically not be trivial – all information is taken into account, although in different ways.

22 Either of the two instantiations of Π^F_set accommodates the information in Π^B and in Π^F. Moreover, the issues of intersection (axioms 1 and 2) need not be dealt with, and due to the simpler pattern, the second belief change operation can also be simplified; that is, in this pattern, ◦_ME Z and ◦_FS Z both reduce to ◦ Z.
 6    AN EXAMPLE                                                                            – E(null, w) = 8/12.
                                                                                       • T (w, null, w) = 1 ∀w ∈ W (for the action and event).
 To keep things as simple as possible for this introductory report, I
                                                                                       • T (w, evict, w0 ) = 0 if w ¬e ∨ ¬o, else if w `,
 shall illustrate (ΠN )22 instantiated as
                                                                                            – = 0.1 if w0              e ∧ o ∧ `,
                                             ME )  Z.
                     Closest absolute (ΠB , ΠF
                                                                                                                  0
                                                                                            – = 0.9 if w               e ∧ ¬o ∧ `,
     Consider the following scenario. The elves live in the forest.
 Sometimes, orcs (monsters) wander into the forest to hunt or seek                          else if w        ¬`,
 shelter from a storm. Elves don’t like orcs. The elves can either do                       – = 0.1 if w0              e ∧ o ∧ ¬`,
 nothing (action null) or they can evict the orcs (action evict). Orcs                      – = 0.9 if w          0
                                                                                                                       e ∧ ¬o ∧ ¬`.
 tend to stay in the forest for more hours the darker it is. The sun rises
 (event rise) and sets (event set) once a day (duh). Now and then,                     • T (w, rise, w0 ) = 1 if w ¬` and w0 ` and truth values of e
 there is a storm (event storm), but most of the time, nothing happens                   and o are invariant.
 (event null). The vocabulary will be {e, o, `} meaning, respectively,                 • T (w, set, w0 ) = 1 if w   ` and w0  ¬` and truth values of e
 the elves are in the forest, the orcs are in the forest, and it is light (it            and o are invariant.
 is daytime and there isn’t a storm).                                                  • T (w, storm, w0 ) = 1 if w0 ¬` and truth values of e and o are
     Henceforth, I might write αβ instead of α∧β, and ᾱ instead of ¬α.                  invariant.
 For ease of reading, the possible worlds will be ordered, and a belief                The         probabilities             for     O(z, w, null)           are      as        follows.
 state {(eo`, p1 ), (eo`,  ¯ p2 ), (eō`, p3 ), (eō`,
                                                    ¯ p4 ), (ēo`, p5 ), (ēo`,
                                                                             ¯ p6 ),   XX          w        eo`       eo`¯    eō`    eō`¯   ēo`   ēo`¯    ēō`    ēō`¯
                    ¯ p8 )} will be abbreviated as hp1 , p2 , p3 , p4 , p5 , p6 ,
 (ēō`, p7 ), (ēō`,                                                                  z      XX
                                                                                             eo             0.8       0.8      0.1    0.1     0.1    0.1        0       0
 p7 , p8 i.                                                                                  eō            0.1       0.1      0.8    0.8      0      0        0.1     0.1
     Only observations eo, eō, `, `¯ are considered.                                         `              1         0        1      0       1      0         1       0
     Let                                                                                      `¯             0         1        0      1       0      1         0       1
  The     probabilities     for    O(z, w, evict)         are     as       follows.   It turns out that γ = 0.968, resulting in 0.56/0.968 = 0.579.
XXXw eo`             eo`¯   eō`   eō`¯   ēo`   ēo`¯   ēō`   ēō`¯                  The agent believes to a high degree what it perceived (eo). The
 z X
    eo         0.8   0.8     0       0      0      0       0        0                 reason why it believes to a higher degree that it is dark than light, is
    eō        0.2   0.2    0.8     0.8     0      0      0.2      0.2                due to the relatively high chance of a storm (which darkens things)
     `          1     0      1       0      1      0       1        0                 and a small chance of the sun setting. This scenario is a bit synthetic:
     `¯         0     1      0       1      0      1       0        1                 the agent has not yet perceived that it is light. In a realistic situa-
   Initially, the foreground belief base will specify the initial condi-              tion, the agent will always sense the brightness of the environment,
tion,                                                                                 disallowing a high degree of belief in darkness.
                        F = {`[1, 1], eo[.7, .9]},                                        At N       =     5, PExpired       =     {(null, 1, eo)} because
that is, it is completely light and there is a 70% − 90% belief that
                                                                                      ExpPrs(eo, b1 ) =         w∈W Prs(eo, w) × b1 (w) = 3.2 <
                                                                                      N − 1 = 4. PAnd (evict, 4, eō) 6∈ Expired because
the elves and orcs are in the forest. And the background belief base
                                                                                      ExpPrs(eō, b1 ) = w∈W Prs(eō, w) × b1 (w) = 1 6< N − 4 = 1.
demands that the probability that elves are outside the forest while
orcs are inside is never more than 10%,                                                   Therefore, (Π5 )22       =        Closest absolute (ΠB , ΠFME ) 
                                                                                      {(evict, 4, eō)} = Closest       absolute
                                                                                                                                 (ΠB , {b1  null, eo}) 
                            B = {ēo[0, 0.1]}.                                        {(evict, 4, eō)} = Closest absolute (ΠB , {b2 })  {(evict, 4, eō)}.
                                                                                          b2 ∈ ΠB , hence, Closest absolute (ΠB , {b2 }) = b2 . (Π5 )22 is thus
   Let us start by analyzing the effect of the observation stream

      Z = {(null, 1, eo), (evict, 4, eō)}

at different time-points.
   At N = 1, (Π_1)^22 = Closest_absolute(Π^B, Π^F_ME) ⊛ {(null, 1, eo)} (given Prs(eo) = {(ℓ, 2), (ℓ̄, 4)}, (null, 1, eo) has not yet expired). Π^F_ME = ME(Π^F) = ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩. Due to Π^B and Π^F_ME being mutually consistent, Closest_absolute(Π^B, Π^F_ME) is simply ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩ ∈ Π^B. Denote ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩ as b1.
   Eng(eo, w, null) = 0 for all w ∈ W. Therefore, b1 ⊛ {(null, 1, eo)} = b1 ⊛ ⟨null, eo⟩ = (b1)_eo, which was calculated to be ⟨.386, .579, .007, .01, .007, .01, 0, 0⟩ (denoted b2 henceforth).
   I shall only show how (b1)_eo(eoℓ̄) = 0.579 is calculated. Note that, due to the transition functions for the four events being invariant with respect to e and o, the only departure worlds we need to consider are (abusing notation) eoℓ and eoℓ̄:

   (b1)_eo(eoℓ̄)
      = (1/γ) O(eo, eoℓ̄, null) Σ_{w′∈W} Σ_{e∈Evt} b1(w′) E(e, w′) T(w′, e, eoℓ̄)
      = (1/γ)(0.8) Σ_{w′∈W} b1(w′) × [E(null, w′) T(w′, null, eoℓ̄) + E(rise, w′) T(w′, rise, eoℓ̄)
           + E(set, w′) T(w′, set, eoℓ̄) + E(storm, w′) T(w′, storm, eoℓ̄)]
      = (1/γ)(0.8) (b1(eoℓ) [E(null, eoℓ) T(eoℓ, null, eoℓ̄) + E(rise, eoℓ) T(eoℓ, rise, eoℓ̄)
           + E(set, eoℓ) T(eoℓ, set, eoℓ̄) + E(storm, eoℓ) T(eoℓ, storm, eoℓ̄)]
         + b1(eoℓ̄) [E(null, eoℓ̄) T(eoℓ̄, null, eoℓ̄) + E(rise, eoℓ̄) T(eoℓ̄, rise, eoℓ̄)
           + E(set, eoℓ̄) T(eoℓ̄, set, eoℓ̄) + E(storm, eoℓ̄) T(eoℓ̄, storm, eoℓ̄)])
      = (1/γ)(0.8) [(0.7)(8/12 + 0 + 1/12 + 3/12) + (0)(8/12 + 0 + 0 + 3/12)]
      = (1/γ)(0.56).

It turns out that γ = 0.968, resulting in 0.56/0.968 = 0.579.
   The agent believes to a high degree what it perceived (eo). The reason why it believes to a higher degree that it is dark than that it is light is the relatively high chance of a storm (which darkens things) and a small chance of the sun setting. This scenario is a bit synthetic: the agent has not yet perceived that it is light. In a realistic situation, the agent will always sense the brightness of the environment, disallowing a high degree of belief in darkness.
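   To make this arithmetic easy to check, here is a minimal Python sketch that reproduces just this single entry of (b1)_eo. The per-event terms E(e, w′)T(w′, e, eoℓ̄) are transcribed from the derivation above; the variable names and the hard-coded value of γ are mine, not notation from the framework.

```python
# Reproduces (b1)_eo(eo l-bar) = 0.579 from the worked example above.
# Only the two relevant departure worlds are kept, exploiting the noted
# invariance of the event transitions with respect to e and o.

O_EO_NULL = 0.8  # O(eo, eo l-bar, null), from the O(z, w, null) table

b1 = {"eol": 0.7, "eol_bar": 0.0}  # b1 restricted to the departure worlds

# Per departure world: the sum over events of E(e, w') * T(w', e, eo l-bar),
# copied from the bracketed terms (null, rise, set, storm) in the derivation.
event_terms = {
    "eol":     8/12 + 0 + 1/12 + 3/12,
    "eol_bar": 8/12 + 0 + 0 + 3/12,
}

unnormalised = O_EO_NULL * sum(b1[w] * event_terms[w] for w in b1)
gamma = 0.968  # normalisation constant reported in the text

print(f"{unnormalised:.2f}")          # -> 0.56
print(f"{unnormalised / gamma:.3f}")  # -> 0.579
```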
   At N = 5, Expired = {(null, 1, eo)} because ExpPrs(eo, b1) = Σ_{w∈W} Prs(eo, w) × b1(w) = 3.2 < N − 1 = 4. And (evict, 4, eō) ∉ Expired because ExpPrs(eō, b1) = Σ_{w∈W} Prs(eō, w) × b1(w) = 1 ≮ N − 4 = 1.
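   The expiry test itself is mechanical. Below is a small Python sketch of it, under my reading of the definitions used here; the function names and the (a, i, z) record layout are illustrative, not notation from the framework.

```python
def exp_prs(prs_z, b):
    """ExpPrs(z, b): belief-weighted persistence of observation z, i.e.
    the sum over worlds w of Prs(z, w) * b(w)."""
    return sum(prs_z.get(w, 0) * p for w, p in b.items())

def has_expired(record, prs, b, N):
    """A record (a, i, z) received at time-point i has expired at
    time-point N exactly when ExpPrs(z, b) < N - i."""
    a, i, z = record
    return exp_prs(prs[z], b) < N - i

# At N = 5 in the example: ExpPrs(eo, b1) = 3.2 < 5 - 1 = 4, so
# (null, 1, eo) has expired, while ExpPrs(eo-bar, b1) = 1 is not
# < 5 - 4 = 1, so (evict, 4, eo-bar) has not.
```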
   Therefore, (Π_5)^22 = Closest_absolute(Π^B, Π^F_ME) ⊛ {(evict, 4, eō)} = Closest_absolute(Π^B, {b1 ⊛ ⟨null, eo⟩}) ⊛ {(evict, 4, eō)} = Closest_absolute(Π^B, {b2}) ⊛ {(evict, 4, eō)}.
   b2 ∈ Π^B, hence Closest_absolute(Π^B, {b2}) = b2. (Π_5)^22 is thus b2 ⊛ ⟨evict, eō⟩.
   Recall that Eng(eō, w, evict) equals 0.75 if w ⊨ ℓ̄ but 0.5 if w ⊨ ℓ. And recall that

      b_{a,z}(w) = Exg(z, w, a) b_z(w) + Eng(z, w, a) b^SE_{a,z}(w).

So, for instance,

      (b2)_{evict,eō}(eoℓ̄) = Exg(eō, eoℓ̄, evict) (b2)_eō(eoℓ̄) + Eng(eō, eoℓ̄, evict) (b2)^SE_{evict,eō}(eoℓ̄)
                           = 0.25 (b2)_eō(eoℓ̄) + 0.75 (b2)^SE_{evict,eō}(eoℓ̄).
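   As a quick illustration of this mixture, here is a one-function Python sketch. Taking Exg = 1 − Eng is my assumption; it is made only because it matches the 0.25/0.75 split in this instance.

```python
def endo_exo_mix(eng, b_z_w, b_se_w):
    # b_{a,z}(w) = Exg(z, w, a) * b_z(w) + Eng(z, w, a) * b^SE_{a,z}(w),
    # with Exg = 1 - Eng assumed for this sketch only.
    exg = 1.0 - eng
    return exg * b_z_w + eng * b_se_w

# For w = eo l-bar (a world satisfying l-bar), Eng(eo-bar, w, evict) = 0.75,
# so the evidence-only belief gets weight 0.25 and the state-estimation
# belief gets weight 0.75, as in the instance above.
```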
   At N = 6, Expired = {(null, 1, eo), (evict, 4, eō)} because (null, 1, eo) had already expired at N = 5, and ExpPrs(eō, (b2)_{evict,eō}) = 1 < N − 4 = 2.
   Therefore, (Π_6)^22 = Closest_absolute(Π^B, Π^F_ME) ⊛ {} = Closest_absolute(Π^B, {(b2)_{evict,eō}}).
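   Pulling the three time-points together, each time-point follows one recipe. The Python sketch below is my compressed reading of that recipe; expired_at, fold_expired, me, closest and apply_pair stand in for the expiry test, the refreshing of F, ME, Closest_absolute and the ⊛ operator, and must be supplied by the framework user.

```python
def time_point(N, records, B, F, expired_at, fold_expired, me, closest, apply_pair):
    """One time-point of the example's recipe: fold expired observations
    into the defaults F, re-anchor on the background beliefs B, then
    apply the still-prevalent records."""
    expired = [r for r in records if expired_at(r, N)]
    live = [r for r in records if not expired_at(r, N)]
    F = fold_expired(F, expired)  # expired observations become defaults
    b = closest(B, me(F))         # e.g. b1 at N = 1; (b2)_{evict,eo-bar} at N = 6
    for (a, i, z) in live:        # e.g. the pair (null, eo) at N = 1
        b = apply_pair(b, a, z)
    return b, F
```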
7   CONCLUDING REMARKS

I believe that it is important to have the facility to reason about both (exogenous) events and (endogenous) actions; I am definitely not the first to propose a framework with both notions [28, 33, 9, 6, e.g.]. The framework also has two belief bases, one to represent fixed, background beliefs and one to accommodate defeasible information (observations which have become ‘stale’, but not necessarily false).
   Inherent to the framework is that the agent’s knowledge may be incomplete. There is much work on dealing with ignorance or missing information [14, 16, 37, 38, 21, 30, e.g.].
   What makes the proposed framework potentially significant is its generality: it is applicable to many domains and agent-designer requirements. I want to amplify the point that the belief change operations used in this paper are only suggestions, used due to personal familiarity with them – the researcher / agent designer is given the flexibility to suggest or design their own operators to suit their needs.
   Another feature of this framework which potentially adds to its significance is the nature of the observation stream (with expiring observations) and how it interacts with the dual belief base approach. We saw the rich potential of patterns of cognition, which can be simultaneously confusing and empowering to the framework user. However, some of the confusion was cleared up in Section 5. My feeling is that this observation-stream-dual-belief-base system holds
much potential for investigating deep questions in how to model continual observation and cognition within a belief change setting.
   One line of research that should be pursued is to generalize the dual-belief-base approach: What would happen if three, four or more belief bases were employed, each accommodating a different degree of entrenchment of given and received information? Would such a generalization reduce to the known systems of belief change (which include notions of preference, plausibility, entrenchment, etc.)?
   Closely related to the discussion in the previous paragraph is that keeping the belief base B fixed is quite a strong stance. In reality, only the most stubborn people will never change their core views even a little bit. Such stubbornness indicates an inability to grow, that is, an inability to improve one’s reasoning and behaviour. In the current framework, the plasticity of the assumptions, although important for accommodating and aggregating recent observations, is always dominated by B. In future versions, I would like to make B more amenable to learning, while minding sound principles of belief change in logic and cognitive psychology.
   Perhaps one of the most difficult aspects of using the proposed framework is to specify the persistence function Prs(·). However,
the specification is made easier by a property of the operational semantics: expired observations keep having an influence on the agent’s reasoning until (and unless) they are ‘overridden’ by the process of refreshing the set F. This means that the agent designer should rather err on the side of specifying less persistence when uncertain about the period to specify. In other words, the agent designer is advised to specify the longest period for which an observation is guaranteed to persist.
   I also found it challenging to think about how to model how endogenous evict is for the different worlds and observations. For instance, if Eng(eō, w, evict) = 0.75 when w ⊨ ℓ̄, should that constrain the values that Eng(ℓ̄, w, evict) can take? And if it is impossible for the elves to evict while they are outside the forest, can Eng(z, w, evict) be greater than 0 if w ⊨ ¬e? There seem to be several inter-related issues in the modeling of endogeny, including action executability, perceivability and consistency among related observations. I did not focus on these issues here.
   It would be straightforward to add a utility function (e.g., a POMDP reward function) to the environment model M. Existing planning and decision-making algorithms and methods could then be used together with the proposed framework. Instead of using the POMDP state estimation function during POMDP planning, for instance, a more general belief change operator (⊛) could be used in planning under uncertainty, where the operator’s definition depends on the elements of the proposed framework. Little work exists on planning and decision-making with underspecified knowledge [31], [36, Sec. 5].
   The fundamental distinction between focusing and belief revision when dealing with generic knowledge has been made by Dubois and Prade [8]: “Revision amounts to modifying the generic knowledge when receiving new pieces of generic knowledge (or the factual evidence when obtaining more factual information), while focusing is just applying the generic knowledge to the reference class of situations which exactly corresponds to all the available evidence gathered on the case under consideration.” The distinction seems very relevant to general systems for knowledge management in dynamic environments.
   This paper touches on several aspects of computational reasoning, including stochasticity, imprecision/ignorance, knowledge entrenchment, default knowledge, physical change and belief update, new evidence and belief revision, and the persistence of evidence. Except for evidence persistence, there are probably hundreds of papers and articles on combinations of these aspects. I could not yet find any work dealing with the persistence of the veracity of new evidence/observations, as presented in the present paper. Besides the work already cited in this paper, the following may be used as a bibliography to better place the present work in context, and to point to methods, approaches and techniques not covered in the proposed framework, which could possibly be added to it.

• Probabilistic logics for reasoning with defaults and for belief change or learning [15, 24].
• Nonmonotonic reasoning systems with optimum entropy inference as a central concept [4, 2, 3].
• Dynamic epistemic logics for reasoning about probabilities [35, 32].

ACKNOWLEDGEMENTS

I would like to thank Edgar Jembere for his comments on a draft of this paper.
REFERENCES

 [1] K. Åström, ‘Optimal control of Markov decision processes with incomplete state estimation’, Journal of Mathematical Analysis and Applications, 10, 174–205, (1965).
 [2] C. Beierle and G. Kern-Isberner, ‘On the modelling of an agent’s epistemic state and its dynamic changes’, Electronic Communications of the European Association of Software Science and Technology, 12, (2008).
 [3] C. Beierle and G. Kern-Isberner, ‘Towards an agent model for belief management’, in Advances in Multiagent Systems, Robotics and Cybernetics: Theory and Practice (Volume III), eds., G. Lasker and J. Pfalzgraf, IIAS, Tecumseh, Canada, (2009).
 [4] R. Bourne and S. Parsons, ‘Extending the maximum entropy approach to variable strength defaults’, Ann. Math. Artif. Intell., 39(1–2), 123–146, (2003).
 [5] I. Csiszár, ‘I-divergence geometry of probability distributions and minimization problems’, Annals of Probability, 3, 146–158, (1975).
 [6] F. Dupin de Saint-Cyr and J. Lang, ‘Belief extrapolation (or how to reason about observations and unpredicted change)’, Artif. Intell., 175, (2011).
 [7] D. Doder, Z. Marković, Z. Ognjanović, A. Perović, and M. Rašković, A Probabilistic Temporal Logic That Can Model Reasoning about Evidence, 9–24, Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
 [8] D. Dubois and H. Prade, Focusing vs. belief revision: A fundamental distinction when dealing with generic knowledge, 96–107, Springer Berlin Heidelberg, Berlin, Heidelberg, 1997.
 [9] A. Ferrein, C. Fritz, and G. Lakemeyer, ‘On-line decision-theoretic Golog for unpredictable domains’, in KI 2004: Advances in Artif. Intell., volume 238/2004, 322–336, Springer Verlag, Berlin / Heidelberg, (2004).
[10] D. Gabbay, I. Hodkinson, and M. Reynolds, Temporal Logic: Mathematical Foundations and Computational Aspects, volume 1, Clarendon Press, Oxford, 1994.
[11] D. Gabbay, M. Reynolds, and M. Finger, Temporal Logic: Mathematical Foundations and Computational Aspects, volume 2, Oxford University Press, Oxford, 2000.
[12] P. Gärdenfors, Knowledge in Flux: Modeling the Dynamics of Epistemic States, MIT Press, Massachusetts/England, 1988.
[13] P. Gärdenfors, Belief Revision, volume 29 of Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, Massachusetts/England, 1992.
[14] H. Geffner and B. Bonet, ‘High-level planning and control with incomplete information using POMDPs’, in Proceedings of the Fall AAAI Symposium on Cognitive Robotics, pp. 113–120, Seattle, WA, (1998). AAAI Press.
[15] M. Goldszmidt and J. Pearl, ‘Qualitative probabilities for default reasoning, belief revision, and causal modeling’, Artificial Intelligence, 84, 57–112, (1996).
[16] A. Grove and J. Halpern, ‘Updating sets of probabilities’, in Proceedings of the Fourteenth Conf. on Uncertainty in Artif. Intell., UAI’98, pp. 173–182, San Francisco, CA, USA, (1998). Morgan Kaufmann.
[17] S. Hansson, A textbook of belief dynamics: theory change and database updating, Kluwer Academic, Dordrecht, The Netherlands, 1999.
[18] E. Jaynes, ‘Where do we stand on maximum entropy?’, in The Maximum Entropy Formalism, 15–118, MIT Press, (1978).
[19] H. Katsuno and A. Mendelzon, ‘On the difference between updating a knowledge base and revising it’, in Proceedings of the Second Intl. Conf. on Principles of Knowledge Representation and Reasoning, pp. 387–394, (1991).
[20] G. Kern-Isberner, ‘Characterizing the principle of minimum cross-entropy within a conditional-logical framework’, Artif. Intell., 98(1–2), 169–208, (1998).
[21] G. Kern-Isberner, ‘Linking iterated belief change operations to nonmonotonic reasoning’, in Proceedings of the Eleventh Intl. Conf. on Principles of Knowledge Representation and Reasoning, pp. 166–176, Menlo Park, CA, (2008). AAAI Press.
[22] S. Kullback, Information theory and statistics, volume 1, Dover, New York, 2nd edn., 1968.
[23] W. Lovejoy, ‘A survey of algorithmic methods for partially observed Markov decision processes’, Annals of Operations Research, 28, 47–66, (1991).
[24] T. Lukasiewicz, ‘Nonmonotonic probabilistic logics under variable-strength inheritance with overriding: Complexity, algorithms, and implementation’, International Journal of Approximate Reasoning, 44(3), 301–321, (2007).
[25] G. Monahan, ‘A survey of partially observable Markov decision processes: Theory, models, and algorithms’, Management Science, 28(1), 1–16, (1982).
[26] J. Paris, The Uncertain Reasoner’s Companion: A Mathematical Perspective, Cambridge University Press, Cambridge, 1994.
[27] J. Paris and A. Vencovská, ‘In defense of the maximum entropy inference process’, Intl. Journal of Approximate Reasoning, 17(1), 77–103, (1997).
[28] D. Poole, ‘Decision theory, the situation calculus and conditional plans’, Linköping Electronic Articles in Computer and Information Science, 8(3), (1998).
[29] G. Rens, ‘On stochastic belief revision and update and their combination’, in Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic Reasoning (NMR), eds., G. Kern-Isberner and R. Wassermann, pp. 123–132. Technical University of Dortmund, (2016).
[30] G. Rens, T. Meyer, and G. Casini, ‘Revising incompletely specified convex probabilistic belief bases’, in Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic Reasoning (NMR), eds., G. Kern-Isberner and R. Wassermann, pp. 133–142. Technical University of Dortmund, (2016).
[31] G. Rens, T. Meyer, and G. Lakemeyer, ‘A modal logic for the decision-theoretic projection problem’, in Proceedings of the Seventh Intl. Conf. on Agents and Artif. Intell. (ICAART), Revised Selected Papers, eds., B. Duval, J. Van den Herik, S. Loiseau, and J. Filipe, LNAI, pp. 3–19. Springer Verlag, (2015).
[32] J. Sack, ‘Extending probabilistic dynamic epistemic logic’, Synthese, 169, 241–257, (2009).
[33] S. Shapiro and M. Pagnucco, ‘Iterated belief change and exogenous actions in the situation calculus’, in Proceedings of the Sixteenth European Conf. on Artif. Intell. (ECAI-04), pp. 878–882, Amsterdam, (2004). IOS Press.
[34] J. Shore and R. Johnson, ‘Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy’, IEEE Transactions on Information Theory, 26(1), 26–37, (1980).
[35] J. Van Benthem, J. Gerbrandy, and B. Kooi, ‘Dynamic update with probabilities’, Studia Logica, 93(1), 67–96, (2009).
[36] F. Voorbraak, ‘Partial Probability: Theory and Applications’, in Proceedings of the First Intl. Symposium on Imprecise Probabilities and Their Applications, pp. 360–368, (1999).
[37] F. Voorbraak, ‘Probabilistic belief change: Expansion, conditioning and constraining’, in Proceedings of the Fifteenth Conf. on Uncertainty in Artif. Intell., UAI’99, pp. 655–662, San Francisco, CA, USA, (1999). Morgan Kaufmann Publishers Inc.
[38] A. Yue and W. Liu, ‘Revising imprecise probabilistic beliefs in the framework of probabilistic logic programming’, in Proceedings of the Twenty-third AAAI Conf. on Artif. Intell. (AAAI-08), pp. 590–596, (2008).