A Stochastic Belief Change Framework with an Observation Stream and Defaults as Expired Observations

Gavin Rens¹

¹ Centre for Artificial Intelligence Research, University of KwaZulu-Natal, School of Mathematics, Statistics and Computer Science and CSIR Meraka, South Africa, email: gavinrens@gmail.com

Abstract. A framework for an agent to change its probabilistic beliefs after a stream of noisy observations is received is proposed. Observations which are no longer relevant become default assumptions until overridden by newer, more prevalent observations. A distinction is made between background and foreground beliefs. Agent actions and environment events are distinguishable and form part of the agent model. It is left up to the agent designer to provide an environment model; a submodel of the agent model. An example of an environment model is provided in the paper, and an example scenario is based on it. Given the particular form of the agent model, several 'patterns of cognition' can be identified. An argument is made for four particular patterns.

1 MOTIVATION

My intention with this research is to design a framework with which an agent can deal with uncertainty about its observations and actions, and refresh its beliefs in a relatively sophisticated way.

Partially observable Markov decision processes (POMDPs) [1, 25, 23] are adequate for many stochastic domains, and they have the supporting theory to update agents' belief states due to a changing world. But POMDPs are lacking in two aspects with respect to intelligent agents, namely, (i) the ability to maintain and reason with background knowledge (besides the models inherent in POMDP structures) and (ii) the theory to revise beliefs due to information acquisition. Traditionally, belief update consists of bringing an agent's knowledge base up to date when the world described by the knowledge base changes, that is, it is a 'change-recording' operation, whereas belief revision is used when an agent obtains new information about a static world, that is, it is a 'knowledge-changing' operation [19]. I shall use the generic term belief change to include belief update and belief revision.

A different perspective on the proposed framework is from the area of classical belief change [12, 13, 17]. The belief change community has not given much attention to dealing with uncertainty, especially not stochastic uncertainty. Hence, integrating POMDP theory into belief change methods could be beneficial. And besides bringing probability theory, POMDPs also bring decision theory, that is, the theory for reasoning about actions and their utility.

However, I go one or two steps farther than a straightforward integration of POMDPs and belief change theory. The framework also includes the notion of a stream of observations, the modeling of the decay of the truthfulness of individual observations, and how to integrate 'expired' and 'prevalent' observations into the agent's beliefs. More precisely, observations which are no longer immediately relevant become default assumptions until overridden by newer, more prevalent observations. Consider the following three scenarios.

Scenario 1  An acquaintance (Bianca) shows you the new, expensive iSung-8 smartphone she bought, with a very particular cover design. A year later, you visit Bianca at her home, where only her teenage son lives in addition. You see two phones on the kitchen counter, an iSung-8 with the particular cover design you remember from a year ago and an iSung-9. Is the likelihood greater that the iSung-8 or the iSung-9 is Bianca's?

Scenario 2  You are at the airport in a city away from home and you expect to land in your home city (Cape Town) in three hours' time. You hear someone waiting for the same flight say 'It is raining in Cape Town'. Is the likelihood less that it will still be raining if your flight is delayed by 3 hours than if the flight was not delayed?

Scenario 3  Your neighbour tells you he needs to visit the dentist urgently. You know that he uses the dentist at the Wonder-mall. A month later, you see your neighbour at the Wonder-mall. Is he there to see the dentist?

What these three scenarios have in common is that the answers to the questions make use of the persistence of truth of certain pieces of information. After a period has elapsed, the veracity of some kinds of information dissipates.
For instance, in Scenario 1, one might attach an 'expiry date' to the information that the particular iSung-8 phone is Bianca's. So, by the time you visit her, the truth of that information is much weaker; in fact, it has become defeasible by then. Hence, we may easily argue that Bianca gave her old phone to her son and she bought the iSung-9 for herself. However, if you had visited her one month after she showed you her new iSung-8 and you saw it together with the newer iSung-9 on the counter, you would probably rather assume that the iSung-9 was Bianca's son's. In Scenario 2, you could expect it to be raining in Cape Town when you get there in three hours (because spells of rain usually last for four hours in Cape Town), but if your flight is delayed, there will be no rain or only drizzle when you land. The information 'It is raining in Cape Town' has a lifespan of four hours. With respect to Scenario 3, one would expect a person who says they must visit the dentist urgently to visit the dentist within approximately seven days. So your neighbour is probably not at Wonder-mall to see the dentist. Hence 'Neighbour must visit dentist' should be true for no longer than seven days, after which, the statement becomes defeasibly true.

In this paper, I attempt to formalise some of these ideas. Several simplifications are made; two main simplifications are (i) all information packets (evidence/observations) have a meaningful period for which they can be thought of as certainly true and (ii) the transition of a piece of information from certainly true to defeasibly true is immediate. Consider the following pieces of information.

1. Bianca is an acquaintance of mine.
2. It will rain in Cape Town this week.
3. My dentist is Dr. Oosthuizen.

The first statement is problematic because, for instance, Bianca might gradually become a friend. With respect to the second statement, it is easy to set the 'expiry period' to coincide with the end of the week. One might feel that it is difficult to assign a truth period to 'My dentist is Dr. Oosthuizen' due to lack of information. A person typically does not have the same dentist life-long, but one can usually not predict accurately when one will get a new dentist. On the other hand, if, for instance, one knows exactly when one is moving to a new city, then one can give a meaningful truth period, and the transition of the piece of information from certainly true to defeasibly true is immediate.

Many of these issues are studied in temporal logics [10, 11]. The focus of the present work, however, is more on belief change with a simple temporal aspect (and the integration of POMDP theory). One may do well in future research to attempt combining the results of the present work with established work on temporal logics. One paper particularly relevant to the present work presents a probabilistic temporal logic capable of modeling reasoning about evidence [7].

Expired observations are continually aggregated into the agent's set of default assumptions. Prevalent (unexpired) observations remain in the agent's 'current memory stream'. Whenever the agent wants to perform some reasoning task, it combines the prevalent observations with its fixed beliefs, then modifies its changed beliefs with respect to its default assumptions, and reasons with respect to this final set of beliefs. However, the agent always reverts back to the original fixed beliefs (hence, "fixed"). The default assumptions keep changing as memories fade, that is, as observations expire.

The rest of this paper unfolds as follows. In the next section, I review the three formalisms on which the framework is mainly based, namely, partial probability theory, POMDPs and the hybrid stochastic belief change framework. Section 3 presents the formal definition of the framework and Section 4 explains how the framework components interact and change when used for reasoning. A discussion about the possible patterns of cognition within the framework is presented in Section 5. Then Section 6 provides an extensive example, showing some of the computations which would be required in practice. The paper ends with some final remarks and pointers to related work.
2 FORMAL FOUNDATIONS

Let L be a finite classical propositional language. A world is a logical model which evaluates every propositional variable to true or false, and by extension, evaluates every propositional sentence in L to true or false. Given n propositional atoms, there are 2^n conceivable worlds. Let W be a set of possible worlds – a subset of the conceivable worlds. The fact that w ∈ W satisfies ψ ∈ L (or is a model for ψ) is denoted by w ⊨ ψ.

Let a belief state b be a probability distribution over all the worlds in W. That is, b : W → [0, 1], such that Σ_{w∈W} b(w) = 1. For all ψ ∈ L, b(ψ) := Σ_{w∈W, w⊨ψ} b(w).

Let L_pc be some probability constraint language which has atoms constraining the probability of some propositional sentence being true, and contains all formulae which can be formed with the atoms in combination with logical connectives. If C ⊂ L_pc is a set of formulae, then b satisfies C (denoted b ⊨ C) iff ∀β ∈ C, b satisfies the constraints posed by β. I denote Π as the set of all belief states over the set of possible worlds W. Let Π^C be the set of belief states which satisfy the probability constraints in C. That is, Π^C := {c ∈ Π | c ⊨ C}.

Later, the notion of the theory of a set of belief states will be useful: Th(Π^C) := {ψ ∈ L_pc | ∀b ∈ Π^C, b ⊨ ψ}.
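To make these definitions concrete, the following is a minimal Python sketch (the paper itself contains no code; the tiny vocabulary and all function names are illustrative) of belief states as distributions over worlds, the induced probability b(ψ), and the satisfaction test b ⊨ C for interval constraints of the form φ[ℓ, u].

```python
from itertools import product

ATOMS = ["e", "o", "l"]                       # a tiny illustrative vocabulary
WORLDS = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=len(ATOMS))]

def prob(b, phi):
    """b(phi): the summed mass of the worlds satisfying phi, where phi is a
    predicate over worlds, e.g. lambda w: w['e'] and w['o']."""
    return sum(p for (w, p) in b if phi(w))

def satisfies(b, C):
    """b |= C, where C is a set of constraints (phi, l, u), i.e. phi[l, u]."""
    return all(l <= prob(b, phi) <= u for (phi, l, u) in C)

# A belief state: (world, probability) pairs summing to 1 (here, the uniform one).
b = [(w, 1.0 / len(WORLDS)) for w in WORLDS]
C = [(lambda w: w["e"] and w["o"], 0.0, 0.3)]  # the constraint (e AND o)[0, 0.3]
print(prob(b, C[0][0]), satisfies(b, C))       # 0.25 True
```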
I build on Voorbraak's [36] partial probability theory (PPT), which allows probability assignments to be partially determined, and where there is a distinction between probabilistic information based on (i) hard background evidence and (ii) some assumptions. An epistemic state in PPT is defined as the quadruple ⟨Ω, B, A, C⟩, where Ω is a sample space, B ⊂ L_pc is a set of probability constraints, A ⊂ L_pc is a set of assumptions and C ⊆ W "represents specific information concerning the case at hand" (an observation or evidence).

Voorbraak mentions that he will only consider conditioning where the evidence does not contradict the current beliefs. He defines the set of belief states corresponding to the conditionalized PPT epistemic state as {b(· | C) ∈ Π | b ∈ Π^{B∪A}, b(C) > 0}.

Voorbraak proposes constraining as an alternative to conditioning: Let ψ ∈ L_pc be a probability constraint. Then, constraining Π^B on ψ produces Π^{B∪{ψ}}. Note that expanding a belief set reduces the number of models (worlds) and expanding a PPT epistemic state with extra constraints also reduces the number of models (belief states / probability functions).

    In the context of belief sets, it is possible to obtain any [... epistemic] state from the ignorant [... epistemic] state by a series of expansions. In PPT, constraining, but not conditioning, has the analogous property. This is one of the main reasons we prefer constraining and not conditioning to be the probabilistic version of expansion. [36, p. 4]

Voorbraak provides the following example [36].

Example 1. Consider a robot which has to recharge his battery. This can be done in two rooms, let us call them room 1 and 2. The rooms are equally far away from the robot. An example of generic information might be: "the door of room 1 is at least 40% of the time open". Suppose there is no other information available, and let p_i denote the probability of door i being open. Then B = {p_1 ≥ 0.4}. Since doors are typically sometimes open and sometimes closed, it might be reasonable to include 0 < p_i < 1 in A. However, such additional assumptions should be invoked only when they are necessary, for example, in case no reasonable decision can be made without assumptions. A good example of specific evidence is information about the state of the doors obtained by the sensors of the robot.

The word assumption is not very informative: An assumption may be highly entrenched (indefeasible), for instance, about the laws of physics, or an assumption may be very tentative (defeasible), like hearing gossip about some character trait of a new staff member. However, implicit in the word default is the notion of defeasibility; default information is information which holds until stronger evidence defeats it. I shall refer to the set B as background knowledge and assume it to be indefeasible, and the set F as foreground knowledge and assume it to be defeasible. Hence, referring to Voorbraak's Example 1, p_1 ≥ 0.4 would be in F and 0 < p_i < 1 would be in B. Indeed, in PPT, "it is intended to be understood that the conclusions warranted by [A ∪ B] depend on the assumptions represented in A," [37]. I interpret this to mean that background knowledge (A) dominates foreground knowledge (B).
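As a small illustration of expansion by constraining in the two-door setting of Example 1, the sketch below (Python; the coarse grid is a stand-in for the infinite set of belief states, and the second constraint is invented for illustration, not taken from Voorbraak) shows that adding a constraint can only shrink the set of admissible belief states.

```python
from itertools import product

WORLDS = list(product([True, False], repeat=2))   # (door1_open, door2_open)

def grid(step=0.1):
    """A coarse grid of belief states over the four worlds (a stand-in for Pi)."""
    ticks = [round(i * step, 10) for i in range(int(1 / step) + 1)]
    for ps in product(ticks, repeat=len(WORLDS)):
        if abs(sum(ps) - 1.0) < 1e-9:
            yield dict(zip(WORLDS, ps))

def p_open(b, door):
    return sum(p for w, p in b.items() if w[door])

Pi_B = [b for b in grid() if p_open(b, 0) >= 0.4]      # B = {p1 >= 0.4}
Pi_B_psi = [b for b in Pi_B if p_open(b, 1) >= 0.2]    # constraining on an extra p2 >= 0.2
print(len(Pi_B), ">=", len(Pi_B_psi))                  # the set of models only shrinks
```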
It is, however, conceivable that background knowledge should be defeasible and that new (foreground) evidence should weigh stronger due to its recency and applicability. In the proposed framework, a compromise is attempted: background knowledge is 'dominated' by new evidence at the time of reasoning, but after reasoning, the new evidence is 'forgotten'. However, evidence is not completely forgotten: as it becomes less applicable after some time, it gets assimilated into the foreground knowledge as default information.

I also use elements of partially observable Markov decision process (POMDP) theory [1, 25, 23]. In a POMDP, the agent can only predict with a likelihood in which state it will end up after performing an action. And due to imperfect sensors, an agent must maintain a probability distribution over the set of possible states.

Formally, a POMDP is a tuple ⟨S, A, T, R, Ω, O⟩ with a finite set of states S = {s_1, s_2, ..., s_n}, a finite set of actions A = {a_1, a_2, ..., a_k}, the state-transition function, where T(s, a, s') is the probability of being in s' after performing action a in state s, the reward function, where R(a, s) is the reward gained for executing a while in state s, a finite set of observations Ω = {z_1, z_2, ..., z_m}, and the observation function, where O(a, z, s') is the probability of observing z in state s' resulting from performing action a in some other state. An initial belief state b_0 over all states in S is assumed given.

To update the agent's beliefs about the world, a state estimation function SE(b, a, z) = b^{SE}_{a,z} is defined as

  b^{SE}_{a,z}(s') = \frac{O(a, z, s') \sum_{s \in S} T(s, a, s')\, b(s)}{Pr(z \mid a, b)},

where a is an action performed in 'current' belief-state b, z is the resultant observation and b^{SE}_{a,z}(s') denotes the probability of the agent being in state s' in 'new' belief-state b^{SE}_{a,z}. Note that Pr(z | a, b) is a normalizing constant.
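A minimal sketch of this state estimation step (Python; the model is passed in as plain dictionaries and the names are mine, not those of any particular library):

```python
def se_update(b, a, z, T, O, states):
    """POMDP state estimation SE(b, a, z): the new belief over states after doing a
    and observing z.  T[(s, a, s2)] and O[(a, z, s2)] hold the model probabilities."""
    unnorm = {
        s2: O[(a, z, s2)] * sum(T[(s, a, s2)] * b[s] for s in states)
        for s2 in states
    }
    pr_z = sum(unnorm.values())           # Pr(z | a, b), the normalizing constant
    if pr_z == 0:
        raise ValueError("observation has zero probability under b and a")
    return {s2: p / pr_z for s2, p in unnorm.items()}
```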
Let the planning horizon h (also called the look-ahead depth) be the number of future steps the agent plans ahead each time it selects its next action. V*(b, h) is the optimal value of future courses of actions the agent can take with respect to a finite horizon h starting in belief-state b. This function assumes that at each step, the action which will maximize the state's value will be selected. V*(b, h) is defined as

  V^*(b, h) = \max_{a \in A} \Big[ \rho(a, b) + \gamma \sum_{z \in \Omega} Pr(z \mid a, b)\, V^*(SE(b, a, z), h - 1) \Big],

where ρ(a, b) is defined as Σ_{s∈S} R(a, s) b(s), 0 < γ ≤ 1 is a factor to discount the value of future rewards and Pr(z | a, b) denotes the probability of reaching belief-state b^{SE}_{a,z} = SE(b, a, z). While V* denotes the optimal state value, function Q* denotes the optimal action value: Q*(a, b, h) = ρ(a, b) + γ Σ_{z∈Ω} Pr(z | a, b) V*(SE(b, a, z), h − 1) is the value of executing a in the current belief-state, plus the total expected value of belief-states reached thereafter.

I also build on my stochastic belief change model [29], which is a structure ⟨W, Evt, T, E, O, Str⟩, with a set of possible worlds W, a set of events Evt, an event-transition function where T(w, e, w') models the probability of a transition to world w', given the occurrence of event e in world w, an event likelihood function where E(e, w) = P(e | w) is the probability of the occurrence of event e in w, an observation function where O(z, w) models the probability of observing z in w, and where Str(z, w) is the agent's ontic strength for z perceived in w.

I proposed a way of trading off probabilistic update and probabilistic revision, using the notion of ontic strength. The argument is that an agent could reason with a range of degrees for information being ontic (the effect of a physical action or occurrence) or epistemic (purely informative). It is assumed that the higher the information's degree of being ontic, the lower the epistemic status of that information. "An agent has a certain sense of the degree to which a piece of received information is due to a physical action or event in the world. This sense may come about due to a combination of sensor readings and reasoning. If the agent performs an action and a change in the local environment matches the expected effect of the action, it can be quite certain that the effect is ontic information," [29, p. 129].

The hybrid stochastic change of belief state b due to new information z with ontic strength (denoted b ⊛ z) is defined as [29]

  b ⊛ z := \Big\{ (w, p) \;\Big|\; w \in W,\ p = \frac{1}{\gamma}\big[ (1 - Str(z, w))\, b^{*}_{z}(w) + Str(z, w)\, b^{\diamond}_{z}(w) \big] \Big\},

where * is some probabilistic belief revision operator, ⋄ is some probabilistic belief update operator and γ is a normalizing factor so that Σ_{w∈W} (b ⊛ z)(w) = 1.

The principle of maximum entropy [18, 27, 34, 26, 20] says that it is reasonable to represent a set of belief states Π^C by the member of Π^C which is most entropic or least biased (w.r.t. information theory) of all the members of Π^C:

  ME(\Pi^C) := \arg\max_{c \in \Pi^C} H(c),

where H(c) := −Σ_{w∈W} c(w) ln c(w).

The principle of minimum cross-entropy [22, 5] is used to select the belief state c ∈ Π^Y 'most similar' to a given belief state b ∈ Π^X; the principle minimizes the directed divergence between b and c with respect to entropy. Directed divergence is defined as

  R(c, b) := \sum_{w \in W} c(w) \ln \frac{c(w)}{b(w)}.

R(c, b) is undefined when b(w) = 0 while c(w) > 0; when c(w) = 0, the corresponding term is taken to be 0, because lim_{x→0} x ln x = 0.
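The two entropy-based principles can be sketched as follows (Python; the candidate set is assumed to be a finite, representative sample of belief states, since Π^C itself is generally infinite):

```python
import math

def entropy(c):
    """H(c) = -sum_w c(w) ln c(w), with the 0 ln 0 terms taken as 0."""
    return -sum(p * math.log(p) for p in c.values() if p > 0)

def directed_divergence(c, b):
    """R(c, b) = sum_w c(w) ln(c(w)/b(w)); undefined if b(w)=0 while c(w)>0."""
    total = 0.0
    for w, cw in c.items():
        if cw == 0:
            continue                        # the term vanishes (lim x ln x = 0)
        if b[w] == 0:
            raise ValueError("R(c, b) is undefined: b(w) = 0 but c(w) > 0")
        total += cw * math.log(cw / b[w])
    return total

def most_entropic(candidates):
    """ME over a finite candidate subset of Pi^C (a sketch of the principle only)."""
    return max(candidates, key=entropy)
```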
The knowledge and reasoning structure proposed in this paper is presented next. It builds on the work of Voorbraak [36] and Rens [29] and includes elements of POMDP theory.

3 FRAMEWORK DEFINITION

Let L_prob be a probabilistic language over L defined as L_prob := {φ[ℓ, u] | φ ∈ L, ℓ, u ∈ [0, 1], ℓ ≤ u}. A sentence of the form φ[ℓ, u] means the likelihood of proposition φ is greater than or equal to ℓ and less than or equal to u. Let N = {0, 1, 2, ...}. b satisfies formula φ[ℓ, u] (denoted b ⊨ φ[ℓ, u]) iff ℓ ≤ b(φ) ≤ u.

Definition 1. An agent maintains a structure ⟨W, B, F, A, Evt, Z, Prs, Eng, M⟩, where

• W is a set of possible worlds;
• B ⊂ L_prob is a background belief base of fixed assumptions;
• F ⊂ L_prob is a foreground belief base of default assumptions;
• A is a set of (agent) actions, including a special action null;
• Evt is a set of (environment) events;
• Z is the observation stream, a set of observation triples: Z := {(a_1, t_1, z_1), (a_2, t_2, z_2), ..., (a_k, t_k, z_k)}, where a_i ∈ A, t_i ∈ N, z_i ∈ L, and such that ∀t_i, t_j ∈ N, i = j iff t_i = t_j (i.e., no more than one action and observation occur at a time-point);
• Prs : L × W → N is a persistence function, where Prs(z, w) indicates how long z is expected to be true from the time it is received, given the 'context' of w; it is a total function over L × W;
• Eng : L × W × A → [0, 1], where Eng(z, w, a) is the agent's confidence that z perceived in w was caused by action a (i.e., that z has an endogenous source);
• M is a model of the environment, and any auxiliary information required by the definition of the particular belief change operation (∘).
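Purely as an illustration of the bookkeeping implied by Definition 1, here is one possible container for the agent structure (Python; the field types are my own simplifications — constraints kept as φ[ℓ, u] triples, sentences as strings — and nothing here is prescribed by the framework):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

World = Dict[str, bool]          # a world: truth values for the atoms
Sentence = str                   # a sentence of L, kept abstract here

@dataclass
class ObservationTriple:
    action: str                  # a in A (possibly "null")
    time: int                    # t in N; at most one triple per time-point
    observation: Sentence        # z in L

@dataclass
class Agent:
    B: List[Tuple[Sentence, float, float]]                      # background constraints phi[l, u]
    F: List[Tuple[Sentence, float, float]]                      # foreground (default) constraints
    Z: List[ObservationTriple] = field(default_factory=list)    # the observation stream
    Prs: Callable[[Sentence, World], int] = None                # persistence function
    Eng: Callable[[Sentence, World, str], float] = None         # endogeny confidence
    M: object = None                                            # environment model <E, T, O>
```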
Definition 2. The expected persistence of z perceived in belief state b is

  ExpPrs(z, b) := \sum_{w \in W} Prs(z, w) \times b(w).

But the agent will reason with respect to a set of belief states Π^C, hence, ExpPrs(z, Π^C) must be defined. One such definition employs the principle of maximum entropy:

Definition 3.

  ExpPrs_{ME}(z, \Pi^C) := \sum_{w \in W} Prs(z, w) \times b_{ME}(w),

where b_ME = ME(Π^C).

Definition 4. An observation triple (a, i, z) ∈ Z has expired at point s if ExpPrs(z, b) < s − i.

Let b ∘_{a,z} be the change of belief state b by a and z. My intention is that "change" is a neutral term, not necessarily indicating revision or update. Next, I propose one instantiation of b ∘_{a,z}. Let the environment model M = ⟨E, T, O⟩, where

• E : Evt × W → [0, 1] is the event function. E(e, w) = P(e | w), the probability of the occurrence of event e in w;
• T : W × (A ∪ Evt) × W → [0, 1] is a transition function such that for every æ ∈ A ∪ Evt and w ∈ W, Σ_{w'∈W} T(w, æ, w') = 1, where T(w, æ, w') models the probability of a transition to world w', given the execution of action / occurrence of event æ in world w;
• O : W × W × A → [0, 1] is an observation function such that for every w ∈ W and a ∈ A, Σ_{w_z∈W} O(w_z, w, a) = 1, where O(w_z, w, a) models the probability of observing φ_{w_z} (a complete theory for w_z) in w and where O(z, w, a) := Σ_{w_z∈W, w_z⊨z} O(w_z, w, a), for all z ∈ L.

Observation z may be due to an exogenous event (originating and produced outside the agent) or an endogenous action (originating and produced within the agent). It is up to the agent designer to decide, for each observation, whether it is exogenous or endogenous, given the action and world.

The hybrid stochastic belief change (HSBC) formalism of Rens [29] defines the (exogenous) update of b with z as

  b^{\diamond}_{z} := \Big\{ (w, p) \;\Big|\; w \in W,\ p = \frac{1}{\gamma}\, O(z, w) \sum_{e \in Evt} \sum_{w' \in W} E(e, w')\, T(w', e, w)\, b(w') \Big\},

where γ is a normalizing factor and O(z, w) can be interpreted as O(z, w, null). But HSBC does not involve agent actions; HSBC assumes that agents passively receive information. Hence, when an agent is assumed to act and the actions are known, the POMDP state estimation function can be employed for belief state update.

Only for the purpose of illustrating how the framework can be used, the following belief change procedure is defined:

  b ∘_{a,z} := { (w, p) | w ∈ W, p = Exg(z, w, a) b^⋄_z(w) + Eng(z, w, a) b^{SE}_{a,z}(w) },    (1)

where Exg(z, w, a) := 1 − Eng(z, w, a) is the confidence that z is exogenous in w, given a was executed.
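A sketch of this illustrative procedure (Python; the model is again passed in as dictionaries and callables, and the exogenous update mirrors the formula above — all naming is mine, and this is only one way of realising equation (1)):

```python
def exogenous_update(b, z, E, T, O, worlds, events):
    """HSBC-style exogenous update b_z^<> of the formula above (a sketch)."""
    unnorm = {
        w: O[(z, w)] * sum(E[(e, w2)] * T[(w2, e, w)] * b[w2]
                           for e in events for w2 in worlds)
        for w in worlds
    }
    gamma = sum(unnorm.values())
    return {w: p / gamma for w, p in unnorm.items()}

def change(b_update, b_se, a, z, Eng, worlds):
    """Equation (1): mix the exogenous update and the POMDP estimate per world,
    weighted by the agent's confidence Eng(z, w, a) that z was caused by a."""
    return {
        w: (1 - Eng(z, w, a)) * b_update[w] + Eng(z, w, a) * b_se[w]
        for w in worlds
    }
```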
4 OPERATIONAL SEMANTICS

The agent is always living at time-point N. Time units remain the same and must be chosen to suit the domain of interest, for instance, milliseconds, minutes, hours, etc. The first point in time is N = 0. N indicates the end of the N-th time unit; in other words, at point N, exactly N × u time has passed, where u is the time unit employed. It will be assumed that no observation can be made at time-point 0. If the agent designer feels that it is unreasonable for the agent to have to wait until N = 1 before the first observation may be made, then the time unit chosen for the particular domain is too large.

Expired observations are continually aggregated into the agent's set of default assumptions F (foreground beliefs). Prevalent (unexpired) observations remain in the agent's 'current memory stream' Z. Whenever the agent wants to perform some reasoning task, (i) it combines the prevalent observations with its fixed beliefs B, then modifies its changed beliefs with respect to its default assumptions, or (ii) modifies its fixed beliefs B with respect to its default assumptions, then combines the prevalent observations with its changed beliefs – and then reasons with respect to this final set of beliefs. However, the agent always reverts back to the original fixed beliefs (hence, "fixed"). And the default assumptions keep changing as memories fade, that is, as observations expire.

Because any initial conditions specified are, generally, not expected to hold after the initial time-point, it does not make sense to place them in B. The only other option is to place them in F, which is allowed to change. The following guiding principle is thus provided to agent designers.

  The agent's initial beliefs (i.e., the system conditions at point N = 0) must be specified in F.

Let C ⊂ L_prob be a belief base. At this early stage of research, I shall suggest only three definitions of ∘ on a set of belief states. The first is the most naïve approach (denoted ∘_NV). It is suitable for theoretical investigations.

  Π^C ∘_NV ⟨a, z⟩ := { b ∘_{a,z} | b ∈ Π^C }.

A more practical approach is to reduce Π^C to a representative belief state, employing the principle of maximum entropy (denoted ∘_ME).

  Π^C ∘_ME ⟨a, z⟩ := { b ∘_{a,z} | b = ME(Π^C) }.

A third and final approach which I shall mention here is, in a sense, a compromise between the naïve and maximum entropy approaches. It is actually a family of methods which will thus not be defined precisely. It is the approach (denoted ∘_FS) which finds a finite (preferably, relatively small) proper subset Π^FS of Π^C which is somehow representative of Π^C, and then applies ∘ to the individual belief states in Π^FS:

  Π^C ∘_FS ⟨a, z⟩ := { b ∘_{a,z} | b ∈ Π^FS ⊂ Π^C }.

Only ∘_ME and ∘_FS will be considered in the sequel, when it comes to belief change over a set. Let ∘_set denote one or the other of these two operators. Then Π^C ∘_set σ can be defined, where σ is any stream of observation triples, with σ = {(a_1, t_1, z_1), (a_2, t_2, z_2), ..., (a_k, t_k, z_k)} and t_1 < t_2 < ··· < t_k:

  Π^C ← Π^C ∘_set ⟨a_1, z_1⟩, and then Π^C ← Π^C ∘_set ⟨a_2, z_2⟩, ..., and then Π^C ← Π^C ∘_set ⟨a_k, z_k⟩.

Let Expired be derived from the triples in Z which have just expired (at point N). That is,

  Expired := { (a, i, z) ∈ Z | ExpPrs(z, Π^{N−1}) < N − i },

where Π^N is defined below. At each time-point, F is refreshed with all the expired observation triples in the order they appear in Z. In other words, at each point in time, F ← Th(Π^F ∘_set Expired). As soon as the foreground has been refreshed, the expired triples are removed from the observation stream: Z ← Z \ Expired. I shall use the notation Π^F_set to clarify which operation was used to arrive at the current Π^F.
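One time-point of this bookkeeping might look as follows (a Python sketch with the belief change machinery injected as helpers; exp_prs, refresh_F and change_set are assumptions standing in for ExpPrs(·, Π^{N−1}), Th(Π^F ∘_set Expired) and the chosen computation of Π^N):

```python
def step(N, agent, exp_prs, refresh_F, change_set):
    """One time-point of the operational semantics: collect the just-expired triples,
    fold them into the foreground F, prune them from Z, then compute Pi^N."""
    expired = [t for t in agent.Z
               if exp_prs(t.observation) < N - t.time]      # ExpPrs(z, Pi^{N-1}) < N - i
    if expired:
        agent.F = refresh_F(agent.F, expired)                # F <- Th(Pi^F o_set Expired)
        agent.Z = [t for t in agent.Z if t not in expired]   # Z <- Z \ Expired
    # the prevalent (unexpired) triples left in agent.Z are then used for reasoning
    return change_set(agent.B, agent.F, agent.Z)             # Pi^N
```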
A function which selects a belief state in one set which is in some sense closest to another set of belief states will shortly be required. The following definition suggests three such functions.

Definition 5.

  Closest_{absolute}(\Pi^X, \Pi^Y) := \arg\min_{b:\, b \in \Pi^X,\, c \in \Pi^Y} \sum_{w \in W} (b(w) - c(w))^2

or

  Closest_{ME}(\Pi^X, \Pi^Y) := \arg\min_{b \in \Pi^X} \sum_{w \in W} (b(w) - ME(\Pi^Y)(w))^2,

or

  Closest_{MCE}(\Pi^X, \Pi^Y) := \arg\min_{b:\, b \in \Pi^X,\, c \in \Pi^Y} R(c, b),

where X, Y ⊂ L_prob, ME(Π^Y) is the most entropic/least biased belief state in Π^Y and R(c, b) is the directed divergence of c with respect to b.

Closest_absolute simply chooses the belief state b ∈ Π^X which minimizes the sum of the differences between probabilities of worlds of b and some belief state c ∈ Π^Y (considering all c ∈ Π^Y). Closest_ME picks the belief state c_ME ∈ Π^Y with maximum entropy (i.e., least biased w.r.t. information in Π^Y), and then chooses the belief state b ∈ Π^X which minimizes the sum of the differences between probabilities of worlds of b and c_ME. Closest_MCE chooses the belief state b ∈ Π^X for which the directed divergence from belief state c ∈ Π^Y with respect to entropy is least (considering all c ∈ Π^Y). This is an instance of minimum cross-entropy inference. From here onwards, I shall not analyse their definitions; Closest(·) will be used to refer to the abstract function.

Reasoning is done with respect to the set of belief states Π^N. I look at two patterns of cognition to determine Π^N.

Definition 6.

  (Π^N)_1 := (Π^B ∘_set Z) ∩ Π^F_set   if (Π^B ∘_set Z) ∩ Π^F_set ≠ ∅;
             {Closest(Π^B ∘_set Z, Π^F_set)}   otherwise.

  (Π^N)_2 := (Π^B ∩ Π^F_set) ∘_set Z   if Π^B ∩ Π^F_set ≠ ∅;
             {Closest(Π^B, Π^F_set)} ∘_set Z   otherwise.

The idea behind the definition of (Π^N)_1 is that (pertinent) observations in the stream (Z) dominate background beliefs (B), but Z-modified beliefs (Π^B ∘_set Z) dominate foreground beliefs (F). The idea behind the definition of (Π^N)_2 is that foreground beliefs (F) have slightly higher status than in (Π^N)_1 because they modify background beliefs (when consistent with B) before (pertinent) observations in the stream (Z), but the stream finally dominates.

5 PATTERNS OF COGNITION

In this section, I shall explore the properties of 'patterns of cognition' based on (Π^N)_1 and (Π^N)_2. I shall argue that there are four reasonable candidates. First, a few axioms:

1. {b} ∩ Π^C = {b} or ∅.
2. {b} ∩ {c} ≠ ∅ ⟺ b = c.
3. Π^C ∘_ME Z always results in a singleton set.
4. Closest({b}, Π^C) = b.

And one guiding principle:

  In a given pattern of cognition, ∘_set is instantiated to the same operator when it appears in the same relative position.

For instance, in (Π^N)_1,

  (Π^B ∘_ME Z) ∩ Π^F_FS   if (Π^B ∘_ME Z) ∩ Π^F_FS ≠ ∅;
  {Closest(Π^B ∘_ME Z, Π^F_FS)}   otherwise,

is allowed, but not

  (Π^B ∘_ME Z) ∩ Π^F_FS   if (Π^B ∘_ME Z) ∩ Π^F_FS ≠ ∅;
  {Closest(Π^B ∘_FS Z, Π^F_FS)}   otherwise.

Or in (Π^N)_2,

  (Π^B ∩ Π^F_FS) ∘_FS Z   if Π^B ∩ Π^F_FS ≠ ∅;
  {Closest(Π^B, Π^F_FS)} ∘_FS Z   otherwise,

is allowed, but not

  (Π^B ∩ Π^F_ME) ∘_FS Z   if Π^B ∩ Π^F_FS ≠ ∅;
  {Closest(Π^B, Π^F_FS)} ∘_FS Z   otherwise.

Given these axioms and the guiding principle, four reasonable candidates are

  (Π^N)_11 := (Π^B ∘_FS Z) ∩ Π^F_FS   if (Π^B ∘_FS Z) ∩ Π^F_FS ≠ ∅;
              {Closest(Π^B ∘_FS Z, Π^F_FS)}   otherwise,

  (Π^N)_12 := {Closest(Π^B ∘_FS Z, Π^F_set)},

  (Π^N)_21 := (Π^B ∩ Π^F_FS) ∘_set Z   if Π^B ∩ Π^F_FS ≠ ∅;
              {Closest(Π^B, Π^F_FS)} ∘_set Z   otherwise,

  (Π^N)_22 := {Closest(Π^B, Π^F_set)} ∘ Z.

Note that in the definitions of (Π^N)_12, (Π^N)_21 and (Π^N)_22, ∘_set may be instantiated as either of the two operators, and in (Π^N)_22, ∘ is actually the 'plain' belief change operator.

I now justify each of the four patterns of cognition.

11  If (Π^B ∘_FS Z) were (Π^B ∘_ME Z), by axiom 3, the result is a singleton set, and by axioms 1 and 4, the information in Π^F is ignored, including initial conditions or their later effects. If Π^F_FS were Π^F_ME, by axiom 1, the information in (Π^B ∘_set Z) is ignored, or worse, the intersection is empty.

12  With this pattern, the issues of intersection (axioms 1 and 2) need not be dealt with. If Π^B ∘_FS Z were Π^B ∘_ME Z, by axiom 3, the result is a singleton set, and by axiom 4, the information in Π^F is ignored. Whether Π^F_set is instantiated as Π^F_ME or Π^F_FS, the set still has an influence on Π^B ∘_FS Z due to the latter typically not being a singleton.

21  If (Π^B ∩ Π^F_FS) were (Π^B ∩ Π^F_ME), by axiom 1, the information in Π^B is ignored. Whether ∘_set Z is instantiated as ∘_ME Z or ∘_FS Z, the result will typically not be trivial – all information is taken into account, although in different ways.

22  Either of the two instantiations of Π^F_set accommodates the information in Π^B and in Π^F. Moreover, the issues of intersection (axioms 1 and 2) need not be dealt with, and due to the simpler pattern, the second belief change operation can also be simplified, that is, in this pattern, ∘_ME Z and ∘_FS Z both reduce to ∘ Z.
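Pattern (Π^N)_22, which the example in the next section uses, can be sketched directly (Python; Pi_B and Pi_F are finite representative sets of belief states, and change stands in for the per-belief-state operator ∘_{a,z} — both are assumptions of this sketch):

```python
def closest_absolute(Pi_X, Pi_Y, worlds):
    """Definition 5 (first variant), over finite representative sets of belief states."""
    dist = lambda b, c: sum((b[w] - c[w]) ** 2 for w in worlds)
    return min(Pi_X, key=lambda b: min(dist(b, c) for c in Pi_Y))

def pattern_22(Pi_B, Pi_F, Z, change, worlds):
    """(Pi^N)_22 = {Closest(Pi^B, Pi^F_set)} o Z: pick the background state closest to
    the (refreshed) foreground set, then fold the prevalent triples over it in order."""
    b = closest_absolute(Pi_B, Pi_F, worlds)
    for (a, t, z) in sorted(Z, key=lambda triple: triple[1]):   # t1 < t2 < ...
        b = change(b, a, z)                                     # b <- b o_{a,z}
    return [b]                                                  # Pi^N (a singleton here)
```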
6 AN EXAMPLE

To keep things as simple as possible for this introductory report, I shall illustrate (Π^N)_22 instantiated as

  Closest_absolute(Π^B, Π^F_ME) ∘ Z.

Consider the following scenario. The elves live in the forest. Sometimes, orcs (monsters) wander into the forest to hunt or seek shelter from a storm. Elves don't like orcs. The elves can either do nothing (action null) or they can evict the orcs (action evict). Orcs tend to stay in the forest for more hours the darker it is. The sun rises (event rise) and sets (event set) once a day (duh). Now and then, there is a storm (event storm), but most of the time, nothing happens (event null). The vocabulary will be {e, o, ℓ} meaning, respectively, the elves are in the forest, the orcs are in the forest, and it is light (it is daytime and there isn't a storm).

Henceforth, I might write αβ instead of α ∧ β, and ᾱ instead of ¬α. For ease of reading, the possible worlds will be ordered, and a belief state {(eoℓ, p_1), (eoℓ̄, p_2), (eōℓ, p_3), (eōℓ̄, p_4), (ēoℓ, p_5), (ēoℓ̄, p_6), (ēōℓ, p_7), (ēōℓ̄, p_8)} will be abbreviated as ⟨p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8⟩.

Only observations eo, eō, ℓ, ℓ̄ are considered.

Let

• Eng(z, w, null) = 0, ∀z ∈ Ω, ∀w ∈ W (observations after null actions are never endogenous);
• Eng(eo, w, evict) = 0, ∀w ∈ W (orcs in the forest should never be observed due to eviction);
• Eng(eō, w, evict) = 0.75 if w ⊨ ℓ̄, = 0.5 if w ⊨ ℓ (one can be more confident that the orcs are not in the forest due to the eviction, when it is dark);
• Eng(ℓ, w, evict) = 0, ∀w ∈ W (lightness is independent of eviction).

I am not entirely comfortable with this model of endogeny; see the concluding section for a short discussion.

Let

• Prs(eo) = {(ℓ, 2), (ℓ̄, 4)} (once the orcs are observed in the forest, one can rely on them remaining there for at least two hours when light and four hours when dark);
• Prs(eō) = {(⊤, 1)} (one can rely on the orcs remaining outside the forest for one hour once one finds out that they are outside);
• Prs(ℓ) = {(⊤, 5)} (one can rely on light for five hours once light is perceived; a storm may darken things within five hours);
• Prs(ℓ̄) = {(⊤, 3)} (there might be a daytime storm, which may clear up within three hours from when it started and was perceived).

To define the belief change operator ∘ as in (1), M = ⟨E, T, O⟩ must be defined as follows.

• If w ⊨ ℓ,
  – E(rise, w) = 0,
  – E(set, w) = 1/12,
  – E(storm, w) = 3/12,
  – E(null, w) = 8/12.
• If w ⊨ ¬ℓ,
  – E(rise, w) = 1/12,
  – E(set, w) = 0,
  – E(storm, w) = 3/12,
  – E(null, w) = 8/12.

• T(w, null, w) = 1 ∀w ∈ W (for the action and the event).
• T(w, evict, w') = 0 if w ⊨ ¬e ∨ ¬o; else if w ⊨ ℓ,
  – = 0.1 if w' ⊨ e ∧ o ∧ ℓ,
  – = 0.9 if w' ⊨ e ∧ ¬o ∧ ℓ;
  else if w ⊨ ¬ℓ,
  – = 0.1 if w' ⊨ e ∧ o ∧ ¬ℓ,
  – = 0.9 if w' ⊨ e ∧ ¬o ∧ ¬ℓ.
• T(w, rise, w') = 1 if w ⊨ ¬ℓ and w' ⊨ ℓ and the truth values of e and o are invariant.
• T(w, set, w') = 1 if w ⊨ ℓ and w' ⊨ ¬ℓ and the truth values of e and o are invariant.
• T(w, storm, w') = 1 if w' ⊨ ¬ℓ and the truth values of e and o are invariant.

The probabilities for O(z, w, null) are as follows.

  z \ w   eoℓ   eoℓ̄   eōℓ   eōℓ̄   ēoℓ   ēoℓ̄   ēōℓ   ēōℓ̄
  eo      0.8   0.8   0.1   0.1   0.1   0.1   0     0
  eō      0.1   0.1   0.8   0.8   0     0     0.1   0.1
  ℓ       1     0     1     0     1     0     1     0
  ℓ̄       0     1     0     1     0     1     0     1

The probabilities for O(z, w, evict) are as follows.

  z \ w   eoℓ   eoℓ̄   eōℓ   eōℓ̄   ēoℓ   ēoℓ̄   ēōℓ   ēōℓ̄
  eo      0.8   0.8   0     0     0     0     0     0
  eō      0.2   0.2   0.8   0.8   0     0     0.2   0.2
  ℓ       1     0     1     0     1     0     1     0
  ℓ̄       0     1     0     1     0     1     0     1
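For concreteness, the worlds and the two observation tables transcribe into Python as follows (the world ordering and the numbers are exactly those above; the dictionary keys are illustrative ASCII stand-ins for eo, eō, ℓ and ℓ̄):

```python
# The eight worlds in the paper's fixed order: eol, eol-, eo-l, eo-l-, e-ol, e-ol-, e-o-l, e-o-l-.
WORLDS = [(e, o, l) for e in (1, 0) for o in (1, 0) for l in (1, 0)]

O_null = {                       # O(z, w, null), one row per observation z
    "eo":  [0.8, 0.8, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0],
    "eo-": [0.1, 0.1, 0.8, 0.8, 0.0, 0.0, 0.1, 0.1],
    "l":   [1, 0, 1, 0, 1, 0, 1, 0],
    "l-":  [0, 1, 0, 1, 0, 1, 0, 1],
}
O_evict = {                      # O(z, w, evict)
    "eo":  [0.8, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "eo-": [0.2, 0.2, 0.8, 0.8, 0.0, 0.0, 0.2, 0.2],
    "l":   [1, 0, 1, 0, 1, 0, 1, 0],
    "l-":  [0, 1, 0, 1, 0, 1, 0, 1],
}

def O(z, w_index, a):
    """Look up O(z, w, a) by the index of w in the fixed world ordering."""
    table = O_null if a == "null" else O_evict
    return table[z][w_index]
```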
Initially, the foreground belief base will specify the initial condition,

  F = {ℓ[1, 1], eo[.7, .9]},

that is, it is completely light and there is a 70%–90% belief that the elves and orcs are in the forest. And the background belief base demands that the probability that elves are outside the forest while orcs are inside is never more than 10%,

  B = {ēo[0, 0.1]}.

Let us start by analyzing the effect of observation stream

  Z = {(null, 1, eo), (evict, 4, eō)}

at different time-points. At N = 1, (Π^1)_22 = Closest_absolute(Π^B, Π^F_ME) ∘ {(null, 1, eo)} (given Prs(eo) = {(ℓ, 2), (ℓ̄, 4)}, (null, 1, eo) has not yet expired). Π^F_ME = ME(Π^F) = ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩. Due to Π^B and Π^F_ME being mutually consistent, Closest_absolute(Π^B, Π^F_ME) is simply ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩ ∈ Π^B. Denote ⟨.7, 0, .1, 0, .1, 0, .1, 0⟩ as b_1.

Eng(eo, w, null) = 0 for all w ∈ W. Therefore, b_1 ∘ {(null, 1, eo)} = b_1 ∘ ⟨null, eo⟩ = (b_1)^⋄_eo, which was calculated to be ⟨.386, .579, .007, .01, .007, .01, 0, 0⟩ (denoted b_2 henceforth).

I shall only show how (b_1)^⋄_eo(eoℓ̄) = 0.579 is calculated. Note that, due to the transition functions for the four events being invariant with respect to e and o, the only departure worlds we need to consider are eoℓ and eoℓ̄:

  (b_1)^⋄_eo(eoℓ̄)
  = (1/γ) O(eo, eoℓ̄, null) Σ_{w'∈W} Σ_{e∈Evt} b_1(w') E(e, w') T(w', e, eoℓ̄)
  = (1/γ) (0.8) Σ_{w'∈W} b_1(w') × [E(null, w')T(w', null, eoℓ̄) + E(rise, w')T(w', rise, eoℓ̄)
      + E(set, w')T(w', set, eoℓ̄) + E(storm, w')T(w', storm, eoℓ̄)]
  = (1/γ) (0.8) ( b_1(eoℓ)[E(null, eoℓ)T(eoℓ, null, eoℓ̄) + E(rise, eoℓ)T(eoℓ, rise, eoℓ̄)
      + E(set, eoℓ)T(eoℓ, set, eoℓ̄) + E(storm, eoℓ)T(eoℓ, storm, eoℓ̄)]
      + b_1(eoℓ̄)[E(null, eoℓ̄)T(eoℓ̄, null, eoℓ̄) + E(rise, eoℓ̄)T(eoℓ̄, rise, eoℓ̄)
      + E(set, eoℓ̄)T(eoℓ̄, set, eoℓ̄) + E(storm, eoℓ̄)T(eoℓ̄, storm, eoℓ̄)] )
  = (1/γ) (0.8) ( (0.7)[8/12 + 0 + 1/12 + 3/12] + (0)[8/12 + 0 + 0 + 3/12] )
  = (1/γ) 0.56.

It turns out that γ = 0.968, resulting in 0.56/0.968 = 0.579. The agent believes to a high degree what it perceived (eo). The reason why it believes to a higher degree that it is dark than light is due to the relatively high chance of a storm (which darkens things) and a small chance of the sun setting. This scenario is a bit synthetic: the agent has not yet perceived that it is light. In a realistic situation, the agent will always sense the brightness of the environment, disallowing a high degree of belief in darkness.

At N = 5, Expired = {(null, 1, eo)} because ExpPrs(eo, b_1) = Σ_{w∈W} Prs(eo, w) × b_1(w) = 3.2 < N − 1 = 4. And (evict, 4, eō) ∉ Expired because ExpPrs(eō, b_1) = Σ_{w∈W} Prs(eō, w) × b_1(w) = 1 ≮ N − 4 = 1. Therefore, (Π^5)_22 = Closest_absolute(Π^B, Π^F_ME) ∘ {(evict, 4, eō)} = Closest_absolute(Π^B, {b_1 ∘ ⟨null, eo⟩}) ∘ {(evict, 4, eō)} = Closest_absolute(Π^B, {b_2}) ∘ {(evict, 4, eō)}. b_2 ∈ Π^B, hence, Closest_absolute(Π^B, {b_2}) = b_2. (Π^5)_22 is thus b_2 ∘ ⟨evict, eō⟩.

Recall that Eng(eō, w, evict) equals 0.75 if w ⊨ ℓ̄, but 0.5 if w ⊨ ℓ. And recall that

  b ∘_{a,z}(w) = Exg(z, w, a) b^⋄_z(w) + Eng(z, w, a) b^{SE}_{a,z}(w).

So, for instance,

  (b_2)^∘_{evict,eō}(eoℓ̄) = Exg(eō, eoℓ̄, evict)(b_2)^⋄_eō(eoℓ̄) + Eng(eō, eoℓ̄, evict)(b_2)^{SE}_{evict,eō}(eoℓ̄)
                          = 0.25 (b_2)^⋄_eō(eoℓ̄) + 0.75 (b_2)^{SE}_{evict,eō}(eoℓ̄).

At N = 6, Expired = {(null, 1, eo), (evict, 4, eō)} (abusing notation) because (null, 1, eo) had already expired at N = 5, and ExpPrs(eō, (b_2)^∘_{evict,eō}) = 1 < N − 4 = 2. Therefore, (Π^6)_22 = Closest_absolute(Π^B, Π^F_ME) ∘ {} = Closest_absolute(Π^B, {(b_2)^∘_{evict,eō}}).
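The expiry test can be checked with a few lines (Python; note that the value 3.2 is obtained when ExpPrs for eo is evaluated at the belief state reached after the first change, b_2 — an assumption on my part about which belief state Definition 4 intends here):

```python
def exp_prs(prs_by_world, b):
    """Definition 2: ExpPrs(z, b) = sum_w Prs(z, w) * b(w), with worlds indexed as above."""
    return sum(prs_by_world[i] * p for i, p in enumerate(b))

# Prs(eo, w) = 2 in light worlds and 4 in dark worlds (the example's specification).
prs_eo = [2, 4, 2, 4, 2, 4, 2, 4]
b2 = [0.386, 0.579, 0.007, 0.01, 0.007, 0.01, 0.0, 0.0]
print(round(exp_prs(prs_eo, b2), 1))   # ~3.2 < 4, so (null, 1, eo) has expired by N = 5
```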
7 CONCLUDING REMARKS

I believe that it is important to have the facility to reason about both (exogenous) events and (endogenous) actions; I am definitely not the first to propose a framework with both notions [28, 33, 9, 6, e.g.]. The framework also has two belief bases, one to represent fixed, background beliefs and one to accommodate defeasible information (observations which have become 'stale', but not necessarily false). Inherent to the framework is that the agent's knowledge may be incomplete. There is much work on dealing with ignorance or missing information [14, 16, 37, 38, 21, 30, e.g.].

What makes the proposed framework potentially significant is its generality, applicable to many domains and agent-designer requirements. I want to amplify the point that the belief change operations used in this paper are only suggestions and used due to personal familiarity with them – the researcher / agent designer is given the flexibility to suggest or design their own operators to suit their needs.

Another feature of this framework which potentially adds to its significance is the nature of the observation stream (with expiring observations) and how it interacts with the dual belief base approach. We saw the rich potential of patterns of cognition, which can be simultaneously confusing and empowering to the framework user. However, some of the confusion was cleared up in Section 5. My feeling is that this observation-stream-dual-belief-base system holds much potential for investigating deep questions in how to model continual observation and cognition within a belief change setting.

One line of research that should be pursued is to generalize the dual-belief-base approach: What would happen if three, four or more belief bases are employed, each accommodating a different degree of entrenchment of given and received information? Would such a generalization reduce to the known systems of belief change (which include notions of preference, plausibility, entrenchment, etc.)?

Closely related to the discussion in the previous paragraph is that keeping the belief base B fixed is quite a strong stance. In reality, only the most stubborn people will never change their core views even a little bit. Such stubbornness indicates an inability to grow, that is, an inability to improve one's reasoning and behaviour. In the current framework, the assumptions, although plastic enough to accommodate and aggregate recent observations, are always dominated by B. In future versions, I would like to make B more amenable to learning, while minding sound principles of belief change in logic and cognitive psychology.

Perhaps one of the most difficult aspects of using the proposed framework is to specify the persistence function Prs(·). However, the specification is made easier by a property of the operational semantics: expired observations keep on having an influence on the agent's reasoning until (if ever) they are 'overridden' by the process of refreshing set F. This means that the agent designer should rather err by specifying less persistence of observations when s/he is uncertain about the period to specify. In other words, the agent designer is advised to specify the longest period an observation is guaranteed to persist.

I also found it challenging to think about how to model how endogenous evict is, for the different worlds and observations. For instance, if Eng(eō, w, evict) = 0.75 when w ⊨ ℓ̄, should it constrain the values that Eng(ℓ̄, w, evict) can take? And if it is impossible for the elves to evict while they are outside the forest, can Eng(z, w, evict) be greater than 0 if w ⊨ ¬e? There seem to be several inter-related issues in the modeling of endogeny, including action executability, perceivability and consistency among related observations. I did not focus on these issues here.

It would be straightforward to add a utility function (e.g., a POMDP reward function) to the environment model M. Existing planning and decision-making algorithms and methods can then be used together with the proposed framework. Instead of using the POMDP state estimation function during POMDP planning, for instance, a more general belief change operator (∘) could be used in planning under uncertainty, where the operator's definition depends on the elements of the proposed framework. Little work on planning and decision-making with underspecified knowledge exists [31], [36, Sec. 5].

The fundamental distinction between focusing and belief revision when dealing with generic knowledge has been made by Dubois and Prade [8]: "Revision amounts to modifying the generic knowledge when receiving new pieces of generic knowledge (or the factual evidence when obtaining more factual information), while focusing is just applying the generic knowledge to the reference class of situations which exactly corresponds to all the available evidence gathered on the case under consideration." The distinction seems very relevant to general systems for knowledge management in dynamic environments.

This paper touches on several aspects of computational reasoning, including stochasticity, imprecision/ignorance, knowledge entrenchment, default knowledge, physical change and belief update, new evidence and belief revision, and the persistence of evidence. Except for evidence persistence, there are probably hundreds of papers and articles on combinations of these aspects. I could not yet find any work dealing with the persistence of veracity of new evidence/observations, as presented in the present paper. Besides the work already cited in this paper, the following may be used as a bibliography to better place the present work in context, and to point to methods, approaches and techniques not covered in the proposed framework, which could possibly be added to it.

• Probabilistic logics for reasoning with defaults and for belief change or learning [15, 24].
• Nonmonotonic reasoning systems with optimum entropy inference as central concept [4, 2, 3].
• Dynamic epistemic logics for reasoning about probabilities [35, 32].

ACKNOWLEDGEMENTS

I would like to thank Edgar Jembere for his comments on a draft of this paper.

REFERENCES

[1] K. Aström, 'Optimal control of Markov decision processes with incomplete state estimation', Journal of Mathematical Analysis and Applications, 10, 174–205, (1965).
[2] C. Beierle and G. Kern-Isberner, 'On the modelling of an agent's epistemic state and its dynamic changes', Electronic Communications of the European Association of Software Science and Technology, 12, (2008).
[3] C. Beierle and G. Kern-Isberner, 'Towards an agent model for belief management', in Advances in Multiagent Systems, Robotics and Cybernetics: Theory and Practice (Volume III), eds., G. Lasker and J. Pfalzgraf, IIAS, Tecumseh, Canada, (2009).
[4] R. Bourne and S. Parsons, 'Extending the maximum entropy approach to variable strength defaults', Ann. Math. Artif. Intell., 39(1–2), 123–146, (2003).
[5] I. Csiszár, 'I-divergence geometry of probability distributions and minimization problems', Annals of Probability, 3, 146–158, (1975).
[6] F. Dupin de Saint-Cyr and J. Lang, 'Belief extrapolation (or how to reason about observations and unpredicted change)', Artif. Intell., 175, (2011).
[7] D. Doder, Z. Marković, Z. Ognjanović, A. Perović, and M. Rašković, A Probabilistic Temporal Logic That Can Model Reasoning about Evidence, 9–24, Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[8] D. Dubois and H. Prade, Focusing vs. belief revision: A fundamental distinction when dealing with generic knowledge, 96–107, Springer Berlin Heidelberg, Berlin, Heidelberg, 1997.
[9] A. Ferrein, C. Fritz, and G. Lakemeyer, 'On-line decision-theoretic Golog for unpredictable domains', in KI 2004: Advances in Artif. Intell., volume 238/2004, 322–336, Springer Verlag, Berlin / Heidelberg, (2004).
management’, in Advances in Multiagent Systems, Robotics and Cyber- [28] D. Poole, ‘Decision theory, the situation calculus and conditional netics: Theory and Practice. (Volume III), eds., G. Lasker and J. Pfalz- plans’, Linköping Electronic Articles in Computer and Information Sci- graf, IIAS, Tecumseh, Canada, (2009). ence, 8(3), (1998). [4] R. Bourne and S. Parsons, ‘Extending the maximum entropy approach [29] G. Rens, ‘On stochastic belief revision and update and their combina- to variable strength defaults’, Ann. Math. Artif. Intell., 39(1–2), 123– tion’, in Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic 146, (2003). Reasoning (NMR), eds., G. Kern-Isberner and R. Wassermann, pp. 123– [5] I. Csiszár, ‘I-divergence geometry of probability distributions and min- 132. Technical University of Dortmund, (2016). imization problems’, Annals of Probability, 3, 146–158, (1975). [30] G. Rens, T. Meyer, and G. Casini, ‘Revising incompletely specified [6] F. Dupin de Saint-Cyr and J. Lang, ‘Belief extrapolation (or how to convex probabilistic belief bases’, in Proceedings of the Sixteenth Intl. reason about observations and unpredicted change)’, Artif. Intell., 175, Workshop on Non-Monotonic Reasoning (NMR), eds., G. Kern-Isberner (2011). and R. Wassermann, pp. 133–142. Technical University of Dortmund, [7] D. Doder, Z. Marković, Z. Ognjanović, A. Perović, and M. Rašković, (2016). A Probabilistic Temporal Logic That Can Model Reasoning about Evi- [31] G. Rens, T. Meyer, and G. Lakemeyer, ‘A modal logic for the decision- dence, 9–24, Lecture Notes in Computer Science, Springer Berlin Hei- theoretic projection problem’, in Proceedings of the Seventh Intl. Conf. delberg, Berlin, Heidelberg, 2010. on Agents and Artif. Intell. (ICAART), Revised Selected Papers, eds., [8] D. Dubois and H. Prade, Focusing vs. belief revision: A fundamen- B. Duval, J. Van den Herik, S. Loiseau, and J. Filipe, LNAI, pp. 3–19. tal distinction when dealing with generic knowledge, 96–107, Springer Springer Verlaag, (2015). Berlin Heidelberg, Berlin, Heidelberg, 1997. [32] J. Sack, ‘Extending probabilistic dynamic epistemic logic’, Synthese, [9] A. Ferrein, C. Fritz, and G. Lakemeyer, ‘On-line decision-theoretic 169, 124–257, (2009). Golog for unpredictable domains’, in KI 2004: Advances in Artif. In- [33] S. Shapiro and M. Pagnucco, ‘Iterated belief change and exogenous tell., volume 238/2004, 322–336, Springer Verlag, Berlin / Heidelberg, actions in the situation calculus’, in Proceedings of the Sixteenth Eu- (2004). ropean Conf. on Artif. Intell. (ECAI-04), pp. 878–882, Amsterdam, [10] D. Gabbay, I. Hodkinson, and M. Reynolds, Temporal Logic: Mathe- (2004). IOS Press. matical Foundations and Computational Aspects, volume 1, Clarendon [34] J. Shore and R. Johnson, ‘Axiomatic derivation of the principle of max- Press, Oxford, 1994. imum entropy and the principle of minimum cross-entropy’, Informa- [11] D. Gabbay, M. Reynolds, and M. Finger, Temporal Logic: Mathemat- tion Theory, IEEE Transactions on, 26(1), 26–37, (Jan 1980). ical Foundations and Computational Aspects, volume 2, Oxford Uni- [35] J. Van Benthem, J. Gerbrandy, and B. Kooi, ‘Dynamic update with versity Press, Oxford, 2000. probabilities’, Studia Logica, 93(1), 67–96, (2009). [12] P. Gärdenfors, Knowledge in Flux: Modeling the Dynamics of Epis- [36] F. Voorbraak, ‘Partial Probability: Theory and Applications’, in Pro- temic States, MIT Press, Massachusetts/England, 1988. ceedings of the First Intl. Symposium on Imprecise Probabilities and [13] P. 
[28] D. Poole, 'Decision theory, the situation calculus and conditional plans', Linköping Electronic Articles in Computer and Information Science, 8(3), (1998).
[29] G. Rens, 'On stochastic belief revision and update and their combination', in Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic Reasoning (NMR), eds., G. Kern-Isberner and R. Wassermann, pp. 123–132, Technical University of Dortmund, (2016).
[30] G. Rens, T. Meyer, and G. Casini, 'Revising incompletely specified convex probabilistic belief bases', in Proceedings of the Sixteenth Intl. Workshop on Non-Monotonic Reasoning (NMR), eds., G. Kern-Isberner and R. Wassermann, pp. 133–142, Technical University of Dortmund, (2016).
[31] G. Rens, T. Meyer, and G. Lakemeyer, 'A modal logic for the decision-theoretic projection problem', in Proceedings of the Seventh Intl. Conf. on Agents and Artif. Intell. (ICAART), Revised Selected Papers, eds., B. Duval, J. Van den Herik, S. Loiseau, and J. Filipe, LNAI, pp. 3–19, Springer Verlag, (2015).
[32] J. Sack, 'Extending probabilistic dynamic epistemic logic', Synthese, 169, 124–257, (2009).
[33] S. Shapiro and M. Pagnucco, 'Iterated belief change and exogenous actions in the situation calculus', in Proceedings of the Sixteenth European Conf. on Artif. Intell. (ECAI-04), pp. 878–882, Amsterdam, (2004). IOS Press.
[34] J. Shore and R. Johnson, 'Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy', IEEE Transactions on Information Theory, 26(1), 26–37, (Jan 1980).
[35] J. Van Benthem, J. Gerbrandy, and B. Kooi, 'Dynamic update with probabilities', Studia Logica, 93(1), 67–96, (2009).
[36] F. Voorbraak, 'Partial probability: Theory and applications', in Proceedings of the First Intl. Symposium on Imprecise Probabilities and Their Applications, pp. 360–368, (1999).
[37] F. Voorbraak, 'Probabilistic belief change: Expansion, conditioning and constraining', in Proceedings of the Fifteenth Conf. on Uncertainty in Artif. Intell., UAI'99, pp. 655–662, San Francisco, CA, USA, (1999). Morgan Kaufmann Publishers Inc.
[38] A. Yue and W. Liu, 'Revising imprecise probabilistic beliefs in the framework of probabilistic logic programming', in Proceedings of the Twenty-third AAAI Conf. on Artif. Intell. (AAAI-08), pp. 590–596, (2008).