
AIC

Generative Logic Models for Data-Based Symbolic Reasoning

Hiroyuki Kido

Cardiff University, Park Place, Cardiff, CF10 3AT, UK

2022


Acquiring knowledge from data and reasoning with the obtained knowledge are both essential processes of successful logical systems. However, most current logical systems assume different algorithms for the two processes. The separation causes serious problems such as the knowledge acquisition bottleneck, grounding and commonsense reasoning. This paper gives a simple probabilistic model unifying the two processes. It formalises how data generate models of formal logic and how the models generate the truth values of logical formulae. The generated models and truth values are shown to be consistent with maximum likelihood estimation and Fenstad's theorem, respectively. Probabilistic reasoning on logical formulae is shown to be a reasonable alternative to a logical consequence relation and a paraconsistent consequence relation. This paper contributes to data-based reasoning with linear complexity.

Keywords: Bayesian learning, Logical entailment, Statistical estimation, Reasoning from data, Inverse interpretation

1. Introduction

Thanks to the big data and computational power available today, Bayesian statistics plays an important role in various fields such as neuroscience, cognitive science and artificial intelligence (AI) [1]. Bayes’ theorem underlies most modern AI systems handling uncertainty, such as self-driving cars, robotics, medical diagnosis and language translation [2]. The Bayesian brain hypothesis [3], the free-energy principle [4] and predictive coding [5] argue that the brain unconsciously and actively predicts and perceives the world using its belief about states of the world. Bayes’ theorem is used here to explain how sensory inputs such as sight, sound, smell, taste and touch update the belief.

The generality of Bayesian statistics in intellectual phenomena makes us expect that there is a Bayesian algorithm and data structure for logical reasoning and that it can tackle fundamental assumptions of existing systems. For example, Bayesian networks [6] including naive Bayes, probabilistic logic programming (PLP) [7] and Markov logic networks (MLN) [8] assume independence of knowledge or facts. However, the independence rarely holds in real data. Ordinary formal logics such as propositional logic, first-order logic and modal logic assume consistency of knowledge to avoid entailing everything from contradictions [9, 10]. However, contradictions are inevitable when one tries to scale up the knowledge base or describe subjects in detail. In addition to the above-mentioned methods, probabilistic logic [11] and conditional probabilistic logic [12] assume both statistical and logical machineries. The statistical machinery assigns each logical sentence a probability value or weight so that it reflects aspects of the world, whereas the logical machinery performs logical reasoning on the probabilistic knowledge so that conclusions preserve the uncertainty of premises. For example, Bayesian networks, naive Bayes and PLP assume maximum likelihood estimation or maximum a posteriori estimation for the statistical machinery. Probabilistic logic, conditional probabilistic logic and MLN assume a human expert to play that role. Kolmogorov’s axioms [13] and Fenstad’s theorems [14] state constraints that ought to be satisfied by the probability or weight assignment. However, some serious AI problems such as the knowledge acquisition bottleneck, grounding, frame problems and commonsense reasoning [2, 15, 16] remain open without unifying the two machineries.

To tackle these assumptions of existing systems, we give a simple probabilistic model unifying the two machineries. We call the probabilistic model a generative logic model (GLM) as it formalises the process by which data generate models of formal logic and the models generate the truth values of logical formulae. Ordinary formal logic considers an interpretation on each model (denoted by $m$), which represents a state of the world. The interpretation is a function that maps each formula (denoted by $\alpha$) to a truth value, which represents knowledge of the world. Given data (denoted by $d$), the most basic idea introduced in this paper is to consider the model and interpretation as likelihoods $p(m|d)$ and $p(\alpha|m)$, respectively. The model likelihood represents the model restricted by the data. Using the interpretation likelihood, Bayes’ theorem gives posterior $p(m|\alpha)$, which intuitively means an inverse interpretation that gives the probability that the model making formula $\alpha$ true is $m$. The likelihood and posterior yield Bayesian learning $p(\alpha|\beta) = \sum_{m} p(\alpha|m)p(m|\beta)$, which gives the probability of formula $\alpha$ being true in the restricted models where formula $\beta$ is true. This paper looks at statistical and logical properties of the Bayesian learning.

We show that probabilistic reasoning on GLM satisfies Kolmogorov’s axioms (see Proposition 1) and Fenstad’s theorem (see Equation (3)), and is equivalent to maximum likelihood estimation (see Equation (4)). These facts justify the statistical correctness of GLM. Moreover, we show that probabilistic reasoning on GLM is equivalent to the classical entailment when the premise is consistent (see Theorem 1). It is equivalent to the classical entailment with maximal consistent subsets with respect to set cardinality when the premise is inconsistent (see Theorem 5). These facts justify the logical correctness of GLM. We exemplify commonsense reasoning and counterfactual reasoning with GLM (see Sections 3.2 and 3.5).

The contributions of this paper are summarised as follows. First, this paper offers an algorithm for data-based logical reasoning with linear complexity with respect to the number of data. To the best of our knowledge, this is the first paper introducing the idea of generative models to formalise the process by which data generate models of formal logic and the models generate the truth values of logical formulae. Second, this paper shows that GLM removes three fundamental assumptions: independence of knowledge, consistency of knowledge and separation of statistical and logical machineries. In particular, the removal of the first assumption is due to our novel idea that GLM only models the dependency between models and logical sentences. This is different from the existing methods, which model the dependency between logical sentences.

This paper is organised as follows. Section 2 introduces a generative model for logical consequence relations. Section 3 shows logical and statistical correctness of the generative model. Section 4 briefly summarises the results.

2. Generative Logic Model

The first task is to give a probabilistic representation of the process by which data generate models of formal logic. Let $\mathcal{D} = \{d_1, d_2, ..., d_K\}$ be a multiset of data about states of the world. $D$ is a random variable whose realisations are data in $\mathcal{D}$. For all data $d_k \in \mathcal{D}$, we define the probability of $d_k$, as follows.

$$p(D = d_k) = \frac{1}{K}$$

$L$ represents a propositional or first-order language. For the sake of simplicity, we assume no function symbol or open formula in $L$. $\mathcal{M} = \{m_1, m_2, ..., m_N\}$ is a set of models in formal logic. $\mathcal{D}$ is assumed to be complete with respect to $\mathcal{M}$, and thus each data in $\mathcal{D}$ belongs to a single model in $\mathcal{M}$. $m$ is a function that maps each data to such a single model. $K_n$ denotes the number of data that belong to $m_n$, i.e., $K_n = |\{d_k \in \mathcal{D} \mid m_n = m(d_k)\}|$ where $|X|$ for set $X$ denotes the cardinality of $X$. $M$ is a random variable whose realisations are models in $\mathcal{M}$. For all models $m_n \in \mathcal{M}$ and data $d_k \in \mathcal{D}$, we define the conditional probability of $m_n$ given $d_k$, as follows.

$$p(M = m_n \mid D = d_k) = \begin{cases} 1 & \text{if } m_n = m(d_k) \\ 0 & \text{otherwise} \end{cases}$$

The second task is to give a probabilistic representation of the process by which models generate the truth values of logical sentences. Ordinary formal logic considers an interpretation on each model. The interpretation is a function that maps each formula to a truth value, which represents knowledge of the world. We here introduce parameter $\mu \in [0, 1]$ to represent the extent to which each model is taken for granted in the interpretation. Concretely, $\mu$ denotes the probability that a formula is interpreted as being true (resp. false) in a model where it is true (resp. false). $1 - \mu$ is therefore the probability that a formula is interpreted as being true (resp. false) in a model where it is false (resp. true). We assume that each formula is a random variable whose realisations are 0 and 1, denoting false and true, respectively. For all models $m_n \in \mathcal{M}$ and formulae $\alpha \in L$, we define the conditional probability of each truth value of $\alpha$ given $m_n$, as follows.

$$p(\alpha = 1 \mid M = m_n) = \begin{cases} \mu & \text{if } m_n \in [\![\alpha = 1]\!] \\ 1 - \mu & \text{otherwise} \end{cases}$$

$$p(\alpha = 0 \mid M = m_n) = \begin{cases} \mu & \text{if } m_n \in [\![\alpha = 0]\!] \\ 1 - \mu & \text{otherwise} \end{cases}$$

Here, $[\![\alpha = 1]\!]$ denotes the set of all models in which $\alpha$ is true, and $[\![\alpha = 0]\!]$ the set of all models in which $\alpha$ is false. The above expressions can be simply written as a Bernoulli distribution with parameter $\mu \in [0, 1]$, i.e.,

$$p(\alpha \mid M = m_n) = \mu^{[\![\alpha]\!]_{m_n}} (1 - \mu)^{1 - [\![\alpha]\!]_{m_n}},$$

where $[\![\alpha]\!]_{m_n}$ is 1 if $m_n \in [\![\alpha]\!]$ and 0 otherwise.

In formal logic, the truth value of a formula is uniquely determined once a model is given. In probability theory, this means that the truth values of any two formulae $\alpha_1$ and $\alpha_2$ are conditionally independent given a model $m_n$, i.e., $p(\alpha_1, \alpha_2 \mid M = m_n) = p(\alpha_1 \mid M = m_n)p(\alpha_2 \mid M = m_n)$. Note that the conditional independence holds not only for atomic formulae but for compound formulae as well (see footnote 1). Let $\Delta = \{\alpha_1, \alpha_2, ..., \alpha_J\}$ be a multiset of formulae. We thus have

$$p(\Delta \mid M = m_n) = \prod_{j=1}^{J} p(\alpha_j \mid M = m_n).$$

Thus far, we have defined $p(D)$ and $p(M|D)$ as categorical distributions and $p(\Delta|M)$ as Bernoulli distributions with parameter $\mu$. Given a value of the parameter $\mu$, they provide the full joint distribution over all of the random variables, i.e. $p(\Delta, M, D)$. We call $\{p(\Delta|M, \mu), p(M|D), p(D)\}$ a generative logic model (GLM). In sum, the generative logic model defines a data-driven interpretation by which the truth values of formulae are logically interpreted and probabilistically generated from models. The models are also probabilistically generated from data observed from the real world. The GLM satisfies the following important properties.

Proposition 1. The generative logic model satisfies Kolmogorov’s axioms.

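Proposition 2. Let $\alpha \in L$. $p(\alpha = 0) = p(\neg\alpha = 1)$ holds.

In the following, we therefore replace $\alpha = 0$ by $\neg\alpha = 1$ and then abbreviate $\neg\alpha = 1$ to $\neg\alpha$. We also abbreviate $M = m_n$ to $m_n$ and $D = d_k$ to $d_k$.

Example 1. Let $\mathit{rain}$ and $\mathit{wet}$ be two propositional symbols meaning ‘it is raining’ and ‘the grass is wet,’ respectively. Each row of Table 1 shows a different model, i.e., valuation. The last column shows how many data belong to each model. Table 2 shows the likelihoods of the atomic propositions being true given a model. Given $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$, we have

$$p(\mathit{rain}|\mathit{wet}) = \frac{\sum_{n=1}^{N} p(\mathit{rain}|m_n)p(\mathit{wet}|m_n) \sum_{k=1}^{K} p(m_n|d_k)p(d_k)}{\sum_{n=1}^{N} p(\mathit{wet}|m_n) \sum_{k=1}^{K} p(m_n|d_k)p(d_k)} = \frac{(1-\mu)^2\frac{4}{10} + (1-\mu)\mu\frac{2}{10} + \mu(1-\mu)\frac{1}{10} + \mu^2\frac{3}{10}}{(1-\mu)\frac{4}{10} + \mu\frac{2}{10} + (1-\mu)\frac{1}{10} + \mu\frac{3}{10}} = \frac{3}{2 + 3} = 0.6.$$

Footnote 1: In contrast, independence $p(\alpha_1, \alpha_2) = p(\alpha_1)p(\alpha_2)$ generally holds for neither atomic formulae nor compound formulae.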
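The calculation in Example 1 can be reproduced with a short Python sketch. It is illustrative only: the names `MODELS`, `COUNTS`, `likelihood` and `conditional` are hypothetical, and the data counts (4, 2, 1, 3) and the reading of the two symbols as rain and wet are assumptions read off the worked fraction rather than taken from Table 1.

```python
from fractions import Fraction

# Models are valuations (rain, wet); COUNTS is the assumed number of data per model.
MODELS = [(0, 0), (0, 1), (1, 0), (1, 1)]
COUNTS = [4, 2, 1, 3]

def likelihood(truth, mu):
    """p(formula | m): mu if the formula is true in m, 1 - mu otherwise."""
    return mu if truth else 1 - mu

def conditional(query, evidence, mu=Fraction(1)):
    """p(query | evidence), where both arguments map a model to 0 or 1."""
    total = sum(COUNTS)
    num = den = Fraction(0)
    for m, k in zip(MODELS, COUNTS):
        p_m = Fraction(k, total)            # p(m_n) = K_n / K
        p_e = likelihood(evidence(m), mu)   # p(evidence | m_n)
        den += p_e * p_m
        num += likelihood(query(m), mu) * p_e * p_m
    return num / den

rain = lambda m: m[0]
wet = lambda m: m[1]

p_rain_given_wet = conditional(rain, wet)
print(p_rain_given_wet)  # 3/5
```

Exact fractions are used so that the result can be compared with 0.6 without rounding; passing a value of `mu` below 1 gives the softened interpretation instead.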

Example 2. Suppose that has only one 2-ary predicate symbol ‘’ and that the Herbrand universe for has only two constants {, }. There are four ground atoms, {(, ), (, ), (, ), (, )}, which result in $2^4 = 16$ possible models. Each row of Table 3 shows a different model and the last column shows the number of data that belong to the model. Models without data are omitted from the table. Given {(Δ|, = 1), ( |), ()}, we have

3. Correctness

3.1. Statistical Estimation

Fenstad [14] argues that the probability of a formula is the sum of the probabilities of the models where the formula is true. Let $\alpha \in L$ and $m \in \mathcal{M}$. When $L$ has no function symbol or open formula, the first Fenstad theorem can have the following simpler form, where $m \models \alpha$ represents that $m$ satisfies $\alpha$.

$$p(\alpha) = \sum_{m \in \mathcal{M}:\, m \models \alpha} p(m) \qquad (1)$$

When one has no prior knowledge about the probability of models, the most frequently used method to estimate $p(M)$ only from data is maximum likelihood estimation, which is given as follows.

$$p(M) = \arg\max_{\Phi} p(\mathcal{D}|\Phi),$$

where $\Phi = (\phi_1, \phi_2, ..., \phi_N)$ is the parameter of the categorical distribution $p(M)$. Assuming that each data is independent given $\Phi$, we have

$$p(\mathcal{D}|\Phi) = \prod_{k=1}^{K} p(d_k|\Phi) = \phi_1^{K_1} \phi_2^{K_2} \cdots \phi_{N-1}^{K_{N-1}} (1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1})^{K_N}.$$

$\Phi$ maximises the likelihood if and only if it maximises the log likelihood, which is given as follows.

$$\mathcal{L}(\Phi) = K_1 \log \phi_1 + K_2 \log \phi_2 + \cdots + K_{N-1} \log \phi_{N-1} + K_N \log(1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1})$$

The maximum likelihood estimate is obtained by solving the following simultaneous equations, which are obtained by differentiating the log likelihood with respect to each $\phi_n$ ($1 \leq n \leq N - 1$).

$$\frac{\partial \mathcal{L}(\Phi)}{\partial \phi_n} = \frac{K_n}{\phi_n} - \frac{K_N}{1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1}} = 0$$

The following is the solution to the simultaneous equations.

$$\Phi = \left( \frac{K_1}{K}, \frac{K_2}{K}, ..., \frac{K_N}{K} \right)$$

Therefore, the maximum likelihood estimate for the $n$-th model is just the ratio of the number of data in the model to the total number of data. Combining Equation (1) and the maximum likelihood estimate, we have

$$p(\alpha) = \sum_{n=1:\, m_n \in [\![\alpha]\!]}^{N} \frac{K_n}{K}. \qquad (2)$$

Now, let $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$ be a GLM such that $\mu = 1$. We show that both the Fenstad theorem and maximum likelihood estimation justify the GLM. The Fenstad theorem justifies the GLM because probabilistic inference on the GLM satisfies Equation (1).

$$p(\alpha) = \sum_{n=1}^{N} p(\alpha, m_n) = \sum_{n=1}^{N} p(\alpha|m_n)p(m_n) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} p(m_n) = \sum_{n=1:\, m_n \in [\![\alpha]\!]}^{N} p(m_n) \qquad (3)$$

Maximum likelihood estimation also justifies the GLM because probabilistic inference on the GLM satisfies Equation (2).

$$p(\alpha) = \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k)p(d_k) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} \frac{K_n}{K} = \sum_{n=1:\, m_n \in [\![\alpha]\!]}^{N} \frac{K_n}{K} \qquad (4)$$

We have shown that GLM not only follows Fenstad’s theorem and maximum likelihood estimation but also treats their results as probabilistic reasoning in a unified way. This result justifies the correctness of GLM from a statistical point of view.
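The two justifications can be checked numerically. The sketch below is a hedged illustration, not the paper's implementation: the function names are hypothetical, each datum is represented only by the index of the model it belongs to, and the counts (4, 2, 1, 3) over four valuations are assumed for the sake of the example.

```python
import math
from collections import Counter

def mle_model_distribution(data_models, n_models):
    """Maximum likelihood estimate of p(M): K_n / K for each model index n."""
    counts = Counter(data_models)
    k = len(data_models)
    return [counts.get(n, 0) / k for n in range(n_models)]

def log_likelihood(phi, data_models):
    """log p(D | Phi) for independent data, each labelled by its model index."""
    return sum(math.log(phi[n]) for n in data_models)

def prob_formula(formula, models, phi):
    """Sum of the probabilities of the models in which the formula is true."""
    return sum(p for m, p in zip(models, phi) if formula(m))

# Assumed toy setting: four valuations of two symbols and ten data.
models = [(0, 0), (0, 1), (1, 0), (1, 1)]
data = [0] * 4 + [1] * 2 + [2] * 1 + [3] * 3   # model index of each datum

phi_hat = mle_model_distribution(data, len(models))
p_second_symbol = prob_formula(lambda m: m[1] == 1, models, phi_hat)

# A perturbed distribution should not beat the count ratios.
perturbed = [0.35, 0.25, 0.1, 0.3]
gap = log_likelihood(phi_hat, data) - log_likelihood(perturbed, data)
```

Here `phi_hat` is the vector of count ratios, `p_second_symbol` is the probability of a formula obtained by summing those ratios over its models, and `gap` is positive because the perturbed parameter has a lower log likelihood.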

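Table 4: The four models $m_1$ (0, 0), $m_2$ (0, 1), $m_3$ (1, 0) and $m_4$ (1, 1) of the two propositional symbols, with the data and the new data belonging to each model marked by ×.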

3.2. Reasoning from Data

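There are some practical advantages of the GLMs. The computational complexity of Equation (4) depends on $N$, which is unbounded in predicate logic and exponentially increases in propositional logic with respect to the number of propositional symbols. However, Equation (4) can be transformed as follows for a linear complexity with respect to the number of data, i.e., $K$.

$$p(\alpha) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} \frac{K_n}{K} = \frac{1}{K} \sum_{k=1}^{K} [\![\alpha]\!]_{m(d_k)} \qquad (5)$$

In addition, Equation (4) has only a constant complexity for recalculation for new data. Let $p_K$ denote the probability calculated with $K$ data. $p_{K+1}(\alpha)$ can be calculated using $p_K(\alpha)$ as follows.

$$\begin{aligned}
p_{K+1}(\alpha) &= \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K+1} p(m_n|d_k)p(d_k) \\
&= \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k)p(d_k) + \sum_{n=1}^{N} p(\alpha|m_n)p(m_n|d_{K+1})p(d_{K+1}) \\
&= \frac{1}{K+1} \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k) + \frac{1}{K+1} \sum_{n=1}^{N} p(\alpha|m_n)p(m_n|d_{K+1}) \\
&= \frac{K p_K(\alpha) + [\![\alpha]\!]_{m(d_{K+1})}}{K+1} \qquad (6)
\end{aligned}$$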
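The linear-time computation and the constant-time update can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each datum is represented directly by the valuation of its model, the function names `prob_linear` and `prob_update` are hypothetical, and the ten-plus-one data below are invented for the example.

```python
def prob_linear(formula, data):
    """Fraction of data whose model satisfies the formula (one pass over the data)."""
    return sum(1 for d in data if formula(d)) / len(data)

def prob_update(p_k, k, formula, new_datum):
    """Update p_K(formula) to p_{K+1}(formula) with one new datum in constant time."""
    return (k * p_k + (1 if formula(new_datum) else 0)) / (k + 1)

# Hypothetical data over two propositional symbols.
implies = lambda d: (not d["bird"]) or bool(d["fly"])   # bird -> fly
data = ([{"bird": 0, "fly": 0}] * 5
        + [{"bird": 0, "fly": 1}]
        + [{"bird": 1, "fly": 1}] * 4)

p10 = prob_linear(implies, data)                 # no counterexample among ten data
new_datum = {"bird": 1, "fly": 0}                # a bird that does not fly
p11 = prob_update(p10, len(data), implies, new_datum)
p11_recomputed = prob_linear(implies, data + [new_datum])
```

The update never revisits the earlier data, which is the point of the constant recalculation cost; `p11_recomputed` is included only to confirm that both routes agree.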

Finally, as demonstrated in the following example, Equation ( 6 ) is good at modelling the development of commonsense knowledge.

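Example 3. Let $\mathit{bird}$ and $\mathit{fly}$ be two propositional symbols meaning ‘It is a bird.’ and ‘It flies.’, respectively. Each row of Table 4 shows a different model. Given the ten data shown in the fourth column, the probability that $\mathit{bird}$ implies $\mathit{fly}$ is calculated using Equation (5), as follows.

$$p(\mathit{bird} \to \mathit{fly}) = \frac{1}{10} \sum_{k=1}^{10} [\![\mathit{bird} \to \mathit{fly}]\!]_{m(d_k)} = \frac{10}{10} = 1$$

It is obvious from the GLM that the counterintuitive knowledge that birds must fly comes from a lack of data. Indeed, taking into account the eleventh data shown in the last column, the probability is updated using Equation (6), as follows.

$$p_{11}(\mathit{bird} \to \mathit{fly}) = \frac{10\, p_{10}(\mathit{bird} \to \mathit{fly}) + [\![\mathit{bird} \to \mathit{fly}]\!]_{m(d_{11})}}{11},$$

which equals $10/11$ if the eleventh data belongs to the model in which $\mathit{bird}$ is true and $\mathit{fly}$ is false.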

3.3. Logical Entailment

We showed in Section 3.1 that, given $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$, $p(M)$ is equivalent to the maximum likelihood estimate, i.e., for all $m_n \in \mathcal{M}$,

$$p(m_n) = \sum_{k=1}^{K} p(m_n|d_k)p(d_k) = \frac{K_n}{K}.$$

Therefore, $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$ is equivalent to $\{p(\Delta|M, \mu = 1), p(M)\}$ when $p(M)$ is the maximum likelihood estimate. For the sake of simplicity, we also call the latter a GLM and use it without distinction. To discuss logical properties of the GLM, we assume $0 \notin p(M)$ meaning that every model is possible, i.e., $p(m) \neq 0$, for all models $m$. Recall that a set $\Delta$ of formulae entails a formula $\alpha$ in classical logic, denoted by $\Delta \models \alpha$, if $\alpha$ is true in every model in which $\Delta$ is true, i.e., $[\![\Delta]\!] \subseteq [\![\alpha]\!]$. The following two theorems state that certain inference on the GLM is more cautious than classical entailment.

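Theorem 1. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] \neq \emptyset$. $p(\alpha|\Delta) = 1$ if and only if $\Delta \models \alpha$.

Proof. Recall that, in formal logic, the fact that there is a model of $\Delta$ (or $\Delta$ has a model) is equivalent to the fact that there is a model $m$ in which every formula in $\Delta$ is true. Dividing models into the models of $\Delta$ and the others, we have

$$p(\alpha|\Delta) = \frac{\sum_{m \in [\![\Delta]\!]} p(m)p(\alpha|m)\mu^{|\Delta|} + \sum_{m \notin [\![\Delta]\!]} p(m)p(\alpha|m)p(\Delta|m)}{\sum_{m \in [\![\Delta]\!]} p(m)\mu^{|\Delta|} + \sum_{m \notin [\![\Delta]\!]} p(m)p(\Delta|m)}.$$

$p(\Delta|m) = \prod_{\beta \in \Delta} p(\beta|m) = \prod_{\beta \in \Delta} \mu^{[\![\beta]\!]_m} (1 - \mu)^{1 - [\![\beta]\!]_m}$. For all $m \notin [\![\Delta]\!]$, there is $\beta \in \Delta$ such that $[\![\beta]\!]_m = 0$. Therefore, $p(\Delta|m) = 0$ when $\mu = 1$, for all $m \notin [\![\Delta]\!]$. We thus have

$$p(\alpha|\Delta) = \frac{\sum_{m \in [\![\Delta]\!]} p(m)\, 1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m}}{\sum_{m \in [\![\Delta]\!]} p(m)}.$$

Since $1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m} = 1^1 0^0 = 1$ if $m \in [\![\alpha]\!]$ and $1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m} = 1^0 0^1 = 0$ if $m \notin [\![\alpha]\!]$, we have

$$p(\alpha|\Delta) = \frac{\sum_{m \in [\![\Delta]\!] \cap [\![\alpha]\!]} p(m)}{\sum_{m \in [\![\Delta]\!]} p(m)}.$$

Now, since $p(m) \neq 0$ for all models $m$, the right-hand side equals 1 if and only if $[\![\alpha]\!] \supseteq [\![\Delta]\!]$, i.e., $\Delta \models \alpha$.

Example 4. Theorem 1 does not hold without assumption $0 \notin p(M)$. Given $p(M) = (0.6, 0, 0.1, 0.3)$ in Example 1, $p(\mathit{rain}|\mathit{wet}) = 1$ but $\{\mathit{wet}\} \not\models \mathit{rain}$.

Theorem 2. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] = \emptyset$. If $p(\alpha|\Delta) = 1$ then $\Delta \models \alpha$, but not vice versa.

Proof. ($\Rightarrow$) If $[\![\Delta]\!] = \emptyset$ then $\Delta \models \alpha$, for all $\alpha$, in classical logic. ($\Leftarrow$) We show a counterexample where $\Delta \models \alpha$ but $p(\alpha|\Delta) \neq 1$. $\beta, \neg\beta \models \alpha$ holds because $[\![\beta, \neg\beta]\!] = \emptyset$ results in $[\![\beta, \neg\beta]\!] \subseteq [\![\alpha]\!]$. However,

$$p(\alpha|\beta, \neg\beta) = \frac{\sum_{m} p(m)p(\alpha|m)p(\beta|m)p(\neg\beta|m)}{\sum_{m} p(m)p(\beta|m)p(\neg\beta|m)} = \frac{\mu(1 - \mu) \sum_{m} p(m)p(\alpha|m)}{\mu(1 - \mu) \sum_{m} p(m)}.$$

This is undefined due to division by zero when $\mu = 1$.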

Everything is entailed from a contradiction in the classical entailment. Certain inference on the GLM is more cautious than the classical entailment because the proof of Theorem 2 states that nothing is entailed from a contradiction. In the next section, we look at a GLM that entails something reasonable from contradictions.
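The relation between certain inference with the parameter fixed to 1 and classical entailment can be explored by brute force over valuations. The following Python sketch is illustrative only; `models`, `posterior_mu1` and `entails` are hypothetical helper names, and formulas are represented as functions from a valuation to a truth value.

```python
from fractions import Fraction
from itertools import product

def models(symbols):
    """All valuations of the given propositional symbols."""
    return [dict(zip(symbols, bits)) for bits in product([0, 1], repeat=len(symbols))]

def posterior_mu1(alpha, delta, ms, prior):
    """p(alpha | Delta) with the parameter set to 1; None when the denominator is 0."""
    den = sum(p for m, p in zip(ms, prior) if all(f(m) for f in delta))
    if den == 0:
        return None  # undefined: no possible model satisfies every premise
    num = sum(p for m, p in zip(ms, prior)
              if all(f(m) for f in delta) and alpha(m))
    return num / den

def entails(delta, alpha, ms):
    """Classical entailment: alpha is true in every model of Delta."""
    return all(alpha(m) for m in ms if all(f(m) for f in delta))

ms = models(["rain", "wet"])                    # order: 00, 01, 10, 11
uniform = [Fraction(1, 4)] * 4
rain = lambda m: m["rain"] == 1
wet = lambda m: m["wet"] == 1
not_rain = lambda m: m["rain"] == 0
rain_implies_wet = lambda m: m["rain"] == 0 or m["wet"] == 1

consistent_case = posterior_mu1(wet, [rain, rain_implies_wet], ms, uniform)
non_entailed_case = posterior_mu1(rain, [wet], ms, uniform)
contradiction_case = posterior_mu1(wet, [rain, not_rain], ms, uniform)
# A prior with an impossible model, as in Example 4.
zero_prior = [Fraction(6, 10), Fraction(0), Fraction(1, 10), Fraction(3, 10)]
zero_prior_case = posterior_mu1(rain, [wet], ms, zero_prior)
```

With every model possible, the posterior is 1 exactly when the premises entail the conclusion; with contradictory premises the function returns `None`, mirroring the division by zero; and with the zero-probability model the posterior is 1 even though the entailment fails.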

3.4. Paraconsistency

Let $\{\lim_{\mu \to 1} p(\Delta|M, \mu), p(M)\}$ be a GLM such that $\mu \to 1$ and $0 \notin p(M)$, where $\mu \to 1$ represents that $\mu$ approaches 1. The following two theorems state that certain inference on the GLM with $\mu \to 1$ is more cautious than classical entailment.

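Theorem 3. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] \neq \emptyset$. $p(\alpha|\Delta) = 1$ if and only if $\Delta \models \alpha$.

Proof. $\lim_{\mu \to 1}$ does not change the proof of Theorem 1.

Theorem 4. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] = \emptyset$. If $p(\alpha|\Delta) = 1$ then $\Delta \models \alpha$, but not vice versa.

Proof. ($\Rightarrow$) Same as for Theorem 2. ($\Leftarrow$) We show a counterexample where $\Delta \models \alpha$ but $p(\alpha|\Delta) \neq 1$. Suppose $p(\alpha) < 1$. We can show $p(\alpha|\beta \wedge \neg\beta) < 1$ as follows.

$$\begin{aligned}
p(\alpha|\beta \wedge \neg\beta) &= \frac{\sum_{m} p(m) \lim_{\mu \to 1} p(\alpha|m) \lim_{\mu \to 1} p(\beta \wedge \neg\beta|m)}{\sum_{m} p(m) \lim_{\mu \to 1} p(\beta \wedge \neg\beta|m)} = \lim_{\mu \to 1} \frac{(1 - \mu) \sum_{m} p(m)p(\alpha|m)}{(1 - \mu) \sum_{m} p(m)} \\
&= \lim_{\mu \to 1} \frac{\sum_{m} p(m)p(\alpha|m)}{\sum_{m} p(m)} = \sum_{m} p(m) \lim_{\mu \to 1} p(\alpha|m) = p(\alpha)
\end{aligned}$$

Therefore, $p(\alpha|\beta \wedge \neg\beta) \neq 1$. Note that $\beta \wedge \neg\beta \models \alpha$ because $[\![\beta \wedge \neg\beta]\!] = \emptyset$ results in $[\![\beta \wedge \neg\beta]\!] \subseteq [\![\alpha]\!]$.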
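The cancellation used in the proof of Theorem 4 can be observed numerically. The sketch below is a hedged illustration: the prior over the four valuations is assumed (it reuses the count ratios 4, 2, 1, 3 out of 10), and the function names are hypothetical. Exact fractions are used so that the equality holds without rounding.

```python
from fractions import Fraction

# Assumed prior over valuations (first symbol, second symbol).
PRIOR = {(0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10)}

def lik(truth, mu):
    """p(formula | m): mu if the formula is true in m, 1 - mu otherwise."""
    return mu if truth else 1 - mu

def p_marginal(alpha, mu):
    """p(alpha) = sum over models of p(alpha | m) p(m)."""
    return sum(p * lik(alpha(m), mu) for m, p in PRIOR.items())

def p_given_contradiction(alpha, beta, mu):
    """p(alpha | beta and not beta), defined for mu < 1."""
    falsum = lambda m: bool(beta(m)) and not bool(beta(m))  # false in every model
    num = sum(p * lik(alpha(m), mu) * lik(falsum(m), mu) for m, p in PRIOR.items())
    den = sum(p * lik(falsum(m), mu) for m, p in PRIOR.items())
    return num / den

first = lambda m: m[0]
second = lambda m: m[1]

for mu in (Fraction(9, 10), Fraction(99, 100), Fraction(999, 1000)):
    # The common factor (1 - mu) cancels, leaving the marginal probability.
    print(mu, p_given_contradiction(first, second, mu), p_marginal(first, mu))
```

For every value of `mu` below 1 the conditional probability coincides with the marginal, and both approach the total prior mass of the models satisfying the formula (here 2/5) rather than 1.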

To characterise the certain inference on the GLM, we define an approximate model using maximal consistent subsets with respect to set cardinality. Recall that a set of formulae is consistent if there is a model of the set.

Definition 1 (Approximate model). Let $m$ be a model and $\Delta \subseteq L$ be an inconsistent set of formulae. $m$ is an approximate model of $\Delta$ if $m$ is a model of a maximal (w.r.t. set cardinality) consistent subset of $\Delta$.

Theorem 5. Let $\Delta \subseteq L$ and $\alpha \in L$. $p(\alpha|\Delta) = 1$ if and only if $\Delta' \models \alpha$, for all maximal (w.r.t. set cardinality) consistent subsets $\Delta'$ of $\Delta$.

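Proof. We use notation $((\Delta))$ to denote the set of all approximate models of $\Delta$. We also use notation $|\Delta|_m$ to denote the number of formulas in $\Delta$ that are true in $m$, i.e. $|\Delta|_m = \sum_{\beta \in \Delta} [\![\beta]\!]_m$. Dividing models into $((\Delta))$ and the others, we have

$$p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{\sum_{m} p(\alpha|m)p(m)p(\Delta|m)}{\sum_{m} p(m)p(\Delta|m)} = \lim_{\mu \to 1} \frac{\sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m})p(\hat{m})p(\Delta|\hat{m}) + \sum_{m \notin ((\Delta))} p(\alpha|m)p(m)p(\Delta|m)}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m})p(\Delta|\hat{m}) + \sum_{m \notin ((\Delta))} p(m)p(\Delta|m)}.$$

Now, $p(\Delta|m)$ can be developed as follows, for all $m$ (regardless of the membership of $((\Delta))$).

$$p(\Delta|m) = \prod_{\beta \in \Delta} p(\beta|m) = \prod_{\beta \in \Delta} \mu^{[\![\beta]\!]_m} (1 - \mu)^{1 - [\![\beta]\!]_m} = \mu^{\sum_{\beta \in \Delta} [\![\beta]\!]_m} (1 - \mu)^{\sum_{\beta \in \Delta} (1 - [\![\beta]\!]_m)} = \mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m}$$

Therefore, $p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{W + X}{Y + Z}$ where

$$\begin{aligned}
W &= \sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m})p(\hat{m})\mu^{|\Delta|_{\hat{m}}} (1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}} \\
X &= \sum_{m \notin ((\Delta))} p(\alpha|m)p(m)\mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m} \\
Y &= \sum_{\hat{m} \in ((\Delta))} p(\hat{m})\mu^{|\Delta|_{\hat{m}}} (1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}} \\
Z &= \sum_{m \notin ((\Delta))} p(m)\mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m}.
\end{aligned}$$

From Definition 1, $|\Delta|_{\hat{m}}$ has the same value, for all $\hat{m} \in ((\Delta))$. Therefore, the fraction can be simplified by dividing the denominator and numerator by $(1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}}$. We thus have $p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{W' + X'}{Y' + Z'}$ where

$$\begin{aligned}
W' &= \sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m})p(\hat{m})\mu^{|\Delta|_{\hat{m}}} \\
X' &= \sum_{m \notin ((\Delta))} p(\alpha|m)p(m)\mu^{|\Delta|_m} (1 - \mu)^{|\Delta|_{\hat{m}} - |\Delta|_m} \\
Y' &= \sum_{\hat{m} \in ((\Delta))} p(\hat{m})\mu^{|\Delta|_{\hat{m}}} \\
Z' &= \sum_{m \notin ((\Delta))} p(m)\mu^{|\Delta|_m} (1 - \mu)^{|\Delta|_{\hat{m}} - |\Delta|_m}.
\end{aligned}$$

Since $|\Delta|_{\hat{m}} > |\Delta|_m$ for all $m \notin ((\Delta))$, applying the limit operation, we can cancel out $X'$ and $Z'$ and have

$$p(\alpha|\Delta) = \frac{\sum_{\hat{m} \in ((\Delta))} [\![\alpha]\!]_{\hat{m}}\, p(\hat{m})}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m})} = \frac{\sum_{\hat{m} \in ((\Delta)) \cap [\![\alpha]\!]} p(\hat{m})}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m})}.$$

Therefore, $p(\alpha|\Delta) = 1$ holds if and only if $[\![\alpha]\!] \supseteq ((\Delta))$. By definition, $m \in ((\Delta))$ if and only if $m$ is a model of a maximal consistent subset of $\Delta$ w.r.t. set cardinality. Therefore, $m \in ((\Delta))$ if and only if $m \in [\![\Delta']\!]$ where $\Delta'$ is a maximal consistent subset of $\Delta$ w.r.t. set cardinality. Therefore, $p(\alpha|\Delta) = 1$ if and only if $[\![\alpha]\!] \supseteq \bigcup_{\Delta'} [\![\Delta']\!]$. In other words, for all maximal (w.r.t. set cardinality) consistent subsets $\Delta'$ of $\Delta$, $[\![\alpha]\!] \supseteq [\![\Delta']\!]$, i.e., $\Delta' \models \alpha$.

Example 5. Let $\mu \to 1$ and $p(M) = (0.25, 0.25, 0.25, 0.25)$ in Example 1. Given $\Delta = \{\mathit{rain}, \mathit{wet}, \mathit{rain} \to \mathit{wet}, \neg\mathit{wet}\}$, there are three maximal (w.r.t. set inclusion) consistent subsets, i.e., $S_1 = \{\mathit{rain}, \mathit{wet}, \mathit{rain} \to \mathit{wet}\}$, $S_2 = \{\mathit{rain}, \neg\mathit{wet}\}$ and $S_3 = \{\mathit{rain} \to \mathit{wet}, \neg\mathit{wet}\}$, and one maximal (w.r.t. set cardinality) consistent subset, i.e., $S_1$. $p(\mathit{rain}|\Delta) = 1$ and $S_1 \models \mathit{rain}$ hold, but $S_3 \not\models \mathit{rain}$.

3.5. Counterfactuals

Would England have won the match against Argentina at the 1986 World Cup if Diego Maradona had not used his hand to score the first goal?

Reasoning with this kind of false and imaginary conditional statement is often called counterfactual reasoning. Let $\{\lim_{\mu \to 1} p(\Delta|M, \mu), p(M)\}$ be a GLM such that $\mu \to 1$. This section demonstrates that the certain inference on the GLM is a natural model of counterfactual reasoning.

Let $\mathit{goal}, \mathit{home}, \mathit{opp}, \mathit{win} \in \{0, 1\}$. They are, respectively, facts about whether our teammate Alice scored a goal or not, whether the game was played at home or not, whether the opponent was 0 (meaning Belgium) or 1 (meaning Brazil), and whether our team won or not. Now, we consider the following question: Our team lost the home game without Alice’s goal against Belgium, i.e., $d_1$. Would we have won if Alice had scored a goal in this match? This question does not have a straightforward answer because it is a counterfactual with respect to the data. Indeed, the set of attributes, i.e., $(\mathit{goal} = 1, \mathit{home} = 1, \mathit{opp} = 0)$, of the counterfactual does not appear in the data.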

As long as the counterfactual does not exist in the data, it is reasonable to realise counterfactual reasoning based on the facts most similar to the counterfactual [17]. The counterfactual shares attributes $(\mathit{home} = 1, \mathit{opp} = 0)$ with $d_1$, $(\mathit{goal} = 1, \mathit{home} = 1)$ with $d_2$, $(\mathit{goal} = 1, \mathit{opp} = 0)$ with $d_3$ and $(\mathit{goal} = 1)$ with $d_4$. The data thus indicate that $d_1$, $d_2$ and $d_3$ are most similar to the counterfactual in terms of the number of shared attributes. Since the team won in $d_2$ and $d_3$, it is reasonable to conclude that, given the counterfactual, the probability of winning is 2/3. Here, readers might think that $d_1$ should be excluded from the most similar facts because, in the counterfactual, we look at the situation in which Alice scored a goal. However, $d_1$ contains important information because it is empirically true that the probability of winning with Alice’s goal is positively affected by the fact that we won without Alice’s goal and negatively affected by the fact that we lost without Alice’s goal.

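Interestingly, the idea of counterfactual reasoning is naturally modelled by the GLM. The predictive probability of winning given the counterfactual is calculated as follows.

$$\begin{aligned}
p(\mathit{win}|\mathit{goal}, \mathit{home}, \neg\mathit{opp}) &= \lim_{\mu \to 1} \frac{\sum_{m} p(\mathit{goal}|m)p(\mathit{home}|m)p(\neg\mathit{opp}|m)p(\mathit{win}|m)p(m)}{\sum_{m} p(\mathit{goal}|m)p(\mathit{home}|m)p(\neg\mathit{opp}|m)p(m)} \\
&= \lim_{\mu \to 1} \frac{\mu^2(1 - \mu)^2 + \mu^3(1 - \mu) + \mu^3(1 - \mu) + \mu(1 - \mu)^3}{\mu^2(1 - \mu) + \mu^2(1 - \mu) + \mu^2(1 - \mu) + \mu(1 - \mu)^2} = \frac{2}{3}
\end{aligned}$$

The denominator of the predictive probability turns out to equal the number of facts most similar to the counterfactual, i.e., $d_1$, $d_2$ and $d_3$, whereas the numerator turns out to equal the number of wins from the three games, i.e., $d_2$ and $d_3$. Note that only the GLM with $\mu \to 1$ successfully formalises the idea of counterfactual reasoning.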
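The limit can also be computed directly from the data. The Python sketch below is an illustration under explicit assumptions: the four data rows are reconstructed from the exponents in the calculation and from the attributes each datum shares with the counterfactual, the attribute names `goal`, `home`, `opp` and `win` are assumed labels, and the function names are hypothetical. It weights every datum equally and, as the parameter approaches 1, keeps only the data that satisfy the largest number of evidence formulas.

```python
from fractions import Fraction

def posterior(query, evidence, data, mu):
    """p(query | evidence) for a fixed mu < 1, with equal weight on each datum."""
    def lik(truth):
        return mu if truth else 1 - mu
    num = den = Fraction(0)
    for d in data:
        w = Fraction(1)
        for f in evidence:
            w *= lik(f(d))          # p(evidence formula | model of d)
        den += w
        num += w * lik(query(d))    # times p(query | model of d)
    return num / den

def limit_posterior(query, evidence, data):
    """Limit of the posterior as mu approaches 1.

    Only the data satisfying the largest number of evidence formulas keep a
    non-vanishing weight, so the limit is the share of those data that also
    satisfy the query.
    """
    scores = [sum(1 for f in evidence if f(d)) for d in data]
    best = max(scores)
    closest = [d for d, s in zip(data, scores) if s == best]
    return Fraction(sum(1 for d in closest if query(d)), len(closest))

# Assumed reconstruction of the four games d1..d4.
data = [
    {"goal": 0, "home": 1, "opp": 0, "win": 0},  # d1
    {"goal": 1, "home": 1, "opp": 1, "win": 1},  # d2
    {"goal": 1, "home": 0, "opp": 0, "win": 1},  # d3
    {"goal": 1, "home": 0, "opp": 1, "win": 0},  # d4
]
evidence = [lambda d: d["goal"] == 1, lambda d: d["home"] == 1, lambda d: d["opp"] == 0]
win = lambda d: d["win"] == 1

answer = limit_posterior(win, evidence, data)
print(answer)  # 2/3
```

Evaluating `posterior` at values of `mu` close to 1 gives numbers close to `answer`, which is a convenient sanity check on the shortcut taken in `limit_posterior`.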

Our approach to counterfactual reasoning essentially differs from those of Pearl [17] and Lewis [18]. Our approach is data-driven, whereas Pearl’s approach is model-driven in the sense that it assumes a causal diagram. Our approach is based on probability theory, whereas Lewis’s approach is based on the possible-worlds semantics. Although a formal comparison is difficult, Table 6 shows that there are some counterparts between the two approaches.

4. Conclusions

We introduced the idea of generative models to the interpretation of formal logic. The idea, referred to as generative logic models, accounts for the process by which data about states of the world generate models of formal logic and the models generate the truth values of logical formulae. We showed that it is a theory of reasoning that deals with several types of reasoning such as statistical reasoning, logical reasoning, paraconsistent reasoning and counterfactual reasoning.

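Proof of Proposition 1. Let $\alpha, \beta \in L$. We need to show the following three properties.

1. $0 \leq p(\alpha = i)$ holds, for all $i \in \{0, 1\}$.
2. $\sum_{i \in \{0,1\}} p(\alpha = i) = 1$ holds.
3. $p(\alpha \vee \beta = i) = p(\alpha = i) + p(\beta = i) - p(\alpha \wedge \beta = i)$ holds, for all $i \in \{0, 1\}$.

(1) $p(\alpha = i) = \sum_{m} p(\alpha = i|m)p(m)$. Both $p(\alpha = i|m)$ and $p(m)$ cannot be negative.

(2) Since $[\![\alpha = 0]\!]_m = 1 - [\![\alpha = 1]\!]_m$, we have

$$p(\alpha = 0|m) + p(\alpha = 1|m) = \mu^{[\![\alpha = 0]\!]_m} (1 - \mu)^{1 - [\![\alpha = 0]\!]_m} + \mu^{[\![\alpha = 1]\!]_m} (1 - \mu)^{1 - [\![\alpha = 1]\!]_m} = \mu^{1 - [\![\alpha = 1]\!]_m} (1 - \mu)^{[\![\alpha = 1]\!]_m} + \mu^{[\![\alpha = 1]\!]_m} (1 - \mu)^{1 - [\![\alpha = 1]\!]_m}.$$

If $[\![\alpha = 1]\!]_m = 1$ then $p(\alpha = 0|m) + p(\alpha = 1|m) = (1 - \mu) + \mu = 1$. If $[\![\alpha = 1]\!]_m = 0$ then $p(\alpha = 0|m) + p(\alpha = 1|m) = \mu + (1 - \mu) = 1$. Therefore, we have

$$p(\alpha = 0) + p(\alpha = 1) = \sum_{m} p(\alpha = 0|m)p(m) + \sum_{m} p(\alpha = 1|m)p(m) = \sum_{m} p(m)\{p(\alpha = 0|m) + p(\alpha = 1|m)\} = \sum_{m} p(m) = 1.$$

(3) From (2), it is sufficient to show only case $i = 1$ because case $i = 0$ can be developed as follows.

$$p(\alpha \vee \beta = 0) = p(\alpha = 0) + p(\beta = 0) - p(\alpha \wedge \beta = 0) \iff 1 - p(\alpha \vee \beta = 1) = 1 - \{p(\alpha = 1) + p(\beta = 1) - p(\alpha \wedge \beta = 1)\}$$

It is sufficient to show $p(\alpha \vee \beta = 1|m) = p(\alpha = 1|m) + p(\beta = 1|m) - p(\alpha \wedge \beta = 1|m)$, for all $m$, since the following holds.

$$\sum_{m} p(\alpha \vee \beta = 1|m)p(m) = \sum_{m} \{p(\alpha = 1|m) + p(\beta = 1|m) - p(\alpha \wedge \beta = 1|m)\}p(m)$$

By case analysis, the right expression takes one of the following four forms.

$$\begin{aligned}
(1 - \mu) + (1 - \mu) - (1 - \mu) &= 1 - \mu \qquad (7) \\
(1 - \mu) + \mu - (1 - \mu) &= \mu \qquad (8) \\
\mu + (1 - \mu) - (1 - \mu) &= \mu \qquad (9) \\
\mu + \mu - \mu &= \mu \qquad (10)
\end{aligned}$$

where (7), (8), (9) and (10) are obtained in the cases ($[\![\alpha = 1]\!]_m = [\![\beta = 1]\!]_m = 0$), ($[\![\alpha = 1]\!]_m = 0$ and $[\![\beta = 1]\!]_m = 1$), ($[\![\alpha = 1]\!]_m = 1$ and $[\![\beta = 1]\!]_m = 0$), and ($[\![\alpha = 1]\!]_m = [\![\beta = 1]\!]_m = 1$), respectively. All of the results are consistent with the left expression, i.e., $p(\alpha \vee \beta = 1|m)$.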
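The three properties can also be checked mechanically for concrete formulas. The Python sketch below is a sanity check rather than a proof: the prior over four valuations is assumed, the helper names `prob` and `check_axioms` are hypothetical, and exact fractions are used so that the equalities are tested without rounding.

```python
from fractions import Fraction
from itertools import product

def prob(formula, value, prior, mu):
    """p(formula = value) = sum over models of p(formula = value | m) p(m)."""
    total = Fraction(0)
    for m, p in prior.items():
        holds = int(bool(formula(m))) == value   # is m in [[formula = value]]?
        total += (mu if holds else 1 - mu) * p
    return total

# Assumed prior over the valuations of two propositional symbols.
prior = {m: Fraction(k, 10)
         for m, k in zip(product([0, 1], repeat=2), [4, 2, 1, 3])}

a = lambda m: m[0]
b = lambda m: m[1]
a_or_b = lambda m: m[0] or m[1]
a_and_b = lambda m: m[0] and m[1]

def check_axioms(mu):
    """Check non-negativity, normalisation and the sum rule for one value of mu."""
    for i in (0, 1):
        assert prob(a, i, prior, mu) >= 0
        assert prob(a_or_b, i, prior, mu) == (
            prob(a, i, prior, mu) + prob(b, i, prior, mu)
            - prob(a_and_b, i, prior, mu))
    assert prob(a, 0, prior, mu) + prob(a, 1, prior, mu) == 1
    return True

results = [check_axioms(mu) for mu in (Fraction(0), Fraction(1, 4),
                                       Fraction(9, 10), Fraction(1))]
```

The check passes for every tested value of the parameter, including the boundary values 0 and 1, which matches the case analysis above.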

[1] S. B. McGrayne, The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, Yale University Press, 2011.

[2] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Fourth Edition, Pearson Education, Inc., 2020.

[3] D. C. Knill, A. Pouget, The Bayesian brain: the role of uncertainty in neural coding and computation, Trends in Neurosciences 27 (2004) 712-719.

[4] K. Friston, The free-energy principle: a unified brain theory?, Nature Reviews Neuroscience 11 (2010) 127-138.

[5] J. Hohwy, A. Roepstorff, K. Friston, Predictive coding explains binocular rivalry: An epistemological review, Cognition 108 (2008) 687-701.

[6] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.

[7] T. Sato, A statistical learning method for logic programs with distribution semantics, in: Proc. 12th Int. Conf. on Logic Programming, 1995, pp. 715-729.

[8] M. Richardson, P. Domingos, Markov logic networks, Machine Learning 62 (2006) 107-136.

[9] G. Priest, Paraconsistent Logic, volume 6 of Handbook of Philosophical Logic, 2nd ed., Springer, 2002, pp. 287-393.

[10] W. Carnielli, M. E. Coniglio, J. Marcos, Logics of Formal Inconsistency, volume 14 of Handbook of Philosophical Logic, 2nd ed., Springer, 2007, pp. 1-93.

[11] N. J. Nilsson, Probabilistic logic, Artificial Intelligence 28 (1986) 71-87.

[12] W. Rödder, Conditional logic and the principle of entropy, Artificial Intelligence 117 (2000) 83-106.

[13] A. N. Kolmogorov, Foundations of the Theory of Probability, Chelsea Publishing Co., 1950.

[14] J. E. Fenstad, Representations of probabilities defined on first order languages, in: Sets, Models and Recursion Theory, volume 46, Elsevier, 1967, pp. 156-172.

[15] E. Davis, G. Marcus, Commonsense reasoning and commonsense knowledge in artificial intelligence, Communications of the ACM 58 (9) (2015) 92-103.

[16] G. Brewka, Nonmonotonic Reasoning: Logical Foundations of Commonsense, Cambridge University Press, 1991.

[17] J. Pearl, The Book of Why: The New Science of Cause and Effect, Allen Lane, 2018.

[18] D. Lewis, Counterfactuals, Harvard University Press, Cambridge, MA, 1973.
