<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>AIC</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Generative Logic Models for Data-Based Symbolic Reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroyuki Kido</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cardiff University</institution>
          ,
          <addr-line>Park Place, Cardiff, CF10 3AT</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>8</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Acquiring knowledge from data and reasoning with the obtained knowledge are both essential processes of successful logical systems. However, most current logical systems assume different algorithms for the two processes. The separation causes serious problems such as the knowledge acquisition bottleneck, grounding and commonsense reasoning. This paper gives a simple probabilistic model unifying the two processes. It formalises how data generate models of formal logic and the models generate the truth values of logical formulae. The generated models and truth values are shown to be consistent with maximum likelihood estimation and Fenstad's theorem, respectively. Probabilistic reasoning on logical formulae is shown to be a reasonable alternative to a logical consequence relation and a paraconsistent consequence relation. This paper contributes to data-based reasoning with linear complexity.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian learning</kwd>
        <kwd>Logical entailment</kwd>
        <kwd>Statistical estimation</kwd>
        <kwd>Reasoning from data</kwd>
        <kwd>Inverse interpretation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Thanks to big data and computational power available today, Bayesian statistics plays an
important role in various fields such as neuroscience, cognitive science and artificial intelligence
(AI) [1]. Bayes’ theorem underlies most modern AI systems handling uncertainty, such as
self-driving cars, robotics, medical diagnosis and language translation [2]. The Bayesian brain hypothesis
[3], the free-energy principle [4] and predictive coding [5] argue that the brain unconsciously and
actively predicts and perceives the world using its beliefs about states of the world. Bayes’ theorem
is used here to explain how sensory inputs such as sight, sound, smell, taste and touch update
the belief.</p>
      <p>The generality of Bayesian statistics in intellectual phenomena leads us to expect that there is
a Bayesian algorithm and data structure for logical reasoning and that it can tackle fundamental
assumptions of existing systems. For example, Bayesian networks [6] including naive
Bayes, probabilistic logic programming (PLP) [7] and Markov logic networks (MLN) [8] assume
independence of knowledge or facts. However, the independence rarely holds in real data.
Ordinary formal logics such as propositional logic, first-order logic and modal logic assume
consistency of knowledge to avoid entailing everything from contradictions [9, 10]. However,
contradictions are inevitable when one tries to scale up the knowledge base or describe subjects
in detail. In addition to the above-mentioned methods, probabilistic logic [11] and conditional
probabilistic logic [12] assume both statistical and logical machineries. The statistical machinery
assigns each logical sentence a probability value or weight so that it reflects aspects of the world,
whereas the logical machinery performs logical reasoning on the probabilistic knowledge so that
conclusions preserve the uncertainty of premises. For example, Bayesian networks, naive Bayes
and PLP assume maximum likelihood estimation or maximum a posteriori estimation for the
statistical machinery. Probabilistic logic, conditional probabilistic logic and MLN assume a
human expert to play that role. Kolmogorov’s axioms [13] and Fenstad’s theorems [14] state
constraints that ought to be satisfied by the probability or weight assignment. However, some
serious AI problems such as the knowledge acquisition bottleneck, grounding, the frame problem and
commonsense reasoning [2, 15, 16] remain open without unifying the two machineries.</p>
      <p>To tackle these assumptions of existing systems, we give a simple probabilistic
model unifying the two machineries. We call the probabilistic model a generative logic model
(GLM) as it formalises the process by which data generate models of formal logic and the models
generate the truth values of logical formulae. Ordinary formal logic considers an interpretation
on each model (denoted by $m$), which represents a state of the world. The interpretation is a
function that maps each formula (denoted by $\alpha$) to a truth value, which represents knowledge
of the world. Given data (denoted by $d$), the most basic idea introduced in this paper is to
consider the model and interpretation as likelihoods $p(m|d)$ and $p(\alpha|m)$, respectively. The
model likelihood represents the model restricted by the data. Using the interpretation likelihood,
Bayes’ theorem gives the posterior $p(m|\alpha)$, which intuitively means an inverse interpretation that
gives the probability that the model making formula $\alpha$ true is $m$. The likelihood and posterior
yield Bayesian learning $p(\alpha|\beta) = \sum_{m} p(\alpha|m)p(m|\beta)$, which gives the probability of
formula $\alpha$ being true in the restricted models where formula $\beta$ is true. This paper looks at
statistical and logical properties of the Bayesian learning.</p>
      <p>We show that probabilistic reasoning on GLM satisfies Kolmogorov’s axioms (see
Proposition 1) and Fenstad’s theorem (see Equation (3)), and is equivalent to maximum likelihood
estimation (see Equation (4)). These facts justify the statistical correctness of GLM. Moreover,
we show that probabilistic reasoning on GLM is equivalent to the classical entailment when the
premise is consistent (see Theorem 1). It is equivalent to the classical entailment with maximal
consistent subsets with respect to set cardinality when the premise is inconsistent (see Theorem
5). These facts justify the logical correctness of GLM. We exemplify commonsense reasoning
and counterfactual reasoning with GLM (see Sections 3.1 and 3.5).</p>
      <p>The contributions of this paper are summarised as follows. First, this paper offers an algorithm
for data-based logical reasoning with linear complexity with respect to the number of data. To
the best of our knowledge, this is the first paper introducing the idea of generative models to
formalise the process by which data generate models of formal logic and the models generate
the truth values of logical formulae. Second, this paper shows that GLM removes three
fundamental assumptions: independence of knowledge, consistency of knowledge and separation of
statistical and logical machineries. In particular, the removal of the first assumption is due
to our novel idea that GLM only models the dependency between models and logical sentences.
This is different from the existing methods modelling the dependency between logical sentences.</p>
      <p>This paper is organised as follows. Section 2 introduces a generative model for logical
consequence relations. Section 3 shows logical and statistical correctness of the generative
model. Section 4 briefly summarises the results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Generative Logic Model</title>
      <p>The first task is to give a probabilistic representation of the process by which data generate
models of formal logic. Let $\mathcal{D} = \{d_1, d_2, ..., d_K\}$ be a multiset of data about states of the world.
$D$ is a random variable whose realisations are data in $\mathcal{D}$. For all data $d_k \in \mathcal{D}$, we define the
probability of $d_k$, as follows.
<disp-formula><tex-math>p(D = d_k) = \frac{1}{K}</tex-math></disp-formula></p>
      <p>$L$ represents a propositional or first-order language. For the sake of simplicity, we assume no
function symbol or open formula in $L$. $\mathcal{M} = \{m_1, m_2, ..., m_N\}$ is a set of models in formal
logic. $\mathcal{D}$ is assumed to be complete with respect to $\mathcal{M}$, and thus each data in $\mathcal{D}$ belongs to a
single model in $\mathcal{M}$. $m$ is a function that maps each data to such a single model. $K_n$ denotes the
number of data that belong to $m_n$, i.e., $K_n = |\{d_k \in \mathcal{D} \mid m_n = m(d_k)\}|$ where $|X|$ for set $X$
denotes the cardinality of $X$. $M$ is a random variable whose realisations are models in $\mathcal{M}$. For
all models $m_n \in \mathcal{M}$ and data $d_k \in \mathcal{D}$, we define the conditional probability of $m_n$ given $d_k$, as
follows.
<disp-formula><tex-math>p(M = m_n | D = d_k) = \begin{cases} 1 &amp; \text{if } m_n = m(d_k) \\ 0 &amp; \text{otherwise} \end{cases}</tex-math></disp-formula></p>
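      <p>The second task is to give a probabilistic representation of the process by which models
generate the truth values of logical sentences. Ordinary formal logic considers an interpretation
on each model. The interpretation is a function that maps each formula to a truth value, which
represents knowledge of the world. We here introduce parameter $\mu \in [0, 1]$ to represent the
extent to which each model is taken for granted in the interpretation. Concretely, $\mu$ denotes
the probability that a formula is interpreted as being true (resp. false) in a model where it is
true (resp. false). $1 - \mu$ is therefore the probability that a formula is interpreted as being true
(resp. false) in a model where it is false (resp. true). We assume that each formula is a random
variable whose realisations are 0 and 1, denoting false and true, respectively. For all models
$m_n \in \mathcal{M}$ and formulae $\alpha \in L$, we define the conditional probability of each truth value of $\alpha$
given $m_n$, as follows.
<disp-formula><tex-math>p(\alpha = 1 | M = m_n) = \begin{cases} \mu &amp; \text{if } m_n \in [\![\alpha = 1]\!] \\ 1 - \mu &amp; \text{otherwise} \end{cases}</tex-math></disp-formula>
<disp-formula><tex-math>p(\alpha = 0 | M = m_n) = \begin{cases} \mu &amp; \text{if } m_n \in [\![\alpha = 0]\!] \\ 1 - \mu &amp; \text{otherwise} \end{cases}</tex-math></disp-formula>
Here, $[\![\alpha = 1]\!]$ denotes the set of all models in which $\alpha$ is true, and $[\![\alpha = 0]\!]$ the set of all models
in which $\alpha$ is false. The above expressions can be simply written as a Bernoulli distribution
with parameter $\mu \in [0, 1]$, i.e.,
<disp-formula><tex-math>p(\alpha | M = m_n) = \mu^{[\![\alpha]\!]_{m_n}} (1 - \mu)^{1 - [\![\alpha]\!]_{m_n}},</tex-math></disp-formula>
where $[\![\alpha]\!]_{m_n}$ equals 1 if $m_n \in [\![\alpha]\!]$ and 0 otherwise.</p>
      <p>Given a model, the truth value of each formula is uniquely determined. In probability theory, this means that the truth values of any two formulae $\alpha_1$ and
$\alpha_2$ are conditionally independent given a model $m_n$, i.e., $p(\alpha_1, \alpha_2 | M = m_n) = p(\alpha_1 | M = m_n) p(\alpha_2 | M = m_n)$.
Note that the conditional independence holds not only for atomic formulae
but for compound formulae as well.<sup>1</sup> Let $\Delta = \{\alpha_1, \alpha_2, ..., \alpha_J\}$ be a multiset of $J$ formulae. We thus have
<disp-formula><tex-math>p(\Delta | M = m_n) = \prod_{j=1}^{J} p(\alpha_j | M = m_n).</tex-math></disp-formula></p>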
      <p>Thus far, we have defined $p(D)$ and $p(M|D)$ as categorical distributions and $p(\Delta|M)$
as Bernoulli distributions with parameter $\mu$. Given a value of the parameter $\mu$, they
provide the full joint distribution over all of the random variables, i.e. $p(\Delta, M, D)$. We call
$\{p(\Delta|M, \mu), p(M|D), p(D)\}$ a generative logic model (GLM). In sum, the generative logic
model defines a data-driven interpretation by which the truth values of formulae are logically
interpreted and probabilistically generated from models. The models are also probabilistically
generated from data observed from the real world. The GLM meets the following important
properties.</p>
      <p>Proposition 1. The generative logic model satisfies Kolmogorov’s axioms.</p>
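      <p>Proposition 2. Let $\alpha \in L$. $p(\alpha = 0) = p(\neg\alpha = 1)$ holds.</p>
      <p>In the following, we therefore replace $\alpha = 0$ by $\neg\alpha = 1$ and then abbreviate $\neg\alpha = 1$ to $\neg\alpha$.
We also abbreviate $M = m_n$ to $m_n$ and $D = d_k$ to $d_k$.</p>
      <p>Example 1. Let $rain$ and $wet$ be two propositional symbols meaning ‘it is raining’ and ‘the
grass is wet,’ respectively. Each row of Table 1 shows a different model, i.e., valuation. The last
column shows how many data belong to each model. Table 2 shows the likelihoods of the atomic
propositions being true given a model. Given $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$, we have
<disp-formula><tex-math>\begin{array}{l} p(rain|wet) = \frac{\sum_{n=1}^{N} p(rain|m_n) p(wet|m_n) \sum_{k=1}^{K} p(m_n|d_k) p(d_k)}{\sum_{n=1}^{N} p(wet|m_n) \sum_{k=1}^{K} p(m_n|d_k) p(d_k)} \\ = \frac{(1 - \mu)^2 \frac{4}{10} + (1 - \mu)\mu \frac{2}{10} + \mu(1 - \mu) \frac{1}{10} + \mu^2 \frac{3}{10}}{(1 - \mu) \frac{4}{10} + \mu \frac{2}{10} + (1 - \mu) \frac{1}{10} + \mu \frac{3}{10}} = \frac{3}{2 + 3} = 0.6. \end{array}</tex-math></disp-formula></p>
      <p><sup>1</sup> In contrast, independence $p(\alpha_1, \alpha_2) = p(\alpha_1) p(\alpha_2)$ generally holds for neither atomic formulae nor compound
formulae.</p>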
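      <p>The calculation in Example 1 can be reproduced with a minimal sketch. It assumes the data counts 4, 2, 1 and 3 for the models in which the two symbols take the values (0, 0), (0, 1), (1, 0) and (1, 1), as read off the fractions in the example. The names COUNTS, likelihood and conditional are illustrative and do not come from the paper.</p>
      <preformat>
```python
# Assumed data counts K_n per model (rain, wet), read off Example 1.
COUNTS = {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 3}


def likelihood(truth_value, mu):
    """p(alpha | m): mu if alpha is true in m, 1 - mu otherwise."""
    return mu if truth_value else 1.0 - mu


def conditional(query, evidence, counts, mu):
    """p(query | evidence), using p(m_n) = K_n / K.

    query and evidence map a model (rain, wet) to its truth value 0 or 1.
    """
    total = sum(counts.values())
    numerator = 0.0
    denominator = 0.0
    for model, k_n in counts.items():
        prior = k_n / total
        p_evidence = likelihood(evidence(model), mu)
        numerator += likelihood(query(model), mu) * p_evidence * prior
        denominator += p_evidence * prior
    return numerator / denominator


rain = lambda m: m[0]
wet = lambda m: m[1]

p_rain_given_wet = conditional(rain, wet, COUNTS, mu=1.0)
print(round(p_rain_given_wet, 6))  # 0.6
```
      </preformat>
      <p>Choosing a value of mu below 1 in the same function gives the softened interpretation, in which models that falsify the evidence still contribute a little probability mass.</p>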
      <p>Example 2. Suppose that $L$ has only one 2-ary predicate symbol ‘’ and that the
Herbrand universe for $L$ has only two constants {, }. There are four ground atoms,
{(, ), (, ), (, ), (, )}, which result in $2^4 = 16$ possible
models. Each row of Table 3 shows a different model and the last column shows the number
of data that belong to the model. Models without data are omitted from the table. Given
$\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$, we have</p>
    </sec>
    <sec id="sec-3">
      <title>3. Correctness</title>
      <sec id="sec-3-1">
        <title>3.1. Statistical Estimation</title>
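        <p>Fenstad [14] argues that the probability of a formula is the sum of the probabilities of the
models where the formula is true. Let $\alpha \in L$ and $m \in \mathcal{M}$. When $L$ has no function symbol or
open formula, the first Fenstad theorem can have the following simpler form, where $m \models \alpha$
represents that $m$ satisfies $\alpha$.
<disp-formula><label>(1)</label><tex-math>p(\alpha) = \sum_{m \in \mathcal{M} : m \models \alpha} p(m)</tex-math></disp-formula></p>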
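        <p>When one has no prior knowledge about the probability of models, the most frequently used
method to estimate $p(M)$ only from data is maximum likelihood estimation, which is given as
follows.
<disp-formula><tex-math>p(M) = \arg\max_{\Phi} p(\mathcal{D}|\Phi),</tex-math></disp-formula>
where $\Phi = (\phi_1, \phi_2, ..., \phi_N)$ is the parameter of the categorical distribution $p(M)$. Assuming that the data are
independent given $\Phi$, we have
<disp-formula><tex-math>p(\mathcal{D}|\Phi) = \prod_{k=1}^{K} p(d_k|\Phi) = \phi_1^{K_1} \phi_2^{K_2} \cdots \phi_{N-1}^{K_{N-1}} (1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1})^{K_N}.</tex-math></disp-formula>
$\Phi$ maximises the likelihood if and only if it maximises the log likelihood, which is given as
follows.
<disp-formula><tex-math>L(\Phi) = K_1 \log \phi_1 + K_2 \log \phi_2 + \cdots + K_{N-1} \log \phi_{N-1} + K_N \log(1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1})</tex-math></disp-formula>
The maximum likelihood estimate is obtained by solving the following simultaneous equations,
which are obtained by differentiating the log likelihood with respect to each $\phi_n$ ($1 \leq n \leq N - 1$).
<disp-formula><tex-math>\frac{\partial L(\Phi)}{\partial \phi_n} = \frac{K_n}{\phi_n} - \frac{K_N}{1 - \phi_1 - \phi_2 - \cdots - \phi_{N-1}} = 0</tex-math></disp-formula>
The following is the solution to the simultaneous equations.
<disp-formula><tex-math>\Phi = \left( \frac{K_1}{K}, \frac{K_2}{K}, ..., \frac{K_N}{K} \right)</tex-math></disp-formula>
Therefore, the maximum likelihood estimate for the $n$-th model is just the ratio of the number
of data in the model to the total number of data. Combining Equation (1) and the maximum
likelihood estimate, we have
<disp-formula><label>(2)</label><tex-math>p(\alpha) = \sum_{n=1 : m_n \in [\![\alpha]\!]}^{N} \frac{K_n}{K}.</tex-math></disp-formula></p>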
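        <p>As a small numerical illustration of the maximum likelihood estimate, the following sketch counts how many data fall into each model and compares the log likelihood of the resulting estimate with that of a uniform alternative. The model labels m1 to m4 and the sample of ten data are hypothetical, and the function names are illustrative only.</p>
        <preformat>
```python
import math
from collections import Counter


def mle_model_distribution(models_of_data):
    """Maximum likelihood estimate of p(M): the ratio K_n / K per model."""
    counts = Counter(models_of_data)
    total = len(models_of_data)
    return {model: k_n / total for model, k_n in counts.items()}


def log_likelihood(phi, models_of_data):
    """Log likelihood of the data under the categorical parameter phi."""
    return sum(math.log(phi[model]) for model in models_of_data)


# Hypothetical sample: the model that each of ten data belongs to.
data_models = ["m1"] * 4 + ["m2"] * 2 + ["m3"] * 1 + ["m4"] * 3

phi_hat = mle_model_distribution(data_models)
uniform = {model: 0.25 for model in phi_hat}

print(phi_hat)
print(log_likelihood(phi_hat, data_models))
print(log_likelihood(uniform, data_models))
```
        </preformat>
        <p>The estimate only needs one pass over the data, and its log likelihood is never lower than that of any other categorical parameter, here checked against the uniform one.</p>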
        <p>Now, let $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$ be a GLM such that $\mu = 1$. We show that both
the Fenstad theorem and maximum likelihood estimation justify the GLM. The Fenstad theorem
justifies the GLM because probabilistic inference on the GLM satisfies Equation (1).
<disp-formula><label>(3)</label><tex-math>p(\alpha) = \sum_{n=1}^{N} p(\alpha, m_n) = \sum_{n=1}^{N} p(\alpha|m_n) p(m_n) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} p(m_n) = \sum_{n=1 : m_n \in [\![\alpha]\!]}^{N} p(m_n)</tex-math></disp-formula>
Maximum likelihood estimation also justifies the GLM because probabilistic inference on the
GLM satisfies Equation (2).
<disp-formula><label>(4)</label><tex-math>p(\alpha) = \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k) p(d_k) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} \frac{K_n}{K} = \sum_{n=1 : m_n \in [\![\alpha]\!]}^{N} \frac{K_n}{K}</tex-math></disp-formula></p>
        <p>We have shown that GLM not only follows Fenstad’s theorem and maximum likelihood
estimation but also treats their results as probabilistic reasoning in a unified way. This result
justifies the correctness of GLM from a statistical point of view.</p>
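        <p>Table 4. Models over the two propositional symbols of Example 3, with truth values (0, 0), (0, 1), (1, 0) and (1, 1), and the columns ‘data’ and ‘new data’ marking the data that belong to each model.</p>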
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Reasoning from Data</title>
        <p>There are some practical advantages of the GLMs. The computational complexity of Equation
(4) depends on $N$, which is unbounded in predicate logic and exponentially increases in
propositional logic with respect to the number of propositional symbols. However, Equation (4) can
be transformed as follows for a linear complexity with respect to the number of data, i.e., $K$.
<disp-formula><label>(5)</label><tex-math>p(\alpha) = \sum_{n=1}^{N} [\![\alpha]\!]_{m_n} \frac{K_n}{K} = \sum_{k=1}^{K} [\![\alpha]\!]_{m(d_k)} \frac{1}{K}</tex-math></disp-formula>
In addition, Equation (4) has only a constant complexity for recalculation for new data. Let
$p_K$ denote the probability calculated with $K$ data. $p_{K+1}(\alpha)$ can be calculated using $p_K(\alpha)$ as
follows.
<disp-formula><label>(6)</label><tex-math>\begin{array}{l} p_{K+1}(\alpha) = \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K+1} p(m_n|d_k) p(d_k) \\ = \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k) p(d_k) + \sum_{n=1}^{N} p(\alpha|m_n) p(m_n|d_{K+1}) p(d_{K+1}) \\ = \frac{K}{K+1} \sum_{n=1}^{N} p(\alpha|m_n) \sum_{k=1}^{K} p(m_n|d_k) \frac{1}{K} + \sum_{n=1}^{N} p(\alpha|m_n) p(m_n|d_{K+1}) \frac{1}{K+1} \\ = \frac{K p_K(\alpha) + [\![\alpha]\!]_{m(d_{K+1})}}{K+1} \end{array}</tex-math></disp-formula></p>
        <p>Finally, as demonstrated in the following example, Equation (6) is good at modelling the
development of commonsense knowledge.</p>
        <p>Example 3. Let $bird$ and $fly$ be two propositional symbols meaning ‘It is a bird.’ and ‘It flies.’,
respectively. Each row of Table 4 shows a different model. Given the ten data shown in the fourth
column, the probability that $bird$ implies $fly$ is calculated using Equation (5), as follows.
<disp-formula><tex-math>p(bird \to fly) = \sum_{k=1}^{10} [\![bird \to fly]\!]_{m(d_k)} \frac{1}{10} = 1</tex-math></disp-formula>
It is obvious from the GLM that the counterintuitive knowledge that birds must fly comes from a
lack of data. Indeed, taking into account the eleventh data shown in the last column, the probability
is updated using Equation (6), as follows.
<disp-formula><tex-math>p_{11}(bird \to fly) = \frac{10 \, p_{10}(bird \to fly) + [\![bird \to fly]\!]_{m(d_{11})}}{11} = \frac{10 \times 1 + 0}{11} = \frac{10}{11}</tex-math></disp-formula></p>
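        <p>The linear-time evaluation of Equation (5) and the constant-time update of Equation (6) can be sketched as follows. The ten observations are hypothetical and only chosen so that none of them is a bird that does not fly, and the eleventh observation is assumed to be such a bird. The function names are illustrative.</p>
        <preformat>
```python
def implies(bird, fly):
    """Truth value of 'bird implies fly' in a model (bird, fly)."""
    return 1 if (not bird) or fly else 0


def probability(formula, data):
    """Equation (5): average truth value over the data, one pass over K data."""
    return sum(formula(*datum) for datum in data) / len(data)


def update(p_k, k, formula, new_datum):
    """Equation (6): constant-time update of p_K with one new datum."""
    return (k * p_k + formula(*new_datum)) / (k + 1)


# Hypothetical ten observations (bird, fly); no non-flying bird among them.
data = [(0, 0)] * 5 + [(0, 1)] * 2 + [(1, 1)] * 3

p10 = probability(implies, data)
# Assumed eleventh observation: a bird that does not fly.
p11 = update(p10, len(data), implies, (1, 0))

print(p10)
print(p11)
```
        </preformat>
        <p>The update touches only the previous probability, the number of data and the new datum, so the models never have to be enumerated.</p>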
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Logical Entailment</title>
        <p>We showed in the last section that, given $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$, $p(M)$ is equivalent
to the maximum likelihood estimate, i.e., for all $m_n \in \mathcal{M}$,
<disp-formula><tex-math>p(m_n) = \sum_{k=1}^{K} p(m_n|d_k) p(d_k) = \frac{K_n}{K}.</tex-math></disp-formula></p>
        <p>Therefore, $\{p(\Delta|M, \mu = 1), p(M|D), p(D)\}$ is equivalent to $\{p(\Delta|M, \mu = 1), p(M)\}$ when
$p(M)$ is the maximum likelihood estimate. For the sake of simplicity, we also call the latter
a GLM and use it without distinction. To discuss logical properties of the GLM, we assume
$0 \notin p(M)$ meaning that every model is possible, i.e., $p(m) \neq 0$, for all models $m$. Recall that
a set $\Delta$ of formulae entails a formula $\alpha$ in classical logic, denoted by $\Delta \models \alpha$, if $\alpha$ is true in
every model in which $\Delta$ is true, i.e., $[\![\Delta]\!] \subseteq [\![\alpha]\!]$. The following two theorems state that certain
inference on the GLM is more cautious than classical entailment.</p>
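        <p>Theorem 1. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] \neq \emptyset$. $p(\alpha|\Delta) = 1$ if and only if $\Delta \models \alpha$.</p>
        <p>Proof. Recall that, in formal logic, the fact that there is a model of $\Delta$ (or $\Delta$ has a model) is
equivalent to the fact that there is a model $m$ in which every formula in $\Delta$ is true. Dividing
models into the models of $\Delta$ and the others, we have
<disp-formula><tex-math>p(\alpha|\Delta) = \frac{\sum_{m} p(m) p(\alpha|m) p(\Delta|m)}{\sum_{m} p(m) p(\Delta|m)} = \frac{\sum_{m \in [\![\Delta]\!]} p(m) p(\alpha|m) \mu^{|\Delta|} + \sum_{m \notin [\![\Delta]\!]} p(m) p(\alpha|m) p(\Delta|m)}{\sum_{m \in [\![\Delta]\!]} p(m) \mu^{|\Delta|} + \sum_{m \notin [\![\Delta]\!]} p(m) p(\Delta|m)}.</tex-math></disp-formula>
$p(\Delta|m) = \prod_{\beta \in \Delta} p(\beta|m) = \prod_{\beta \in \Delta} \mu^{[\![\beta]\!]_m} (1 - \mu)^{1 - [\![\beta]\!]_m}$. For all $m \notin [\![\Delta]\!]$, there is $\beta \in \Delta$ such
that $[\![\beta]\!]_m = 0$. Therefore, $p(\Delta|m) = 0$ when $\mu = 1$, for all $m \notin [\![\Delta]\!]$. We thus have
<disp-formula><tex-math>p(\alpha|\Delta) = \frac{\sum_{m \in [\![\Delta]\!]} p(m) 1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m} 1^{|\Delta|}}{\sum_{m \in [\![\Delta]\!]} p(m) 1^{|\Delta|}} = \frac{\sum_{m \in [\![\Delta]\!]} p(m) 1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m}}{\sum_{m \in [\![\Delta]\!]} p(m)}.</tex-math></disp-formula>
Since $1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m} = 1^1 0^0 = 1$ if $m \in [\![\alpha]\!]$ and $1^{[\![\alpha]\!]_m} 0^{1 - [\![\alpha]\!]_m} = 1^0 0^1 = 0$ if $m \notin [\![\alpha]\!]$, we
have
<disp-formula><tex-math>p(\alpha|\Delta) = \frac{\sum_{m \in [\![\Delta]\!] \cap [\![\alpha]\!]} p(m)}{\sum_{m \in [\![\Delta]\!]} p(m)}.</tex-math></disp-formula>
Since $p(m) \neq 0$ for all $m$, this ratio equals 1 if and only if $[\![\alpha]\!] \supseteq [\![\Delta]\!]$, i.e., $\Delta \models \alpha$.</p>
        <p>Example 4. Theorem 1 does not hold without assumption $0 \notin p(M)$. Given $p(M) =
(0.6, 0, 0.1, 0.3)$ in Example 1, $p(rain|wet) = 1$ but $\{wet\} \not\models rain$.</p>
        <p>Theorem 2. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] = \emptyset$. If $p(\alpha|\Delta) = 1$ then $\Delta \models \alpha$, but not vice
versa.</p>
        <p>Proof. ($\Rightarrow$) If $[\![\Delta]\!] = \emptyset$ then $\Delta \models \alpha$, for all $\alpha$, in classical logic. ($\Leftarrow$) We show a counterexample
where $\Delta \models \alpha$ but $p(\alpha|\Delta)$ is undefined. $\beta, \neg\beta \models \alpha$ holds because $[\![\beta, \neg\beta]\!] = \emptyset$ results in
$[\![\beta, \neg\beta]\!] \subseteq [\![\alpha]\!]$. Meanwhile, $p(\alpha|\beta, \neg\beta)$ is given as follows.
<disp-formula><tex-math>p(\alpha|\beta, \neg\beta) = \frac{\sum_{m} p(m) p(\alpha|m) p(\beta|m) p(\neg\beta|m)}{\sum_{m} p(m) p(\beta|m) p(\neg\beta|m)} = \frac{\mu (1 - \mu) \sum_{m} p(m) p(\alpha|m)}{\mu (1 - \mu) \sum_{m} p(m)}</tex-math></disp-formula>
This is undefined due to division by zero when $\mu = 1$.</p>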
        <p>Everything is entailed from a contradiction in the classical entailment. Certain inference on
the GLM is more cautious than the classical entailment because the proof of Theorem 2 states
that nothing is entailed from a contradiction. In the next section, we look at a GLM that entails
something reasonable from contradictions.</p>
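        <p>The behaviour described by Theorems 1 and 2 and Example 4 can be checked on the two-symbol language of Example 1 with a short sketch. It assumes $\mu = 1$, so that a model contributes to the posterior only when it satisfies every premise. The helper names and the symbol names are illustrative.</p>
        <preformat>
```python
from itertools import product

# Models are valuations (rain, wet), in the order (0,0), (0,1), (1,0), (1,1).
MODELS = list(product([0, 1], repeat=2))

rain = lambda m: m[0]
wet = lambda m: m[1]
not_wet = lambda m: 1 - m[1]
rain_implies_wet = lambda m: 1 if (not m[0]) or m[1] else 0


def satisfies(model, delta):
    """True if every formula in delta is true in the model."""
    return all(formula(model) for formula in delta)


def posterior(alpha, delta, prior):
    """p(alpha | Delta) with mu = 1, or None when the denominator is zero."""
    support = [m for m in MODELS if satisfies(m, delta)]
    denominator = sum(prior[m] for m in support)
    if denominator == 0:
        return None
    return sum(prior[m] for m in support if alpha(m)) / denominator


def entails(delta, alpha):
    """Classical entailment: alpha is true in every model of delta."""
    return all(alpha(m) for m in MODELS if satisfies(m, delta))


uniform = dict(zip(MODELS, [0.25, 0.25, 0.25, 0.25]))
skewed = dict(zip(MODELS, [0.6, 0.0, 0.1, 0.3]))  # violates 0 not in p(M)

premises = [rain, rain_implies_wet]
print(posterior(wet, premises, uniform), entails(premises, wet))
print(posterior(rain, [wet], uniform), entails([wet], rain))
print(posterior(rain, [wet], skewed), entails([wet], rain))
print(posterior(rain, [wet, not_wet], uniform), entails([wet, not_wet], rain))
```
        </preformat>
        <p>With the uniform prior the posterior is 1 exactly when the premises entail the conclusion. With the skewed prior the posterior of rain given wet reaches 1 without entailment, and with contradictory premises the posterior is undefined although classical entailment holds.</p>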
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Paraconsistency</title>
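        <p>Let $\{\lim_{\mu \to 1} p(\Delta|M, \mu), p(M)\}$ be a GLM such that $\mu \to 1$ and $0 \notin p(M)$ where $\mu \to 1$
represents that $\mu$ approaches 1. The following two theorems state that certain inference on the GLM
is more cautious than classical entailment.</p>
        <p>Theorem 3. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] \neq \emptyset$. $p(\alpha|\Delta) = 1$ if and only if $\Delta \models \alpha$.</p>
        <p>Proof. $\lim_{\mu \to 1}$ does not change the proof of Theorem 1.</p>
        <p>Theorem 4. Let $\alpha \in L$ and $\Delta \subseteq L$ such that $[\![\Delta]\!] = \emptyset$. If $p(\alpha|\Delta) = 1$ then $\Delta \models \alpha$, but not vice
versa.</p>
        <p>Proof. ($\Rightarrow$) Same as for Theorem 2. ($\Leftarrow$) We show a counterexample where $\Delta \models \alpha$ but $p(\alpha|\Delta) \neq 1$.
Suppose $p(\alpha) &lt; 1$. We can show $p(\alpha|\beta \wedge \neg\beta) &lt; 1$ as follows.
<disp-formula><tex-math>\begin{array}{l} p(\alpha|\beta \wedge \neg\beta) = \frac{\sum_{m} p(m) \lim_{\mu \to 1} p(\alpha|m) \lim_{\mu \to 1} p(\beta \wedge \neg\beta|m)}{\sum_{m} p(m) \lim_{\mu \to 1} p(\beta \wedge \neg\beta|m)} = \lim_{\mu \to 1} \frac{(1 - \mu) \sum_{m} p(m) p(\alpha|m)}{(1 - \mu) \sum_{m} p(m)} \\ = \lim_{\mu \to 1} \frac{\sum_{m} p(m) p(\alpha|m)}{\sum_{m} p(m)} = \sum_{m} p(m) \lim_{\mu \to 1} p(\alpha|m) = p(\alpha) \end{array}</tex-math></disp-formula>
Therefore, $p(\alpha|\beta \wedge \neg\beta) \neq 1$. Note that $\beta \wedge \neg\beta \models \alpha$ because $[\![\beta \wedge \neg\beta]\!] = \emptyset$ results in
$[\![\beta \wedge \neg\beta]\!] \subseteq [\![\alpha]\!]$.</p>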
        <p>To characterise the certain inference on the GLM, we define an approximate model using
maximal consistent subsets with respect to set cardinality. Recall that a set of formulae is
consistent if there is a model of the set.</p>
        <p>Definition 1 (Approximate model). Let $m$ be a model and $\Delta \subseteq L$ be an inconsistent set of
formulae. $m$ is an approximate model of $\Delta$ if $m$ is a model of a maximal (w.r.t. set cardinality)
consistent subset of $\Delta$.</p>
        <p>Theorem 5. Let $\Delta \subseteq L$ and $\alpha \in L$. $p(\alpha|\Delta) = 1$ if and only if $\Delta' \models \alpha$, for all maximal (w.r.t.
set cardinality) consistent subsets $\Delta'$ of $\Delta$.</p>
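        <p>Proof. We use notation $((\Delta))$ to denote the set of all approximate models of $\Delta$. We also use
notation $|\Delta|_m$ to denote the number of formulas in $\Delta$ that are true in $m$, i.e. $|\Delta|_m = \sum_{\beta \in \Delta} [\![\beta]\!]_m$.
Dividing models into $((\Delta))$ and the others, we have
<disp-formula><tex-math>p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{\sum_{m} p(\alpha|m) p(m) p(\Delta|m)}{\sum_{m} p(m) p(\Delta|m)} = \lim_{\mu \to 1} \frac{\sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m}) p(\hat{m}) p(\Delta|\hat{m}) + \sum_{m \notin ((\Delta))} p(\alpha|m) p(m) p(\Delta|m)}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m}) p(\Delta|\hat{m}) + \sum_{m \notin ((\Delta))} p(m) p(\Delta|m)}.</tex-math></disp-formula>
Now, $p(\Delta|m)$ can be developed as follows, for all $m$ (regardless of the membership of $((\Delta))$).
<disp-formula><tex-math>p(\Delta|m) = \prod_{\beta \in \Delta} p(\beta|m) = \prod_{\beta \in \Delta} \mu^{[\![\beta]\!]_m} (1 - \mu)^{1 - [\![\beta]\!]_m} = \mu^{\sum_{\beta \in \Delta} [\![\beta]\!]_m} (1 - \mu)^{\sum_{\beta \in \Delta} (1 - [\![\beta]\!]_m)} = \mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m}</tex-math></disp-formula>
Therefore, $p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{W + X}{Y + Z}$ where
<disp-formula><tex-math>\begin{array}{l} W = \sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m}) p(\hat{m}) \mu^{|\Delta|_{\hat{m}}} (1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}} \\ X = \sum_{m \notin ((\Delta))} p(\alpha|m) p(m) \mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m} \\ Y = \sum_{\hat{m} \in ((\Delta))} p(\hat{m}) \mu^{|\Delta|_{\hat{m}}} (1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}} \\ Z = \sum_{m \notin ((\Delta))} p(m) \mu^{|\Delta|_m} (1 - \mu)^{|\Delta| - |\Delta|_m} \end{array}</tex-math></disp-formula>
From Definition 1, $|\Delta|_{\hat{m}}$ has the same value, for all $\hat{m} \in ((\Delta))$. Therefore, the fraction can
be simplified by dividing the denominator and numerator by $(1 - \mu)^{|\Delta| - |\Delta|_{\hat{m}}}$. We thus have
$p(\alpha|\Delta) = \lim_{\mu \to 1} \frac{W' + X'}{Y' + Z'}$ where
<disp-formula><tex-math>\begin{array}{l} W' = \sum_{\hat{m} \in ((\Delta))} p(\alpha|\hat{m}) p(\hat{m}) \mu^{|\Delta|_{\hat{m}}} \\ X' = \sum_{m \notin ((\Delta))} p(\alpha|m) p(m) \mu^{|\Delta|_m} (1 - \mu)^{|\Delta|_{\hat{m}} - |\Delta|_m} \\ Y' = \sum_{\hat{m} \in ((\Delta))} p(\hat{m}) \mu^{|\Delta|_{\hat{m}}} \\ Z' = \sum_{m \notin ((\Delta))} p(m) \mu^{|\Delta|_m} (1 - \mu)^{|\Delta|_{\hat{m}} - |\Delta|_m} \end{array}</tex-math></disp-formula>
Applying the limit operation, we can cancel out $X'$ and $Z'$ and have
<disp-formula><tex-math>p(\alpha|\Delta) = \frac{\sum_{\hat{m} \in ((\Delta))} [\![\alpha]\!]_{\hat{m}} p(\hat{m})}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m})} = \frac{\sum_{\hat{m} \in ((\Delta)) \cap [\![\alpha]\!]} p(\hat{m})}{\sum_{\hat{m} \in ((\Delta))} p(\hat{m})}.</tex-math></disp-formula>
Therefore, $p(\alpha|\Delta) = 1$ holds if $[\![\alpha]\!] \supseteq ((\Delta))$. By definition, $m \in ((\Delta))$ if $m$ is a model of a
maximal consistent subset of $\Delta$ w.r.t. set cardinality. Therefore, $m \in ((\Delta))$ if $m \in \bigcup_{\Delta'} [\![\Delta']\!]$
where $\Delta'$ is a maximal consistent subset of $\Delta$ w.r.t. set cardinality. Therefore, $p(\alpha|\Delta) = 1$ if
$[\![\alpha]\!] \supseteq \bigcup_{\Delta'} [\![\Delta']\!]$. In other words, for all maximal (w.r.t. set cardinality) consistent subsets $\Delta'$ of
$\Delta$, $[\![\alpha]\!] \supseteq [\![\Delta']\!]$, i.e., $\Delta' \models \alpha$.</p>
        <p>Example 5. Let $\mu \to 1$ and $p(M) = (0.25, 0.25, 0.25, 0.25)$ in Example 1. Given $\Delta =
\{rain, wet, rain \to wet, \neg wet\}$, there are three maximal (w.r.t. set inclusion) consistent subsets,
i.e., $S_1 = \{rain, wet, rain \to wet\}$, $S_2 = \{rain, \neg wet\}$ and $S_3 = \{rain \to wet, \neg wet\}$, and
one maximal (w.r.t. set cardinality) consistent subset, i.e., $S_1$. $p(rain|\Delta) = 1$ and $S_1 \models rain$ hold,
but $S_3 \not\models rain$.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Counterfactuals</title>
        <p>Would England have won the match against Argentina at the 1986 World Cup if Diego Maradona
had not used his hand to score the first goal?</p>
        <p>Reasoning with this kind of false and imaginary
conditional statement is often called counterfactual reasoning. Let $\{\lim_{\mu \to 1} p(\Delta|M, \mu), p(M)\}$
be a GLM such that $\mu \to 1$. This section demonstrates that the certain inference on the GLM is
a natural model of counterfactual reasoning.</p>
        <p>Consider data with four attributes, $goal$, $home$, $opponent$, $win \in \{0, 1\}$. They are, respectively, facts about whether our teammate Alice scored
a goal or not, whether the game was played at home or not, whether the opponent was 0
(meaning Belgium) or 1 (meaning Brazil), and whether our team won or not. Now, we consider
the following question: Our team lost the home game without Alice’s goal against Belgium, i.e.,
$d_1$. Would we have won if Alice had scored a goal in this match? This question does not have a
straightforward answer because it is a counterfactual with respect to the data. Indeed, the set
of attributes, i.e., ($goal = 1$, $home = 1$, $opponent = 0$), of the counterfactual does not appear
in the data.</p>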
        <p>As long as the counterfactual does not exist in the data, it is reasonable to realise counterfactual
reasoning based on the facts most similar to the counterfactual [17]. The counterfactual shares
attributes ($home = 1$, $opponent = 0$) with $d_1$, ($goal = 1$, $home = 1$) with $d_2$, ($goal =
1$, $opponent = 0$) with $d_3$ and ($goal = 1$) with $d_4$. The data thus indicate that $d_1$, $d_2$ and
$d_3$ are most similar to the counterfactual in terms of the number of shared attributes. Since
the team won in $d_2$ and $d_3$, it is reasonable to conclude that, given the counterfactual, the
probability of winning is 2/3. Here, readers might think that $d_1$ should be excluded from the
most similar facts because, in the counterfactual, we look at the situation in which Alice scored
a goal. However, $d_1$ contains important information because it is empirically true that the
probability of winning with Alice’s goal is positively affected by the fact that we won without
Alice’s goal and negatively affected by the fact that we lost without Alice’s goal.</p>
        <p>Interestingly, the idea of counterfactual reasoning is naturally modelled by the GLM. The
predictive probability of winning given the counterfactual is calculated as follows.
<disp-formula><tex-math>\begin{array}{l} p(win|goal, home, \neg opponent) = \lim_{\mu \to 1} \frac{\sum_{m} p(goal|m) p(home|m) p(\neg opponent|m) p(win|m) p(m)}{\sum_{m} p(goal|m) p(home|m) p(\neg opponent|m) p(m)} \\ = \lim_{\mu \to 1} \frac{\mu^2 (1 - \mu)^2 + \mu^3 (1 - \mu) + \mu^3 (1 - \mu) + \mu (1 - \mu)^3}{\mu^2 (1 - \mu) + \mu^2 (1 - \mu) + \mu^2 (1 - \mu) + \mu (1 - \mu)^2} = \frac{2}{3} \end{array}</tex-math></disp-formula>
The denominator of the predictive probability turns out to equal the number of facts most
similar to the counterfactual, i.e., $d_1$, $d_2$ and $d_3$, whereas the numerator turns out to equal
the number of wins from the three games, i.e., $d_2$ and $d_3$. Note that only the GLM with $\mu \to 1$
successfully formalises the idea of counterfactual reasoning.</p>
        <p>Our approach for counterfactual reasoning essentially differs from Pearl [17] and Lewis
[18]. Our approach is data-driven, whereas Pearl’s approach is model-driven in the sense that
it assumes a causal diagram. Our approach is based on probability theory, whereas Lewis’s
approach is based on the possible-worlds semantics. Although a formal comparison is difficult,
Table 6 shows that there are some counterparts between the two approaches.</p>
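        <p>The limit inference used in this section can be sketched in two ways: numerically, with $\mu$ very close to 1, and in closed form, by keeping only the models that satisfy the largest number of premises (the approximate models of Definition 1). The encoding of the four matches below is an assumption reconstructed from the attributes each match is said to share with the counterfactual, and all names are illustrative.</p>
        <preformat>
```python
# Assumed encoding of the four matches as (goal, home, opponent, win).
MATCHES = {
    "d1": (0, 1, 0, 0),
    "d2": (1, 1, 1, 1),
    "d3": (1, 0, 0, 1),
    "d4": (1, 0, 1, 0),
}
PRIOR = {name: 1 / len(MATCHES) for name in MATCHES}

goal = lambda m: m[0]
home = lambda m: m[1]
not_opponent = lambda m: 1 - m[2]
win = lambda m: m[3]


def posterior(query, evidence, mu):
    """p(query | evidence) for a value of mu strictly between 0 and 1."""
    numerator = 0.0
    denominator = 0.0
    for name, model in MATCHES.items():
        weight = PRIOR[name]
        for formula in evidence:
            weight *= mu if formula(model) else 1.0 - mu
        numerator += (mu if query(model) else 1.0 - mu) * weight
        denominator += weight
    return numerator / denominator


def limit_posterior(query, evidence):
    """Limit mu -> 1: keep the models satisfying the most evidence formulae."""
    scores = {name: sum(f(m) for f in evidence) for name, m in MATCHES.items()}
    best = max(scores.values())
    nearest = [name for name in MATCHES if scores[name] == best]
    mass = sum(PRIOR[name] for name in nearest)
    hits = sum(PRIOR[name] for name in nearest if query(MATCHES[name]))
    return hits / mass


evidence = [goal, home, not_opponent]
exact = limit_posterior(win, evidence)
approx = posterior(win, evidence, mu=1 - 1e-9)
print(exact, approx)
```
        </preformat>
        <p>Both values agree with the 2/3 obtained above. The closed form makes the role of the most similar facts explicit, while the numerical version follows the defining formula directly.</p>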
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>We introduced the idea of generative models to the interpretation of formal logic. The idea
referred to as generative logic models accounts for the process by which data about states of
the world generate models of formal logic and the models generate the truth values of logical
formulae. We showed that it is a theory of reasoning that deals with several types of reasoning
such as statistical reasoning, logical reasoning, paraconsistent reasoning and counterfactual
reasoning.</p>
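      <p>Proof of Proposition 1. Let $\alpha, \beta \in L$. We need to show the following three properties.
1. $0 \leq p(\alpha = i)$ holds, for all $i \in \{0, 1\}$.
2. $\sum_{i \in \{0, 1\}} p(\alpha = i) = 1$ holds.
3. $p(\alpha \vee \beta = i) = p(\alpha = i) + p(\beta = i) - p(\alpha \wedge \beta = i)$ holds, for all $i \in \{0, 1\}$.</p>
      <p>(1) $p(\alpha = i) = \sum_{m} p(\alpha = i|m) p(m)$. Both $p(\alpha = i|m)$ and $p(m)$ cannot be negative.</p>
      <p>(2) Since $[\![\alpha = 0]\!]_m = 1 - [\![\alpha = 1]\!]_m$, we have
<disp-formula><tex-math>\begin{array}{l} p(\alpha = 0|m) + p(\alpha = 1|m) = \mu^{[\![\alpha = 0]\!]_m} (1 - \mu)^{1 - [\![\alpha = 0]\!]_m} + \mu^{[\![\alpha = 1]\!]_m} (1 - \mu)^{1 - [\![\alpha = 1]\!]_m} \\ = \mu^{1 - [\![\alpha = 1]\!]_m} (1 - \mu)^{[\![\alpha = 1]\!]_m} + \mu^{[\![\alpha = 1]\!]_m} (1 - \mu)^{1 - [\![\alpha = 1]\!]_m}. \end{array}</tex-math></disp-formula>
If $[\![\alpha = 1]\!]_m = 1$ then $p(\alpha = 0|m) + p(\alpha = 1|m) = (1 - \mu) + \mu = 1$. If $[\![\alpha = 1]\!]_m = 0$ then
$p(\alpha = 0|m) + p(\alpha = 1|m) = \mu + (1 - \mu) = 1$. Therefore, we have
<disp-formula><tex-math>p(\alpha = 0) + p(\alpha = 1) = \sum_{m} p(\alpha = 0|m) p(m) + \sum_{m} p(\alpha = 1|m) p(m) = \sum_{m} p(m) \{p(\alpha = 0|m) + p(\alpha = 1|m)\} = \sum_{m} p(m) = 1.</tex-math></disp-formula></p>
      <p>(3) From (2), it is sufficient to show only case $i = 1$ because case $i = 0$ can be developed as
follows.
<disp-formula><tex-math>p(\alpha \vee \beta = 0) = 1 - p(\alpha \vee \beta = 1) = 1 - \{p(\alpha = 1) + p(\beta = 1) - p(\alpha \wedge \beta = 1)\} = p(\alpha = 0) + p(\beta = 0) - p(\alpha \wedge \beta = 0)</tex-math></disp-formula>
It is sufficient to show $p(\alpha \vee \beta = 1|m) = p(\alpha = 1|m) + p(\beta = 1|m) - p(\alpha \wedge \beta = 1|m)$, for
all $m$, since the following holds.
<disp-formula><tex-math>\sum_{m} p(\alpha \vee \beta = 1|m) p(m) = \sum_{m} \{p(\alpha = 1|m) + p(\beta = 1|m) - p(\alpha \wedge \beta = 1|m)\} p(m)</tex-math></disp-formula>
By case analysis, the right expression falls into one of the following four cases.
<disp-formula><label>(7)</label><tex-math>(1 - \mu) + (1 - \mu) - (1 - \mu) = 1 - \mu</tex-math></disp-formula>
<disp-formula><label>(8)</label><tex-math>(1 - \mu) + \mu - (1 - \mu) = \mu</tex-math></disp-formula>
<disp-formula><label>(9)</label><tex-math>\mu + (1 - \mu) - (1 - \mu) = \mu</tex-math></disp-formula>
<disp-formula><label>(10)</label><tex-math>\mu + \mu - \mu = \mu</tex-math></disp-formula>
where (7), (8), (9) and (10) are obtained in the cases ($[\![\alpha = 1]\!]_m = [\![\beta = 1]\!]_m = 0$), ($[\![\alpha = 1]\!]_m = 0$
and $[\![\beta = 1]\!]_m = 1$), ($[\![\alpha = 1]\!]_m = 1$ and $[\![\beta = 1]\!]_m = 0$), and ($[\![\alpha = 1]\!]_m = [\![\beta = 1]\!]_m =
1$), respectively. All of the results are consistent with the left expression, i.e., $p(\alpha \vee \beta = 1|m)$.</p>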
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>McGrayne</surname>
          </string-name>
          ,
          <article-title>The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy</article-title>
          , Yale University Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          ,
          <article-title>Artificial Intelligence: A Modern Approach</article-title>
          , Fourth Edition, Pearson Education, Inc.,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Knill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pouget</surname>
          </string-name>
          ,
          <article-title>The Bayesian brain: the role of uncertainty in neural coding and computation</article-title>
          ,
          <source>Trends in Neurosciences</source>
          <volume>27</volume>
          (
          <year>2004</year>
          )
          <fpage>712</fpage>
          -
          <lpage>719</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Friston</surname>
          </string-name>
          ,
          <article-title>The free-energy principle: a unified brain theory?</article-title>
          ,
          <source>Nature Reviews Neuroscience</source>
          <volume>11</volume>
          (
          <year>2010</year>
          )
          <fpage>127</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hohwy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roepstorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Friston</surname>
          </string-name>
          ,
          <article-title>Predictive coding explains binocular rivalry: An epistemological review</article-title>
          ,
          <source>Cognition</source>
          <volume>108</volume>
          (
          <year>2008</year>
          )
          <fpage>687</fpage>
          -
          <lpage>701</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <source>Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference</source>
          , Morgan Kaufmann,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <article-title>A statistical learning method for logic programs with distribution semantics</article-title>
          ,
          <source>in: Proc. 12th Int. Conf. on Logic Programming</source>
          ,
          <year>1995</year>
          , pp.
          <fpage>715</fpage>
          -
          <lpage>729</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <article-title>Markov logic networks</article-title>
          ,
          <source>Machine Learning</source>
          <volume>62</volume>
          (
          <year>2006</year>
          )
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Priest</surname>
          </string-name>
          , Paraconsistent Logic, volume
          <volume>6</volume>
          , Handbook of Philosophical Logic, 2nd ed., Springer,
          <year>2002</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Carnielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Coniglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marcos</surname>
          </string-name>
          , Logics of Formal Inconsistency, volume
          <volume>14</volume>
          , Handbook of Philosophical Logic, 2nd ed., Springer,
          <year>2007</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <article-title>Probabilistic logic</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>28</volume>
          (
          <year>1986</year>
          )
          <fpage>71</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Rödder</surname>
          </string-name>
          ,
          <article-title>Conditional logic and the principle of entropy</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>117</volume>
          (
          <year>2000</year>
          )
          <fpage>83</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Kolmogorov</surname>
          </string-name>
          ,
          <source>Foundations of the Theory of Probability</source>
          , Chelsea Publishing Co.,
          <year>1950</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fenstad</surname>
          </string-name>
          ,
          <article-title>Representations of probabilities defined on first order languages</article-title>
          ,
          <source>in: Sets, Models and Recursion Theory</source>
          , volume
          <volume>46</volume>
          , Elsevier,
          <year>1967</year>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <article-title>Commonsense reasoning and commonsense knowledge in artificial intelligence</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>58</volume>
          (
          <issue>9</issue>
          ) (
          <year>2015</year>
          )
          <fpage>92</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Brewka</surname>
          </string-name>
          ,
          <source>Nonmonotonic Reasoning: Logical Foundations of Commonsense</source>
          , Cambridge University Press,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <source>The Book of Why: The New Science of Cause and Effect</source>
          , Allen Lane,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <source>Counterfactuals</source>
          , Harvard University Press, Cambridge, MA,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>