Causal Modeling with Probabilistic Simulation Models

Duligur Ibeling
Stanford University, Stanford, CA, USA
duligur@stanford.edu

Abstract. Recent authors have proposed analyzing conditional reasoning through a notion of intervention on a simulation program, and have found a sound and complete axiomatization of the logic of conditionals in this setting. Here we extend this setting to the case of probabilistic simulation models. We give a natural definition of probability on formulas of the conditional language, allowing for the expression of counterfactuals, and prove foundational results about this definition. We also find an axiomatization for reasoning about linear inequalities involving probabilities in this setting. We prove soundness and completeness of this axiomatization, and prove NP-completeness of the satisfiability problem for this logic.

Keywords: Counterfactuals · conditional reasoning · probabilistic programs · conditional simulation.

Acknowledgments. Thanks to Thomas Icard for helpful discussions. The author was supported by the Sudhakar and Sumithra Ravi Family Graduate Fellowship in the School of Engineering at Stanford University for this work.

1 Introduction

Accounts of subjunctive conditionals based on internal causal models offer an alternative to approaches based on ranking possible worlds by similarity [9]. One might, e.g., employ structural equation models (SEMs), i.e. systems of equations connecting the values of relevant variables, as the causal model; the semantics of conditionals are then based on a precise notion of intervention on the SEM [11]. Recently, some authors [8, 10, 4, 3, 1] have proposed using arbitrary programs, rather than systems of equations, as causal models. This approach emphasizes the procedural nature of many internal causal simulations over the purely declarative SEMs. It is possible to define precisely this idea of programs as causal models and to generalize the idea of intervention from SEMs to programs [8]. It is also possible to give a sound and complete logic of conditionals in this setting [6]. However, these preliminary results have not fully explored the very important case—from, e.g., the Bayesian Logic modeling language [10] and implicit in the use of probabilistic programs as cognitive models [3]—of conditionals in a probabilistic setting, using stochastic programs as the underlying causal model.

In the present contribution we will establish foundational definitions and logical results for this setting, thus extending the causal simulation framework to probabilistic simulation programs. Probabilities over a causal modeling language are defined, and results are given showing that they may indeed be interpreted as probabilities. The probabilities are used to give the semantics of a language for probabilistic reasoning, for which an axiomatization is given. The language and axiomatization are extensions of an analogous probabilistic language considered for the purely propositional case by [2]. Soundness and completeness of the axiom system are proven, and the satisfiability problem is found to be NP-complete.

2 Probabilistic Simulation Models and the Logical Language

2.1 Simulation Models

We work toward the definition of a language L for expressing probabilities involving probabilistic simulation models. Probabilistic simulation models extend the non-probabilistic causal simulation models of [8, 6]. (We use "non-probabilistic" rather than "deterministic" to prevent confusion of the probabilistic/non-probabilistic distinction with the deterministic/non-deterministic Turing machine distinction: the former concerns the presence of a source of randomness, while the latter concerns the number of possible halting executions.)
Formally, a non-probabilistic simulation model is a Turing machine, and a probabilistic simulation model is a probabilistic Turing machine, i.e., a deterministic Turing machine (that of course still has a read-write memory tape) given read access to a random bit tape whose squares represent the results of independent fair coin flips. ([6] does not require these machines to be deterministic, and isolates an additional logical principle that is valid when the machines are deterministic. Here, however, we suppose that "non-probabilistic simulation model" always refers to one whose Turing machine is deterministic; this definition is more useful for comparison with the probabilistic case, in which all underlying machines are deterministic.) The use of Turing machines is meant to allow for complete generality and encompasses, e.g., both logic programming and imperative programming. We sometimes use intuitive pseudocode in describing simulation models; such pseudocode is readily convertible to Turing machine code. We suppose that simulation models are run initially from an empty tape. ([6] also includes an initial input tape in the definition of the model. This difference is inconsequential.)

As a simulation model runs, it reads and writes the values of binary variables on its tape squares. Eventually, the model either halts with some resultant tape, or does not halt, depending on the results of the coin flips the model performs in the course of its simulation. Every probabilistic simulation model thus induces a distribution on these possible outcomes. We are interested not only in these outcomes, but also in the dynamics and counterfactual information embodied in the model. That is, we are interested in what would happen were we to hold the values of the tape square variables fixed in a particular way that counterfactually differs from the actual values the squares take on—in the distribution over outcomes that results under a particular intervention:

Definition 1 (Intervention [8]). Let S be a specification of binary values for a finite number of tape squares: S = {xi}i∈I for a finite index set I ⊆ N. Then the intervention IS is a computable function from Turing machines to Turing machines specified in the following way. Given a machine T, the intervened machine IS(T) does the same thing as T but holds the variables in S to their fixed values specified by S throughout the run. That is, IS(T) first writes xi to square i for all i ∈ I, then runs T while ignoring any writes to any of the squares whose indices are in I.

Suppose one fixes the entire random bit tape to some particular sequence in {0, 1}∞. Then the counterfactual, as well as actual, behavior of a probabilistic simulation model is completely non-probabilistic. We define first a basic language that allows us to express facts about such behavior. Then we will define the probability that a given probabilistic simulation model satisfies a formula of this basic language. Our final language L uses these probabilities—it thus expresses facts about the probabilities that counterfactual properties hold.
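To make Definition 1 concrete, the following is a minimal Python sketch—not part of the formal development—in which a simulation model is represented simply as a Python function over a tape rather than as a Turing machine. All names here (GuardedTape, intervene, coin_to_square0) are illustrative assumptions, not notation from the paper.

```python
from typing import Callable, Dict, Iterator

class GuardedTape:
    """A memory tape on which some squares may be held fixed by an intervention."""

    def __init__(self, fixed: Dict[int, int]):
        self.cells: Dict[int, int] = dict(fixed)  # the intervention writes its values first
        self.fixed = set(fixed)

    def read(self, i: int) -> int:
        return self.cells.get(i, 0)               # unwritten squares read as 0 (empty tape)

    def write(self, i: int, bit: int) -> None:
        if i not in self.fixed:                   # writes to intervened squares are ignored
            self.cells[i] = bit

# A "program" here is any function that reads/writes the tape and may consume random bits.
Program = Callable[[GuardedTape, Iterator[int]], None]

def intervene(program: Program, fixed: Dict[int, int]) -> Callable[[Iterator[int]], Dict[int, int]]:
    """I_S(T): run `program` with the squares listed in `fixed` held to the given values."""
    def run(random_bits: Iterator[int]) -> Dict[int, int]:
        tape = GuardedTape(fixed)
        program(tape, random_bits)                # assumes the program halts on this input
        return dict(tape.cells)
    return run

def coin_to_square0(tape: GuardedTape, random_bits: Iterator[int]) -> None:
    tape.write(0, next(random_bits))              # X0 := the result of one fair coin flip

# Under the intervention holding X0 = 1, the coin flip no longer matters:
assert intervene(coin_to_square0, {0: 1})(iter([0]))[0] == 1
```

The empty intervention corresponds to fixed = {}, in which case the program behaves as written.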
In all logical expressions we help ourselves to these standard notational conventions: α → β abbreviates ¬α ∨ β, and α ↔ β denotes (α → β) ∧ (β → α).

2.2 The Basic Language

Syntax. The basic, non-probabilistic language Lnon-prob is a propositional language over conditionals. Formally:

Definition 2. Let X be a set of atoms {X1, X2, X3, . . . } representing the values of the memory tape variables and let Lprop be the propositional language formed by closing X off under conjunction, disjunction, and negation. Let the intervention specification language Lint ⊂ Lprop be the language of purely conjunctive, ordered formulas of unique literals, i.e., formulas of the form li1 ∧ · · · ∧ lin for some n ≥ 0, where ij < ij+1 and each lij is either Xij or ¬Xij. (The point is that such formulas are in one-to-one correspondence with specifications of interventions, i.e., finite lists of variables along with the values each is to be held fixed to.) ⊤ abbreviates the "empty intervention" formula with n = 0. Let Lcond be the conditional language of formulas of the form ⟨α⟩β for α ∈ Lint, β ∈ Lprop. The overall basic language Lnon-prob is the language formed by closing off the formulas of Lcond under conjunction, disjunction, and negation. (Unlike [6], we do not admit the basic atoms X as atoms of L. There is no difficulty extending the semantics to such atoms, but allowing them would needlessly complicate the proof of Theorem 1.)

Every formula α ∈ Lint specifies an intervention Iα by giving a list of variables to fix and which values they are to be fixed to. Given a subjunctive conditional formula ⟨α⟩β ∈ Lcond, we call α the antecedent and β the consequent. We use [α] for the dual of ⟨α⟩, i.e., [α]β abbreviates ¬⟨α⟩(¬β). Note that ⟨⟩ϕ holds in a program if the unmodified program halts with a tape making ϕ true.

Semantics. The semantics of the basic language take a subjunctive conditional to be true in a simulation model when the program, intervened upon so as to make its antecedent hold, halts with values of the tape variables that make its consequent hold. For example, consider a simple model that checks whether the first memory tape square X0 is 1 and if so writes a 1 into the second tape square X1, and otherwise simply halts. This program satisfies the formulas ⟨⟩¬X0 and ⟨⟩¬X1, but also the counterfactual formula ⟨X0⟩(X0 ∧ X1): holding the first memory square fixed to 1 causes a write of the value 1 into the second tape square, thus satisfying the consequent X0 ∧ X1. Formally:

Definition 3. Let T be a non-probabilistic simulation model. Define T |=non-prob ⟨α⟩β iff Iα(T) halts with a memory tape whose variable assignment satisfies β. Now suppose T is probabilistic, and fix values for all squares on the random bit tape to some sequence r ∈ {0, 1}∞. Define T, r |= ⟨α⟩β iff Iα(T), when run with its random bit tape fixed to r, halts with a resultant memory tape satisfying β. Define (in both cases) satisfaction of arbitrary formulas of Lnon-prob in the familiar way by recursion.
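The running example above, and the satisfaction relation of Definition 3, can be rendered as a small self-contained Python sketch. The encoding (a Python function standing in for the Turing machine, a dict standing in for the memory tape) is an illustrative assumption rather than the formal model.

```python
from typing import Dict, Iterator, Optional

def run_example(fixed: Dict[int, int], bits: Iterator[int]) -> Optional[Dict[int, int]]:
    """The toy model of Section 2.2 under the intervention holding `fixed`:
    if X0 = 1, write a 1 into X1; otherwise just halt.  Returns the halting tape."""
    tape = dict(fixed)                      # the intervention writes its values first
    read = lambda i: tape.get(i, 0)         # empty squares read as 0
    def write(i: int, b: int) -> None:
        if i not in fixed:                  # writes to intervened squares are ignored
            tape[i] = b
    if read(0) == 1:
        write(1, 1)
    return tape                             # this model always halts and reads no random bits

# T, r |= <>¬X0 and T, r |= <>¬X1: the unintervened run halts with X0 = X1 = 0.
t = run_example({}, iter([]))
assert t.get(0, 0) == 0 and t.get(1, 0) == 0

# T, r |= <X0>(X0 ∧ X1): holding X0 fixed to 1 causes a write of 1 into X1.
t = run_example({0: 1}, iter([]))
assert t.get(0, 0) == 1 and t.get(1, 0) == 1
```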
In a sense, the validities of the non-probabilistic setting carry over to this setting, as we will now show. For ϕ ∈ Lnon-prob, write |=non-prob ϕ if ϕ is valid in the class of all non-probabilistic simulation models. We will see that all such formulas are still valid for probabilistic simulation models, under Definition 3, once one fixes the random bit tape to a particular sequence.

Lemma 1. |=non-prob ϕ if and only if, for all probabilistic simulation models T and all r ∈ {0, 1}∞, we have that T, r |= ϕ.

Proof. Suppose |=non-prob ϕ. Consider some probabilistic simulation model T and sequence r ∈ {0, 1}∞. ϕ is composed of Lcond-atoms, of the form ⟨α⟩β. What is the behavior of Iα(T), r? Either Iα(T), r reads only a finite portion of r, or it reads an unbounded portion of r (in the latter case, it also does not halt). If only a finite portion is read, let N(a), for the atom a under consideration, be the index of the last random bit tape square of r that is read. Let N be the maximum of the N(a) over all atoms a in ϕ, which exists since ϕ has finite length. Construct a Turing machine T′ from T that embeds the contents of r up to index N into its code, replacing any read from r with its value. This is possible in a finite amount of code as we only have to include values up to N in T′.

What if Iα(T), r ends up reading an unbounded portion of r? We note that it is possible to write code in T′ to check if the machine is being run under an α-fixing intervention—i.e., conditional code that runs under Iα(T′) and no other intervention. (For the precise details of this construction, see [6]. Briefly, if one wants to check whether some Xi is being held fixed by an intervention, one can try to toggle Xi; this attempt will be successful iff Xi is not currently being fixed by an intervention.) Add such code to T′, including an infinite loop conditional on an α-intervention for each case where Iα(T), r reads an unbounded portion of r. Now, for all atoms ⟨α⟩β, T′ |=non-prob ⟨α⟩β iff T, r |= ⟨α⟩β. As this holds for any atom of ϕ, and |=non-prob ϕ, we have that T, r |= ϕ as desired.

Now, suppose that T, r |= ϕ for all probabilistic T, r. We want to see that |=non-prob ϕ. Given a non-probabilistic T, convert T to a probabilistic TM T′ that never reads from its random tape, and take any random tape r. Then T′, r |= ϕ, so that T |=non-prob ϕ. □

2.3 Adding Probabilities

Syntax. L is the language of linear inequalities over probabilities that formulas of Lnon-prob hold. More precisely:

Definition 4. Let Lineq be the language of formulas of the form

  a1 P(ϕ1) + · · · + an P(ϕn) ≤ c    (1)

for some n ∈ N, and c, a1, . . . , an ∈ Z, ϕ1, . . . , ϕn ∈ Lnon-prob. Then L is the language of propositional formulas formed by closing off Lineq under conjunction, disjunction, and negation.

We sometimes write inequalities of a different form from (1) with the understanding that they can be readily converted into some L-formula. For example, an inequality with a > sign is a negation of a Lineq-formula.

Semantics. Let T be a probabilistic simulation model. We will shortly define a probability PT : Lnon-prob → [0, 1]. Now suppose a given ϕ ∈ Lineq has the form (1). Then T |= ϕ iff the inequality (1) holds when each P(ϕi) factor takes the value PT(ϕi). Satisfaction T |= ϕ for arbitrary ϕ ∈ L is then defined familiarly by recursion.

Given ϕ ∈ Lnon-prob, the probability PT(ϕ) is simply the (standard) measure of the set of infinite bit sequences r for which T, r |= ϕ. More formally: let Σ be the σ-algebra on {0, 1}∞ generated by cylinder sets and µ be the standard measure defined on Σ (that is, the product of Bernoulli(1/2) measures, as defined in, e.g., [3]). Now let S(ϕ) = {r ∈ {0, 1}∞ : T, r |= ϕ}. Then we define PT(ϕ) = µ(S(ϕ)).
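The measure-theoretic definition of PT can be read operationally: PT(ϕ) is the probability that a run with fresh fair coin flips satisfies ϕ. The following self-contained Python sketch estimates such a probability by sampling; the encoding of a model as a Python function and the helper names are illustrative assumptions, and the sketch ignores non-halting runs and interventions.

```python
import random
from typing import Callable, Dict, Iterator, Optional

Tape = Dict[int, int]

def two_coin_and(bits: Iterator[int]) -> Optional[Tape]:
    """Toy probabilistic simulation model: write the AND of two fair coin flips into X0."""
    return {0: next(bits) & next(bits)}        # always halts after reading two random bits

def estimate_prob(model: Callable[[Iterator[int]], Optional[Tape]],
                  event: Callable[[Tape], bool],
                  samples: int = 100_000,
                  seed: int = 0) -> float:
    """Monte Carlo estimate of P_T(<>β): the measure of the set of random-bit
    sequences r on which the (unintervened) run halts with a tape satisfying β."""
    rng = random.Random(seed)
    def fresh_bits() -> Iterator[int]:          # lazily generated fair coin flips
        while True:
            yield rng.getrandbits(1)
    hits = 0
    for _ in range(samples):
        tape = model(fresh_bits())
        if tape is not None and event(tape):
            hits += 1
    return hits / samples

# P_T(<>X0) for this model is 1/4; the estimate should be close to 0.25.
print(estimate_prob(two_coin_and, lambda t: t.get(0, 0) == 1))
```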
The following lemma ensures that S(ϕ) is always measurable, so that this definition is valid.

Lemma 2. For any ϕ ∈ Lnon-prob, we have S(ϕ) ∈ Σ.

Proof. By induction on the structure of ϕ. If ϕ = ¬ψ, then S(ϕ) is the complement of a set in Σ and hence is in Σ. The case of a conjunction or disjunction is similar since Σ is closed under intersection and union. The base case is that of the atoms. Consider an atom of the form ⟨α⟩β. If Iα(T) halts with its random bit tape fixed to r, then it does so reading only a finite portion of r. Thus S(⟨α⟩β) is the union of cylinder sets extending finite strings on which Iα(T) halts with a result satisfying β, and hence is in Σ. □

This probability is coherent in the sense that it plays well with the logic of the basic language:

Proposition 1. For any probabilistic T we have:
1. PT(ϕ) = 1 if |=non-prob ϕ, for ϕ ∈ Lnon-prob;
2. PT(ϕ) ≤ PT(ψ) whenever |=non-prob ϕ → ψ, for ϕ, ψ ∈ Lnon-prob;
3. PT(ϕ) = PT(ϕ ∧ ψ) + PT(ϕ ∧ ¬ψ) for all ϕ, ψ ∈ Lnon-prob.

Proof. (1) holds since in this case, by Lemma 1, S(ϕ) = {0, 1}∞. (2) holds since in this case, S(ϕ) ⊆ S(ψ). Finally (3) holds by noting |=non-prob ϕ ↔ ((ϕ ∧ ψ) ∨ (ϕ ∧ ¬ψ)), applying (2), and noting that S(ϕ ∧ ψ) and S(ϕ ∧ ¬ψ) are disjoint. □

A corollary of part (2) is that logical equivalents under |=non-prob preserve probability.

2.4 The Case of Almost-Surely Halting Simulations

An interesting special case is that of the simulation models that halt almost-surely, i.e., with probability 1 under every intervention. Call this class M↓. Following the urging of [7] we have not restricted the definition of probabilistic simulation model to such models. We will see that from a logical point of view, this case is a natural probabilistic analogue of the class M↓non-prob of non-probabilistic simulation models that halt under every intervention. By this we mean that we may prove an analogue of Lemma 1. Write |=↓non-prob ϕ if ϕ ∈ Lnon-prob is valid in M↓non-prob. Note that Lemma 1 does not hold if one merely changes all the preconditions to be halting/almost-surely halting: consider a probabilistic simulation model T that repeatedly reads random bits and halts at the first 1 it discovers; this program is almost-surely halting. But if r is an infinite sequence of 0s, then T, r ⊭ ⟨⟩⊤, even though |=↓non-prob ⟨⟩⊤. Crucially, we must move to the perspective of probability and measure to see the analogy:

Lemma 3. |=↓non-prob ϕ if and only if, for all T ∈ M↓, we have T, r |= ϕ for all r ∈ {0, 1}∞ except on a set of measure 0.

Proof. Suppose |=↓non-prob ϕ. We claim that for all T ∈ M↓ we have T, r |= ϕ for all r except on a set of measure 0. Again, consider an atom ⟨α⟩β appearing in ϕ. The set of r for which Iα(T), r does not halt has measure 0, given that T ∈ M↓. On each such r, the run of Iα(T), r must read infinitely many bits of r: otherwise, the intervened machine would have a nonzero probability of not halting. Thus, excluding such r, it is possible to repeat the construction of T′ from the proof of Lemma 1 for ⟨α⟩β, and in doing this construction we are already ignoring all cases where an unbounded portion of r is read. This means that we do not have to include any infinite loops in T′, and T′ will be always-halting. If we exclude all such r arising from the antecedents of all atoms of ϕ, then we only exclude a set of measure 0, since there are finitely many atoms. Except for such r, the construction works, and T′ has, as before, the same behavior as T.
But since |=↓non-prob ϕ, we have that T, r |= ϕ except on the excluded set of measure 0.

For the opposite direction, let T ∈ M↓non-prob. We wish to show that T |=non-prob ϕ. Convert T to an identical probabilistic simulation program T′ that never reads from its random tape. We have T′, r |= ϕ for all r but a set of measure 0; in particular, for at least one r. This implies T |=non-prob ϕ. □

3 Axiomatic Systems

We will now give an axiomatic system for reasoning in L and prove that it is sound and complete with respect to probabilistic simulation models: it proves all (completeness) and only (soundness) the formulas of L that hold for all probabilistic simulation models. We will give an additional system that is sound and complete for validities with respect to the almost-surely halting simulation models M↓.

Definition 5. Let AX be a set of rules and axioms formed by combining the following three modules.

1. PC: propositional reasoning (tautologies and modus ponens) over atoms of L.

2. Prob: the following axioms:
   NonNeg. P(ϕ) ≥ 0
   Norm. P(⊤) = 1
   Add. P(ϕ ∧ ψ) + P(ϕ ∧ ¬ψ) = P(ϕ)
   Dist. P(ϕ) = P(ψ) whenever |=non-prob ϕ ↔ ψ

3. Ineq, an axiomatization (see [2]) for reasoning about linear inequalities:
   Zero. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇔ (a1 P(ϕ1) + · · · + an P(ϕn) + 0P(ϕn+1) ≤ c)
   Permutation. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇔ (aj1 P(ϕj1) + · · · + ajn P(ϕjn) ≤ c) when j1, . . . , jn is a permutation of 1, . . . , n
   AddIneq. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ∧ (a′1 P(ϕ1) + · · · + a′n P(ϕn) ≤ c′) ⇒ ((a1 + a′1) P(ϕ1) + · · · + (an + a′n) P(ϕn) ≤ (c + c′))
   Mult. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇒ (b a1 P(ϕ1) + · · · + b an P(ϕn) ≤ b c) for any b > 0
   Dichotomy. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ∨ (a1 P(ϕ1) + · · · + an P(ϕn) ≥ c)
   Mono. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇒ (a1 P(ϕ1) + · · · + an P(ϕn) < b) if b > c

Additionally, let AX↓ be the system formed in exactly the same way, but replacing |=non-prob with |=↓non-prob. Note that the non-probabilistic validities |=non-prob and |=↓non-prob, appearing in Dist, have been completely axiomatized in [6]. The main result is:

Theorem 1. AX (respectively, AX↓) is sound and complete for the validities of L with respect to M (respectively, M↓), where M is the class of all probabilistic simulation models.

Proof. Soundness (of Prob) follows from Lemma 1, Proposition 1, and, for the almost-surely halting case, Lemma 3. For completeness, consider the general case of M first. As usual, it suffices to show that any consistent ϕ ∈ L is satisfiable by some probabilistic simulation model. We put ϕ into a normal form from which we construct a canonical model. By PC we may suppose ϕ is in disjunctive normal form. We may further suppose that it is a conjunction of Lineq-literals, as at least one (conjunctive) clause in the disjunctive normal form must be consistent. Let a1, . . . , an ∈ Lcond be the atoms that appear inside any probability P in ϕ, and let δ1, . . . , δ_{2^n} represent all the formulas of the form l1 ∧ · · · ∧ ln that can be obtained by setting each li to either ai or ¬ai. We then have the following, which is a kind of normal form result:

Lemma 4 (Lemma 2.3, [2]). ϕ is provably-in-AX equivalent to a conjunction

  (P(δ1) ≥ 0) ∧ · · · ∧ (P(δ_{2^n}) ≥ 0) ∧
  (P(δ1) + · · · + P(δ_{2^n}) = 1) ∧
  (a1,1 P(δ1) + · · · + a_{1,2^n} P(δ_{2^n}) ≤ c1) ∧ · · · ∧
  (am,1 P(δ1) + · · · + a_{m,2^n} P(δ_{2^n}) ≤ cm) ∧
  (a′1,1 P(δ1) + · · · + a′_{1,2^n} P(δ_{2^n}) > c′1) ∧ · · · ∧
  (a′m′,1 P(δ1) + · · · + a′_{m′,2^n} P(δ_{2^n}) > c′m′)    (2)

for some integer coefficients c1, . . . , cm, c′1, . . . , c′m′, a1,1, . . . , a_{m,2^n}, a′1,1, . . . , a′_{m′,2^n}.
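For concreteness, here is a small illustrative instance of this normal form (an example added for exposition, not part of the original proof). Suppose the only atoms occurring inside P in ϕ are a1 and a2, so that n = 2 and

  δ1 = a1 ∧ a2,  δ2 = a1 ∧ ¬a2,  δ3 = ¬a1 ∧ a2,  δ4 = ¬a1 ∧ ¬a2.

Since P(a1) = P(δ1) + P(δ2) and P(a2) = P(δ1) + P(δ3), a conjunct of ϕ such as 2P(a1) − P(a2) ≤ 1 becomes P(δ1) + 2P(δ2) − P(δ3) ≤ 1, and (2) additionally records that the P(δi) are nonnegative and sum to 1.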
Proof. Let ψ ∈ Lnon-prob be any of the formulas appearing inside of a probability P in ϕ. Note that P(ψ) = P(ψ ∧ a1) + P(ψ ∧ ¬a1) by Add. Moving on to a2, we have, provably, P(ψ ∧ a1) = P(ψ ∧ a1 ∧ a2) + P(ψ ∧ a1 ∧ ¬a2), and we may rewrite P(ψ ∧ ¬a1) similarly. Applying this process successively, we have P(ψ) = P(ψ ∧ δ1) + · · · + P(ψ ∧ δ_{2^n}). For any term on the right-hand side of this equation, if δi ⇒ ψ propositionally, then propositional reasoning together with Dist allows us to replace the term by P(δi), and if not, by 0. Thus we always have that P(ψ) = b1 P(δ1) + · · · + b_{2^n} P(δ_{2^n}) for some coefficients bi. Applying this process to each P-term in ϕ and using Ineq to rewrite the left-hand sides of the inequalities, and conjoining the (clearly provable) clauses that P(δi) ≥ 0 for all 1 ≤ i ≤ 2^n and P(δ1) + · · · + P(δ_{2^n}) = 1, we obtain (2). □

The conjunction (2) can be seen as a system of simultaneous inequalities over 2^n unknowns, P(δ1), . . . , P(δ_{2^n}). Ineq is actually sound and complete for such systems (we refer the reader to Section 4 of [2] for the proof of this fact). So if ϕ is consistent with AX—which includes Ineq—this system must have a solution. Thus there are values P(δi) solving (2). We will now construct a probabilistic simulation model having precisely these probabilities of satisfying each δi. Note that for any δi with |=non-prob ⊥ ↔ δi it is provable that P(δi) = 0, and we may conjoin this to (2). Note also that δi ∧ δj is unsatisfiable for any i ≠ j. Given these two observations, the following lemma implies the result.

Lemma 5. For any collection of satisfiable Lnon-prob-formulas ϕ1, . . . , ϕn, no two of which are jointly satisfiable, and any rational probabilities p1, . . . , pn ≥ 0 such that p1 + · · · + pn = 1, there is a probabilistic simulation model T such that PT(ϕi) = pi for all i, 1 ≤ i ≤ n.

Proof. Since the ϕi are satisfiable, there are non-probabilistic simulation models Tnon-prob,1, . . . , Tnon-prob,n such that for all i = 1, . . . , n, we have Tnon-prob,i |=non-prob ϕi. Further, we may suppose the machines so constructed use only a bounded number of memory tape squares. (Why? Since the ϕi are satisfiable, they are consistent with the axiomatization for non-probabilistic simulation models given by [6], and hence are satisfied by the canonical models given in [6]; these models use only boundedly many tape squares.) Thus let the maximum index of a tape square used by any of the Tnon-prob,i be N.

We now describe T informally. Suppose without loss of generality that for all i, pi = ai/b for some common denominator b. Let T draw a random number r from 1 up to b uniformly, and ensure that T does any auxiliary computations it might need only on squares with indices at least N + 1. Check whether r ≤ a1, and if so, let T branch into the code of Tnon-prob,1. If not, check whether a1 + 1 ≤ r ≤ a1 + a2 and if so, branch into Tnon-prob,2. Repeat the process for p3, . . . , pn. It is clear that the probability of branching into each Tnon-prob,i block is exactly pi, and the same is true under any relevant (i.e., involving only memory tape variables that appear in one of the ϕi) intervention on T: we may suppose any auxiliary computations T might require use only memory tape squares with indices past N. After branching into the ith block, the behavior of T is exactly the same as that of Tnon-prob,i, meaning that any random bit tape fixings that end up causing a branch into this block will belong to S(ϕi). Another random bit tape fixing that causes a branch into another block, say the jth, cannot belong to S(ϕi) since ϕi, ϕj are jointly unsatisfiable. Thus, PT(ϕi) = pi for all i. □
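As an illustration of the construction just described (a toy sketch only: the component blocks are Python functions rather than Turing machine code, and the tape-index bookkeeping around N is omitted), the branching machine can be realized as follows. The helper names are assumptions for the example.

```python
from math import ceil, log2
from typing import Callable, Dict, Iterator, List, Optional

Tape = Dict[int, int]
Model = Callable[[Iterator[int]], Optional[Tape]]   # random-bit tape -> final memory tape

def mixture(blocks: List[Model], weights: List[int]) -> Model:
    """With b = sum(weights), draw r uniformly from {1, ..., b} by rejection
    sampling on the random bit tape, then run the i-th block exactly when
    a_1 + ... + a_{i-1} < r <= a_1 + ... + a_i."""
    b = sum(weights)
    k = max(1, ceil(log2(b)))                        # random bits consumed per attempt

    def run(bits: Iterator[int]) -> Optional[Tape]:
        while True:                                   # rejection loop; halts with probability 1
            r = 1 + int("".join(str(next(bits)) for _ in range(k)), 2)
            if r <= b:
                break
        acc = 0
        for block, a in zip(blocks, weights):
            if acc < r <= acc + a:
                return block(bits)                    # remaining random bits go to the block
            acc += a
        return None                                   # unreachable: the weights cover 1..b
    return run

# A model with P_T(<>X0) = 3/4: branch into "X0 := 0" with weight 1, "X0 := 1" with weight 3.
T = mixture([lambda bits: {0: 0}, lambda bits: {0: 1}], [1, 3])
```

Under any relevant intervention the branch probabilities are unchanged, which is the point of keeping each block's auxiliary tape usage above index N in the formal construction.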
Finally, we must see that this model lies in M↓ if the original formula is consistent with AX↓. [6] has shown that |=↓non-prob [α]β → ⟨α⟩β. Then in the proof of Lemma 5, we may suppose that each Tnon-prob,i block contains only always-halting code (since the canonical programs of [6] for M↓non-prob contain only such code), and hence that T does not contain any loops either: thus it almost-surely halts. □

4 Computational Complexity

Call the problem of deciding if a formula ϕ ∈ L is satisfiable Prob-Sim-Sat(ϕ). Theorem 2 shows that solving this problem is no more complex than propositional satisfiability.

Theorem 2. Prob-Sim-Sat(ϕ) is NP-complete in |ϕ| (where this length is computed standardly).

Proof. It is NP-hard since, given any propositional π, the formula P(⟨⟩π) > 0 is satisfiable iff π is satisfiable (consider a machine that does nothing but write out a satisfying memory tape assignment). In order to show that the satisfiability problem is in NP, we give the following nondeterministic satisfiability algorithm: guess a program from a class of programs (that we will define shortly) that includes the program constructed in Lemma 5—call this canonical program Tϕ—and check (in polynomial time) if it satisfies ϕ. This algorithm decides satisfiability since, by soundness, a satisfiable formula must be consistent, and hence has a canonical model of the form constructed in Lemma 5. For the remainder of the proof, by the "length of a number" we just mean the length of its computer (binary) representation. The "length of a rational" is the sum of the lengths of its numerator and its denominator.

What is the class of probabilistic simulation models that we may limit our guesses to? For some fixed constants C, D ∈ N, we will define a class Mϕ,C,D. We will then show that there exist C, D such that the canonical program of Lemma 5 belongs to Mϕ,C,D for all consistent ϕ. Let Mϕ,C,D be the fragment of probabilistic simulation models whose code consists of the following:

1. Code to draw a random number uniformly between 1 and some N, such that N has length at most D|ϕ|^3.
2. At most n = C|ϕ| branches, that is, copies of: an if-statement with condition ℓ ≤ r ≤ u, whose body is a canonical program Pψi for some ψi ∈ Lnon-prob, of the same form as the non-probabilistic canonical models (i.e., in the class defined in the proof of Theorem 2 from [6]). Letting ℓi, ui be the bounds for the ith copy in item 2, we also require that ℓ1 = 1, that ℓi+1 = ui + 1 for all i, and that un = N.

The following fact from linear algebra (we refer the reader to [2] for the proof) helps us to show that for all consistent ϕ, the canonical program Tϕ belongs to Mϕ,C,D for some C, D.

Lemma 6. A system of m linear inequalities with integer coefficients of length at most ℓ that has a nonnegative solution has a nonnegative solution with at most m variables nonzero, and where the variables have length at most O(mℓ + m log m). □

Apply this lemma to (2). Each inequality in (2) originally came from ϕ, so there are O(|ϕ|) of them. Further, recall that each integer coefficient in (2) came from summing up a subset of 2^n coefficients originally from ϕ, where n is the number of atoms appearing anywhere inside P expressions in ϕ. As this n is thus O(|ϕ|)—and hence 2^n is O(|ϕ|) in length—and each original coefficient is also O(|ϕ|) in length, each coefficient of (2) is O(|ϕ|) in length as well (lengths of products add). Thus Lemma 6 shows that without loss of generality, we may suppose that the solutions for the P(δi) of (2) have O(|ϕ|^2) length. The common denominator of these O(|ϕ|) rationals hence has O(|ϕ|^3) length. The construction of Lemma 5 has one branch for each of them, and hence O(|ϕ|) branches. This shows the existence of a D for part 1 of the definition of Mϕ,C,D and the existence of a C for part 2. We will abbreviate Mϕ = Mϕ,C,D for some choice of C, D thus guaranteed.

It remains to show that given any program T ∈ Mϕ, we can check if T |= ϕ in polynomial time. It suffices to show that checking if T |= ψ for ψ ∈ Lineq is polynomial time: if we know whether T |= ψ for every ψ that ϕ is built out of, we can decide in linear time if T |= ϕ. Thus suppose ψ has the form a1 P(ϕ1) + · · · + an P(ϕn) ≤ c. [6] shows that one may check whether the Pψi in part 2 of the definition of Mϕ satisfy any formula of the basic language Lnon-prob in polynomial time. Then we can easily compute P(ϕi) as simply the sum of the probabilities of each branch that satisfies ϕi. Doing the arithmetic to check if ψ is satisfied is then certainly polynomial time, so we have our result. □
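The verification step in this proof—summing branch probabilities and checking a linear inequality—is straightforward to render concretely. In the following self-contained Python sketch, per-branch satisfaction of each ϕi is simply given as a table, standing in for the polynomial-time model checker of [6]; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

def check_linear_inequality(branches: List[Tuple[Fraction, Dict[str, bool]]],
                            terms: List[Tuple[int, str]],
                            c: int) -> bool:
    """Decide whether a1*P(phi1) + ... + an*P(phin) <= c for a program of the
    branching form above: P(phi_i) is the sum of the probabilities of the
    branches whose canonical block satisfies phi_i."""
    total = Fraction(0)
    for coeff, name in terms:
        p = sum((prob for prob, sat in branches if sat[name]), Fraction(0))
        total += coeff * p
    return total <= c

# Toy instance: two branches with probabilities 1/4 and 3/4.
branches = [(Fraction(1, 4), {"phi1": True,  "phi2": False}),
            (Fraction(3, 4), {"phi1": False, "phi2": True})]
# Check 2*P(phi1) - P(phi2) <= 0, i.e., 2*(1/4) - 3/4 <= 0: holds.
print(check_linear_inequality(branches, [(2, "phi1"), (-1, "phi2")], 0))   # True
```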
5 Conclusion and Future Work

We have defined and obtained foundational results concerning a very natural extension of counterfactual intervention on simulation models to the probabilistic case.

One critical operation in probability is conditioning, or updating probabilities given that some event is known to have occurred (in the subjective interpretation, updating a belief for known information). One may already define conditional probabilities in the usual way in the current framework, and our framework (without interventions) covers the conditional simulation approach to certain aspects of common-sense reasoning of [3]. In this approach, one limits oneself to the runs satisfying a certain query; the framework considered here would be equivalent for any queries expressible as formulas of Lnon-prob. [2] also give a logic for reasoning about conditional probabilities. Future work would involve extending this system to probabilistic simulation models and studying the complexity of reasoning in that setting.

As [8, 6] note, the simulation model approach invalidates many important logical principles that are valid in other approaches [5, 11, 9], such as cautious monotonicity: [A](B ∧ C) → [A ∧ B]C. However the approach is otherwise quite general, and an important future direction would be to identify and characterize subclasses of simulation models that validate this and other similar logical principles. We have begun investigating this extension. An interesting consequence it has concerns the comparison of conditional probability with the probabilities of subjunctive conditionals: while these two probabilities are not in general equal in the classes M or M↓, they are equal in certain restricted classes.

A final direction we want to mention concerns "open-world" reasoning, including first-order reasoning about models with some domain, where counterfactual antecedents might alter how many individuals are being considered or which individuals fall under a property or bear certain relations to each other.
Recursion and the tools of logic programming [4, 10] make this very natural for the simulation model approach, and we would like to understand the first- and higher-order conditional logics that result in this approach, in both the non-probabilistic and probabilistic cases. We have also begun exploring this direction.

References

1. Chater, N., Oaksford, M.: Programs as causal models: Speculations on mental programs and mental representation. Cognitive Science 37(6), 1171–1191 (2013)
2. Fagin, R., Halpern, J.Y., Megiddo, N.: A logic for reasoning about probabilities. Information and Computation 87, 78–128 (1990)
3. Freer, C.E., Roy, D.M., Tenenbaum, J.B.: Towards common-sense reasoning via conditional simulation: Legacies of Turing in artificial intelligence. In: Downey, R. (ed.) Turing's Legacy: Developments from Turing's Ideas in Logic, Lecture Notes in Logic, vol. 42, pp. 195–252. Cambridge University Press (2014)
4. Goodman, N.D., Tenenbaum, J.B., Gerstenberg, T.: Concepts in a probabilistic language of thought. In: Margolis, E., Laurence, S. (eds.) The Conceptual Mind: New Directions in the Study of Concepts. MIT Press (2015)
5. Halpern, J.Y.: Axiomatizing causal reasoning. Journal of AI Research 12, 317–337 (2000)
6. Ibeling, D., Icard, T.: On the conditional logic of simulation models. In: Proc. 27th IJCAI (2018)
7. Icard, T.: Beyond almost-sure termination. In: Proc. 39th CogSci (2017)
8. Icard, T.F.: From programs to causal models. In: Cremers, A., van Gessel, T., Roelofsen, F. (eds.) Proceedings of the 21st Amsterdam Colloquium, pp. 35–44 (2017)
9. Lewis, D.: Counterfactuals. Harvard University Press (1973)
10. Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D.L., Kolobov, A.: BLOG: Probabilistic models with unknown objects. In: Proc. 19th IJCAI, pp. 1352–1359 (2005)
11. Pearl, J.: Causality. Cambridge University Press (2009)