Causal Modeling with Probabilistic Simulation Models

Duligur Ibeling
Stanford University, Stanford, CA, USA
duligur@stanford.edu

Abstract. Recent authors have proposed analyzing conditional reasoning through a notion of intervention on a simulation program, and have found a sound and complete axiomatization of the logic of conditionals in this setting. Here we extend this setting to the case of probabilistic simulation models. We give a natural definition of probability on formulas of the conditional language, allowing for the expression of counterfactuals, and prove foundational results about this definition. We also find an axiomatization for reasoning about linear inequalities involving probabilities in this setting. We prove soundness and completeness of this axiomatization, and prove NP-completeness of the satisfiability problem for this logic.

Keywords: Counterfactuals · conditional reasoning · probabilistic programs · conditional simulation.

Acknowledgments. Thanks to Thomas Icard for helpful discussions. The author was supported by the Sudhakar and Sumithra Ravi Family Graduate Fellowship in the School of Engineering at Stanford University for this work.

1 Introduction

Accounts of subjunctive conditionals based on internal causal models offer an alternative to approaches based on ranking possible worlds by similarity [9]. One might, e.g., employ structural equation models (SEMs), i.e. systems of equations connecting the values of relevant variables, as the causal model; the semantics of conditionals are then based on a precise notion of intervention on the SEM [11]. Recently, some authors [8, 10, 4, 3, 1] have proposed using arbitrary programs, rather than systems of equations, as causal models. This approach emphasizes the procedural nature of many internal causal simulations over the purely declarative SEMs. It is possible to define precisely this idea of programs as causal models and to generalize the idea of intervention from SEMs to programs [8]. It is also possible to give a sound and complete logic of conditionals in this setting [6]. However, these preliminary results have not fully explored the very important case—from, e.g., the Bayesian Logic modeling language [10] and implicit in the use of probabilistic programs as cognitive models [3]—of conditionals in a probabilistic setting, using stochastic programs as the underlying causal model.

In the present contribution we will establish foundational definitions and logical results for this setting, thus extending the causal simulation framework to probabilistic simulation programs. Probabilities over a causal modeling language are defined, and results are given showing that they may indeed be interpreted as probabilities. The probabilities are used to give the semantics of a language for probabilistic reasoning, for which an axiomatization is given. The language and axiomatization are extensions of an analogous probabilistic language considered for the purely propositional case by [2]. Soundness and completeness of the axiom system are proven, and the satisfiability problem is found to be NP-complete.

2 Probabilistic Simulation Models and the Logical Language

2.1 Simulation Models

We work toward the definition of a language L for expressing probabilities involving probabilistic simulation models. Probabilistic simulation models extend the non-probabilistic causal simulation models of [8, 6]. (We use "non-probabilistic" rather than "deterministic" to prevent confusion of the probabilistic/non-probabilistic distinction with the deterministic/non-deterministic Turing machine distinction: the former concerns the presence of a source of randomness, while the latter concerns the number of possible halting executions.)
Formally, a non-probabilistic simulation model is a Turing machine, and a probabilistic simulation model is a probabilistic Turing machine, i.e., a deterministic Turing machine (that of course still has a read-write memory tape) given read access to a random bit tape whose squares represent the results of independent fair coin flips. ([6] does not require these machines to be deterministic, and isolates an additional logical principle that is valid when the machines are deterministic. Here, however, we suppose that "non-probabilistic simulation model" always refers to one whose Turing machine is deterministic; this definition is more useful for comparison with the probabilistic case, in which all underlying machines are deterministic.) The use of Turing machines is meant to allow for complete generality and encompasses, e.g., both logic programming and imperative programming. We sometimes use intuitive pseudocode in describing simulation models; such pseudocode is readily convertible to Turing machine code. We suppose that simulation models are run initially from an empty tape. ([6] also includes an initial input tape in the definition of the model. This difference is inconsequential.)

As a simulation model runs, it reads and writes the values of binary variables on its tape squares. Eventually, the model either halts with some resultant tape, or does not halt, depending on the results of the coin flips the model performs in the course of its simulation. Every probabilistic simulation model thus induces a distribution on these possible outcomes. We are interested not only in these outcomes, but also in the dynamics and counterfactual information embodied in the model. That is, we are interested in what would happen were we to hold the values of the tape square variables fixed in a particular way that counterfactually differs from the actual values the squares take on—in the distribution over outcomes that results under a particular intervention:

Definition 1 (Intervention [8]). Let S be a specification of binary values for a finite number of tape squares: S = {xi}i∈I for a finite index set I ⊆ N. Then the intervention IS is a computable function from Turing machines to Turing machines specified in the following way. Given a machine T, the intervened machine IS(T) does the same thing as T but holds the variables in S to their fixed values specified by S throughout the run. That is, IS(T) first writes xi to square i for all i ∈ I, then runs T while ignoring any writes to any of the squares whose indices are in I.

Suppose one fixes the entire random bit tape to some particular sequence in {0, 1}∞. Then the counterfactual, as well as actual, behavior of a probabilistic simulation model is completely non-probabilistic. We define first a basic language that allows us to express facts about such behavior. Then we will define the probability that a given probabilistic simulation model satisfies a formula of this basic language. Our final language L uses these probabilities—it thus expresses facts about the probabilities that counterfactual properties hold.
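To make Definition 1 concrete, the following is a minimal Python sketch—not part of the formal development—in which a simulation model is represented simply as a Python function over a tape rather than as a Turing machine. All names here (GuardedTape, intervene, coin_to_square0) are illustrative assumptions, not notation from the paper.

```python
from typing import Callable, Dict, Iterator

class GuardedTape:
    """A memory tape on which some squares may be held fixed by an intervention."""

    def __init__(self, fixed: Dict[int, int]):
        self.cells: Dict[int, int] = dict(fixed)  # the intervention writes its values first
        self.fixed = set(fixed)

    def read(self, i: int) -> int:
        return self.cells.get(i, 0)               # unwritten squares read as 0 (empty tape)

    def write(self, i: int, bit: int) -> None:
        if i not in self.fixed:                   # writes to intervened squares are ignored
            self.cells[i] = bit

# A "program" here is any function that reads/writes the tape and may consume random bits.
Program = Callable[[GuardedTape, Iterator[int]], None]

def intervene(program: Program, fixed: Dict[int, int]) -> Callable[[Iterator[int]], Dict[int, int]]:
    """I_S(T): run `program` with the squares listed in `fixed` held to the given values."""
    def run(random_bits: Iterator[int]) -> Dict[int, int]:
        tape = GuardedTape(fixed)
        program(tape, random_bits)                # assumes the program halts on this input
        return dict(tape.cells)
    return run

def coin_to_square0(tape: GuardedTape, random_bits: Iterator[int]) -> None:
    tape.write(0, next(random_bits))              # X0 := the result of one fair coin flip

# Under the intervention holding X0 = 1, the coin flip no longer matters:
assert intervene(coin_to_square0, {0: 1})(iter([0]))[0] == 1
```

The empty intervention corresponds to fixed = {}, in which case the program behaves as written.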
In all logical expressions we help ourselves to these standard notational conventions: α → β abbreviates ¬α ∨ β, and α ↔ β denotes (α → β) ∧ (β → α).

2.2 The Basic Language

Syntax. The basic, non-probabilistic language Lnon-prob is a propositional language over conditionals. Formally:

Definition 2. Let X be a set of atoms {X1, X2, X3, . . . } representing the values of the memory tape variables and let Lprop be the propositional language formed by closing X off under conjunction, disjunction, and negation. Let the intervention specification language Lint ⊂ Lprop be the language of purely conjunctive, ordered formulas of unique literals, i.e., formulas of the form li1 ∧ · · · ∧ lin for some n ≥ 0, where ij < ij+1 and each lij is either Xij or ¬Xij. (The point is that such formulas are in one-to-one correspondence with specifications of interventions, i.e., finite lists of variables along with the values each is to be held fixed to.) ⊤ abbreviates the "empty intervention" formula with n = 0. Let Lcond be the conditional language of formulas of the form ⟨α⟩β for α ∈ Lint, β ∈ Lprop. The overall basic language Lnon-prob is the language formed by closing off the formulas of Lcond under conjunction, disjunction, and negation. (Unlike [6], we do not admit the basic atoms X as atoms of L. There is no difficulty extending the semantics to such atoms, but allowing them would needlessly complicate the proof of Theorem 1.)

Every formula α ∈ Lint specifies an intervention Iα by giving a list of variables to fix and which values they are to be fixed to. Given a subjunctive conditional formula ⟨α⟩β ∈ Lcond, we call α the antecedent and β the consequent. We use [α] for the dual of ⟨α⟩, i.e., [α]β abbreviates ¬⟨α⟩(¬β). Note that ⟨⟩ϕ holds in a program if the unmodified program halts with a tape making ϕ true.

Semantics. The semantics of the basic language take a subjunctive conditional to be true in a simulation model when the program, intervened upon so as to make its antecedent hold, halts with values of the tape variables that make its consequent hold. For example, consider a simple model that checks whether the first memory tape square X0 is 1 and if so writes a 1 into the second tape square X1, and otherwise simply halts. This program satisfies the formulas ⟨⟩¬X0 and ⟨⟩¬X1, but also the counterfactual formula ⟨X0⟩(X0 ∧ X1): holding the first memory square fixed to 1 causes a write of the value 1 into the second tape square, thus satisfying the consequent X0 ∧ X1. Formally:

Definition 3. Let T be a non-probabilistic simulation model. Define T |=non-prob ⟨α⟩β iff Iα(T) halts with a memory tape whose variable assignment satisfies β. Now suppose T is probabilistic, and fix values for all squares on the random bit tape to some sequence r ∈ {0, 1}∞. Define T, r |= ⟨α⟩β iff Iα(T), when run with its random bit tape fixed to r, halts with a resultant memory tape satisfying β. Define (in both cases) satisfaction of arbitrary formulas of Lnon-prob in the familiar way by recursion.
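The running example above, and the satisfaction relation of Definition 3, can be rendered as a small self-contained Python sketch. The encoding (a Python function standing in for the Turing machine, a dict standing in for the memory tape) is an illustrative assumption rather than the formal model.

```python
from typing import Dict, Iterator, Optional

def run_example(fixed: Dict[int, int], bits: Iterator[int]) -> Optional[Dict[int, int]]:
    """The toy model of Section 2.2 under the intervention holding `fixed`:
    if X0 = 1, write a 1 into X1; otherwise just halt.  Returns the halting tape."""
    tape = dict(fixed)                      # the intervention writes its values first
    read = lambda i: tape.get(i, 0)         # empty squares read as 0
    def write(i: int, b: int) -> None:
        if i not in fixed:                  # writes to intervened squares are ignored
            tape[i] = b
    if read(0) == 1:
        write(1, 1)
    return tape                             # this model always halts and reads no random bits

# T, r |= <>¬X0 and T, r |= <>¬X1: the unintervened run halts with X0 = X1 = 0.
t = run_example({}, iter([]))
assert t.get(0, 0) == 0 and t.get(1, 0) == 0

# T, r |= <X0>(X0 ∧ X1): holding X0 fixed to 1 causes a write of 1 into X1.
t = run_example({0: 1}, iter([]))
assert t.get(0, 0) == 1 and t.get(1, 0) == 1
```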
In a sense, the validities of the non-probabilistic setting carry over to this setting, as we will now show. For ϕ ∈ Lnon-prob, write |=non-prob ϕ if ϕ is valid in the class of all non-probabilistic simulation models. We will see that all such formulas are still valid for probabilistic simulation models, under Definition 3, once one fixes the random bit tape to a particular sequence.

Lemma 1. |=non-prob ϕ if and only if, for all probabilistic simulation models T and all r ∈ {0, 1}∞, we have that T, r |= ϕ.

Proof. Suppose |=non-prob ϕ. Consider some probabilistic simulation model T and sequence r ∈ {0, 1}∞. ϕ is composed of Lcond-atoms, of the form ⟨α⟩β. What is the behavior of Iα(T), r? Either Iα(T), r reads only a finite portion of r, or it reads an unbounded portion of r (in the latter case, it also does not halt). If only a finite portion is read, let N(a), for the atom a under consideration, be the index of the last random bit tape square of r that is read. Let N be the maximum of the N(a) over all atoms a in ϕ, which exists since ϕ has finite length. Construct a Turing machine T′ from T that embeds the contents of r up to index N into its code, replacing any read from r with its value. This is possible in a finite amount of code as we only have to include values up to N in T′.

What if Iα(T), r ends up reading an unbounded portion of r? We note that it is possible to write code in T′ to check if the machine is being run under an α-fixing intervention—i.e., conditional code that runs under Iα(T′) and no other intervention. (For the precise details of this construction, see [6]. Briefly, if one wants to check whether some Xi is being held fixed by an intervention, one can try to toggle Xi; this attempt will be successful iff Xi is not currently being fixed by an intervention.) Add such code to T′, including an infinite loop conditional on an α-intervention for each case where Iα(T), r reads an unbounded portion of r. Now, for all atoms ⟨α⟩β, T′ |=non-prob ⟨α⟩β iff T, r |= ⟨α⟩β. As this holds for any atom of ϕ, and |=non-prob ϕ, we have that T, r |= ϕ as desired.

Now, suppose that T, r |= ϕ for all probabilistic T, r. We want to see that |=non-prob ϕ. Given a non-probabilistic T, convert T to a probabilistic TM T′ that never reads from its random tape, and take any random tape r. Then T′, r |= ϕ, so that T |=non-prob ϕ. □

2.3 Adding Probabilities

Syntax. L is the language of linear inequalities over probabilities that formulas of Lnon-prob hold. More precisely:

Definition 4. Let Lineq be the language of formulas of the form

  a1 P(ϕ1) + · · · + an P(ϕn) ≤ c    (1)

for some n ∈ N, and c, a1, . . . , an ∈ Z, ϕ1, . . . , ϕn ∈ Lnon-prob. Then L is the language of propositional formulas formed by closing off Lineq under conjunction, disjunction, and negation.

We sometimes write inequalities of a different form from (1) with the understanding that they can be readily converted into some L-formula. For example, an inequality with a > sign is a negation of a Lineq-formula.

Semantics. Let T be a probabilistic simulation model. We will shortly define a probability PT : Lnon-prob → [0, 1]. Now suppose a given ϕ ∈ Lineq has the form (1). Then T |= ϕ iff the inequality (1) holds when each P(ϕi) factor takes the value PT(ϕi). Satisfaction T |= ϕ for arbitrary ϕ ∈ L is then defined familiarly by recursion.

Given ϕ ∈ Lnon-prob, the probability PT(ϕ) is simply the (standard) measure of the set of infinite bit sequences r for which T, r |= ϕ. More formally: let Σ be the σ-algebra on {0, 1}∞ generated by cylinder sets and µ be the standard measure defined on Σ (that is, the product of Bernoulli(1/2) measures, as defined in, e.g., [3]). Now let S(ϕ) = {r ∈ {0, 1}∞ : T, r |= ϕ}. Then we define PT(ϕ) = µ(S(ϕ)).
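The measure-theoretic definition of PT can be read operationally: PT(ϕ) is the probability that a run with fresh fair coin flips satisfies ϕ. The following self-contained Python sketch estimates such a probability by sampling; the encoding of a model as a Python function and the helper names are illustrative assumptions, and the sketch ignores non-halting runs and interventions.

```python
import random
from typing import Callable, Dict, Iterator, Optional

Tape = Dict[int, int]

def two_coin_and(bits: Iterator[int]) -> Optional[Tape]:
    """Toy probabilistic simulation model: write the AND of two fair coin flips into X0."""
    return {0: next(bits) & next(bits)}        # always halts after reading two random bits

def estimate_prob(model: Callable[[Iterator[int]], Optional[Tape]],
                  event: Callable[[Tape], bool],
                  samples: int = 100_000,
                  seed: int = 0) -> float:
    """Monte Carlo estimate of P_T(<>β): the measure of the set of random-bit
    sequences r on which the (unintervened) run halts with a tape satisfying β."""
    rng = random.Random(seed)
    def fresh_bits() -> Iterator[int]:          # lazily generated fair coin flips
        while True:
            yield rng.getrandbits(1)
    hits = 0
    for _ in range(samples):
        tape = model(fresh_bits())
        if tape is not None and event(tape):
            hits += 1
    return hits / samples

# P_T(<>X0) for this model is 1/4; the estimate should be close to 0.25.
print(estimate_prob(two_coin_and, lambda t: t.get(0, 0) == 1))
```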
The following lemma ensures that S(ϕ) is always measurable, so that this definition is valid.

Lemma 2. For any ϕ ∈ Lnon-prob, we have S(ϕ) ∈ Σ.

Proof. By induction on the structure of ϕ. If ϕ = ¬ψ, then S(ϕ) is the complement of a set in Σ and hence is in Σ. The case of a conjunction or disjunction is similar since Σ is closed under intersection and union. The base case is that of the atoms. Consider an atom of the form ⟨α⟩β. If Iα(T) halts with its random bit tape fixed to r, then it does so reading only a finite portion of r. Thus S(⟨α⟩β) is the union of cylinder sets extending finite strings on which Iα(T) halts with a result satisfying β, and hence is in Σ. □

This probability is coherent in the sense that it plays well with the logic of the basic language:

Proposition 1. For any probabilistic T we have:
1. PT(ϕ) = 1 if |=non-prob ϕ, for ϕ ∈ Lnon-prob;
2. PT(ϕ) ≤ PT(ψ) whenever |=non-prob ϕ → ψ, for ϕ, ψ ∈ Lnon-prob;
3. PT(ϕ) = PT(ϕ ∧ ψ) + PT(ϕ ∧ ¬ψ) for all ϕ, ψ ∈ Lnon-prob.

Proof. (1) holds since in this case, by Lemma 1, S(ϕ) = {0, 1}∞. (2) holds since in this case, S(ϕ) ⊆ S(ψ). Finally (3) holds by noting |=non-prob ϕ ↔ ((ϕ ∧ ψ) ∨ (ϕ ∧ ¬ψ)), applying (2), and noting that S(ϕ ∧ ψ) and S(ϕ ∧ ¬ψ) are disjoint. □

A corollary of part (2) is that logical equivalents under |=non-prob preserve probability.

2.4 The Case of Almost-Surely Halting Simulations

An interesting special case is that of the simulation models that halt almost-surely, i.e., with probability 1 under every intervention. Call this class M↓. Following the urging of [7] we have not restricted the definition of probabilistic simulation model to such models. We will see that from a logical point of view, this case is a natural probabilistic analogue of the class M↓non-prob of non-probabilistic simulation models that halt under every intervention. By this we mean that we may prove an analogue of Lemma 1. Write |=↓non-prob ϕ if ϕ ∈ Lnon-prob is valid in M↓non-prob. Note that Lemma 1 does not hold if one merely changes all the preconditions to be halting/almost-surely halting: consider a probabilistic simulation model T that repeatedly reads random bits and halts at the first 1 it discovers; this program is almost-surely halting. But if r is an infinite sequence of 0s, then T, r ⊭ ⟨⟩⊤, even though |=↓non-prob ⟨⟩⊤. Crucially, we must move to the perspective of probability and measure to see the analogy:

Lemma 3. |=↓non-prob ϕ if and only if, for all T ∈ M↓, we have T, r |= ϕ for all r ∈ {0, 1}∞ except on a set of measure 0.

Proof. Suppose |=↓non-prob ϕ. We claim that for all T ∈ M↓ we have T, r |= ϕ for all r except on a set of measure 0. Again, consider an atom ⟨α⟩β appearing in ϕ. The set of r for which Iα(T), r does not halt has measure 0, given that T ∈ M↓. On each such r, the run of Iα(T), r must read infinitely many bits of r: otherwise, the intervened machine would have a nonzero probability of not halting. Thus, excluding such r, it is possible to repeat the construction of T′ from the proof of Lemma 1 for ⟨α⟩β, and in doing this construction we are already ignoring all cases where an unbounded portion of r is read. This means that we do not have to include any infinite loops in T′, and T′ will be always-halting. If we exclude all such r arising from the antecedents of all atoms of ϕ, then we only exclude a set of measure 0, since there are finitely many atoms. Except for such r, the construction works, and T′ has, as before, the same behavior as T.
But since |=↓non-prob ϕ, we have that T, r |= ϕ except on the excluded set of measure 0.

For the opposite direction, let T ∈ M↓non-prob. We wish to show that T |=non-prob ϕ. Convert T to an identical probabilistic simulation program T′ that never reads from its random tape. We have T′, r |= ϕ for all r but a set of measure 0; in particular, for at least one r. This implies T |=non-prob ϕ. □

3 Axiomatic Systems

We will now give an axiomatic system for reasoning in L and prove that it is sound and complete with respect to probabilistic simulation models: it proves all (completeness) and only (soundness) the formulas of L that hold for all probabilistic simulation models. We will give an additional system that is sound and complete for validities with respect to the almost-surely halting simulation models M↓.

Definition 5. Let AX be a set of rules and axioms formed by combining the following three modules.

1. PC: propositional reasoning (tautologies and modus ponens) over atoms of L.

2. Prob: the following axioms:
   NonNeg. P(ϕ) ≥ 0
   Norm. P(⊤) = 1
   Add. P(ϕ ∧ ψ) + P(ϕ ∧ ¬ψ) = P(ϕ)
   Dist. P(ϕ) = P(ψ) whenever |=non-prob ϕ ↔ ψ

3. Ineq, an axiomatization (see [2]) for reasoning about linear inequalities:
   Zero. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇔ (a1 P(ϕ1) + · · · + an P(ϕn) + 0P(ϕn+1) ≤ c)
   Permutation. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇔ (aj1 P(ϕj1) + · · · + ajn P(ϕjn) ≤ c) when j1, . . . , jn is a permutation of 1, . . . , n
   AddIneq. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ∧ (a′1 P(ϕ1) + · · · + a′n P(ϕn) ≤ c′) ⇒ ((a1 + a′1) P(ϕ1) + · · · + (an + a′n) P(ϕn) ≤ (c + c′))
   Mult. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇒ (b a1 P(ϕ1) + · · · + b an P(ϕn) ≤ b c) for any b > 0
   Dichotomy. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ∨ (a1 P(ϕ1) + · · · + an P(ϕn) ≥ c)
   Mono. (a1 P(ϕ1) + · · · + an P(ϕn) ≤ c) ⇒ (a1 P(ϕ1) + · · · + an P(ϕn) < b) if b > c

Additionally, let AX↓ be the system formed in exactly the same way, but replacing |=non-prob with |=↓non-prob. Note that the non-probabilistic validities |=non-prob and |=↓non-prob, appearing in Dist, have been completely axiomatized in [6]. The main result is:

Theorem 1. AX (respectively, AX↓) is sound and complete for the validities of L with respect to M (respectively, M↓), where M is the class of all probabilistic simulation models.

Proof. Soundness (of Prob) follows from Lemma 1, Proposition 1, and, for the almost-surely halting case, Lemma 3. For completeness, consider the general case of M first. As usual, it suffices to show that any consistent ϕ ∈ L is satisfiable by some probabilistic simulation model. We put ϕ into a normal form from which we construct a canonical model. By PC we may suppose ϕ is in disjunctive normal form. We may further suppose that it is a conjunction of Lineq-literals, as at least one (conjunctive) clause in the disjunctive normal form must be consistent. Let a1, . . . , an ∈ Lcond be the atoms that appear inside any probability P in ϕ, and let δ1, . . . , δ_{2^n} represent all the formulas of the form l1 ∧ · · · ∧ ln that can be obtained by setting each li to either ai or ¬ai. We then have the following, which is a kind of normal form result:

Lemma 4 (Lemma 2.3, [2]). ϕ is provably-in-AX equivalent to a conjunction

  (P(δ1) ≥ 0) ∧ · · · ∧ (P(δ_{2^n}) ≥ 0) ∧
  (P(δ1) + · · · + P(δ_{2^n}) = 1) ∧
  (a1,1 P(δ1) + · · · + a_{1,2^n} P(δ_{2^n}) ≤ c1) ∧ · · · ∧
  (am,1 P(δ1) + · · · + a_{m,2^n} P(δ_{2^n}) ≤ cm) ∧
  (a′1,1 P(δ1) + · · · + a′_{1,2^n} P(δ_{2^n}) > c′1) ∧ · · · ∧
  (a′m′,1 P(δ1) + · · · + a′_{m′,2^n} P(δ_{2^n}) > c′m′)    (2)

for some integer coefficients c1, . . . , cm, c′1, . . . , c′m′, a1,1, . . . , a_{m,2^n}, a′1,1, . . . , a′_{m′,2^n}.
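For concreteness, here is a small illustrative instance of this normal form (an example added for exposition, not part of the original proof). Suppose the only atoms occurring inside P in ϕ are a1 and a2, so that n = 2 and

  δ1 = a1 ∧ a2,  δ2 = a1 ∧ ¬a2,  δ3 = ¬a1 ∧ a2,  δ4 = ¬a1 ∧ ¬a2.

Since P(a1) = P(δ1) + P(δ2) and P(a2) = P(δ1) + P(δ3), a conjunct of ϕ such as 2P(a1) − P(a2) ≤ 1 becomes P(δ1) + 2P(δ2) − P(δ3) ≤ 1, and (2) additionally records that the P(δi) are nonnegative and sum to 1.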
Proof. Let ψ ∈ Lnon-prob be any of the formulas appearing inside of a probability P in ϕ. Note that P(ψ) = P(ψ ∧ a1) + P(ψ ∧ ¬a1) by Add. Moving on to a2, we have, provably, P(ψ ∧ a1) = P(ψ ∧ a1 ∧ a2) + P(ψ ∧ a1 ∧ ¬a2), and we may rewrite P(ψ ∧ ¬a1) similarly. Applying this process successively, we have P(ψ) = P(ψ ∧ δ1) + · · · + P(ψ ∧ δ_{2^n}). For any term on the right-hand side of this equation, if δi ⇒ ψ propositionally, then propositional reasoning together with Dist allows us to replace the term by P(δi), and if not, by 0. Thus we always have that P(ψ) = b1 P(δ1) + · · · + b_{2^n} P(δ_{2^n}) for some coefficients bi. Applying this process to each P-term in ϕ and using Ineq to rewrite the left-hand sides of the inequalities, and conjoining the (clearly provable) clauses that P(δi) ≥ 0 for all 1 ≤ i ≤ 2^n and P(δ1) + · · · + P(δ_{2^n}) = 1, we obtain (2). □

The conjunction (2) can be seen as a system of simultaneous inequalities over 2^n unknowns, P(δ1), . . . , P(δ_{2^n}). Ineq is actually sound and complete for such systems (we refer the reader to Section 4 of [2] for the proof of this fact). So if ϕ is consistent with AX—which includes Ineq—this system must have a solution. Thus there are values P(δi) solving (2). We will now construct a probabilistic simulation model having precisely these probabilities of satisfying each δi. Note that for any δi with |=non-prob ⊥ ↔ δi it is provable that P(δi) = 0, and we may conjoin this to (2). Note also that δi ∧ δj is unsatisfiable for any i ≠ j. Given these two observations, the following lemma implies the result.

Lemma 5. For any collection of satisfiable Lnon-prob-formulas ϕ1, . . . , ϕn, no two of which are jointly satisfiable, and any rational probabilities p1, . . . , pn ≥ 0 such that p1 + · · · + pn = 1, there is a probabilistic simulation model T such that PT(ϕi) = pi for all i, 1 ≤ i ≤ n.

Proof. Since the ϕi are satisfiable, there are non-probabilistic simulation models Tnon-prob,1, . . . , Tnon-prob,n such that for all i = 1, . . . , n, we have Tnon-prob,i |=non-prob ϕi. Further, we may suppose the machines so constructed use only a bounded number of memory tape squares. (Why? Since the ϕi are satisfiable, they are consistent with the axiomatization for non-probabilistic simulation models given by [6], and hence are satisfied by the canonical models given in [6]; these models use only boundedly many tape squares.) Thus let the maximum index of a tape square used by any of the Tnon-prob,i be N.

We now describe T informally. Suppose without loss of generality that for all i, pi = ai/b for some common denominator b. Let T draw a random number r from 1 up to b uniformly, and ensure that T does any auxiliary computations it might need only on squares with indices at least N + 1. Check whether r ≤ a1, and if so, let T branch into the code of Tnon-prob,1. If not, check whether a1 + 1 ≤ r ≤ a1 + a2 and if so, branch into Tnon-prob,2. Repeat the process for p3, . . . , pn. It is clear that the probability of branching into each Tnon-prob,i block is exactly pi, and the same is true under any relevant (i.e., involving only memory tape variables that appear in one of the ϕi) intervention on T: we may suppose any auxiliary computations T might require use only memory tape squares with indices past N. After branching into the ith block, the behavior of T is exactly the same as that of Tnon-prob,i, meaning that any random bit tape fixings that end up causing a branch into this block will belong to S(ϕi). Another random bit tape fixing that causes a branch into another block, say the jth, cannot belong to S(ϕi) since ϕi, ϕj are jointly unsatisfiable. Thus, PT(ϕi) = pi for all i. □
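As an illustration of the construction just described (a toy sketch only: the component blocks are Python functions rather than Turing machine code, and the tape-index bookkeeping around N is omitted), the branching machine can be realized as follows. The helper names are assumptions for the example.

```python
from math import ceil, log2
from typing import Callable, Dict, Iterator, List, Optional

Tape = Dict[int, int]
Model = Callable[[Iterator[int]], Optional[Tape]]   # random-bit tape -> final memory tape

def mixture(blocks: List[Model], weights: List[int]) -> Model:
    """With b = sum(weights), draw r uniformly from {1, ..., b} by rejection
    sampling on the random bit tape, then run the i-th block exactly when
    a_1 + ... + a_{i-1} < r <= a_1 + ... + a_i."""
    b = sum(weights)
    k = max(1, ceil(log2(b)))                        # random bits consumed per attempt

    def run(bits: Iterator[int]) -> Optional[Tape]:
        while True:                                   # rejection loop; halts with probability 1
            r = 1 + int("".join(str(next(bits)) for _ in range(k)), 2)
            if r <= b:
                break
        acc = 0
        for block, a in zip(blocks, weights):
            if acc < r <= acc + a:
                return block(bits)                    # remaining random bits go to the block
            acc += a
        return None                                   # unreachable: the weights cover 1..b
    return run

# A model with P_T(<>X0) = 3/4: branch into "X0 := 0" with weight 1, "X0 := 1" with weight 3.
T = mixture([lambda bits: {0: 0}, lambda bits: {0: 1}], [1, 3])
```

Under any relevant intervention the branch probabilities are unchanged, which is the point of keeping each block's auxiliary tape usage above index N in the formal construction.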
Finally, we must see that this model lies in M↓ if the original formula is consistent with AX↓. [6] has shown that |=↓non-prob [α]β → ⟨α⟩β. Then in the proof of Lemma 5, we may suppose that each Tnon-prob,i block contains only always-halting code (since the canonical programs of [6] for M↓non-prob contain only such code), and hence that T does not contain any loops either: thus it almost-surely halts. □

4 Computational Complexity

Call the problem of deciding if a formula ϕ ∈ L is satisfiable Prob-Sim-Sat(ϕ). Theorem 2 shows that solving this problem is no more complex than propositional satisfiability.

Theorem 2. Prob-Sim-Sat(ϕ) is NP-complete in |ϕ| (where this length is computed standardly).

Proof. It is NP-hard since, given any propositional π, the formula P(⟨⟩π) > 0 is satisfiable iff π is satisfiable (consider a machine that does nothing but write out a satisfying memory tape assignment). In order to show that the satisfiability problem is in NP, we give the following nondeterministic satisfiability algorithm: guess a program from a class of programs (that we will define shortly) that includes the program constructed in Lemma 5—call this canonical program Tϕ—and check (in polynomial time) if it satisfies ϕ. This algorithm decides satisfiability since, by soundness, a satisfiable formula must be consistent, and hence has a canonical model of the form constructed in Lemma 5. For the remainder of the proof, by the "length of a number" we just mean the length of its computer (binary) representation. The "length of a rational" is the sum of the lengths of its numerator and its denominator.

What is the class of probabilistic simulation models that we may limit our guesses to? For some fixed constants C, D ∈ N, we will define a class Mϕ,C,D. We will then show that there exist C, D such that the canonical program of Lemma 5 belongs to Mϕ,C,D for all consistent ϕ. Let Mϕ,C,D be the fragment of probabilistic simulation models whose code consists of the following:

1. Code to draw a random number uniformly between 1 and some N, such that N has length at most D|ϕ|^3.
2. At most n = C|ϕ| branches, that is, copies of: an if-statement with condition ℓ ≤ r ≤ u, whose body is a canonical program Pψi for some ψi ∈ Lnon-prob, of the same form as the non-probabilistic canonical models (i.e., in the class defined in the proof of Theorem 2 from [6]). Letting ℓi, ui be the bounds for the ith copy in item 2, we also require that ℓ1 = 1, that ℓi+1 = ui + 1 for all i, and that un = N.

The following fact from linear algebra (we refer the reader to [2] for the proof) helps us to show that for all consistent ϕ, the canonical program Tϕ belongs to Mϕ,C,D for some C, D.

Lemma 6. A system of m linear inequalities with integer coefficients of length at most ℓ that has a nonnegative solution has a nonnegative solution with at most m variables nonzero, and where the variables have length at most O(mℓ + m log m). □

Apply this lemma to (2). Each inequality in (2) originally came from ϕ, so there are O(|ϕ|) of them. Further, recall that each integer coefficient in (2) came from summing up a subset of 2^n coefficients originally from ϕ, where n is the number of atoms appearing anywhere inside P expressions in ϕ. As this n is thus O(|ϕ|)—and hence 2^n is O(|ϕ|) in length—and each original coefficient is also O(|ϕ|) in length, each coefficient of (2) is O(|ϕ|) in length as well (lengths of products add). Thus Lemma 6 shows that without loss of generality, we may suppose that the solutions for the P(δi) of (2) have O(|ϕ|^2) length. The common denominator of these O(|ϕ|) rationals hence has O(|ϕ|^3) length. The construction of Lemma 5 has one branch for each of them, and hence O(|ϕ|) branches. This shows the existence of a D for part 1 of the definition of Mϕ,C,D and the existence of a C for part 2. We will abbreviate Mϕ = Mϕ,C,D for some choice of C, D thus guaranteed.

It remains to show that given any program T ∈ Mϕ, we can check if T |= ϕ in polynomial time. It suffices to show that checking if T |= ψ for ψ ∈ Lineq is polynomial time: if we know whether T |= ψ for every ψ that ϕ is built out of, we can decide in linear time if T |= ϕ. Thus suppose ψ has the form a1 P(ϕ1) + · · · + an P(ϕn) ≤ c. [6] shows that one may check whether the Pψi in part 2 of the definition of Mϕ satisfy any formula of the basic language Lnon-prob in polynomial time. Then we can easily compute P(ϕi) as simply the sum of the probabilities of each branch that satisfies ϕi. Doing the arithmetic to check if ψ is satisfied is then certainly polynomial time, so we have our result. □
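The verification step in this proof—summing branch probabilities and checking a linear inequality—is straightforward to render concretely. In the following self-contained Python sketch, per-branch satisfaction of each ϕi is simply given as a table, standing in for the polynomial-time model checker of [6]; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

def check_linear_inequality(branches: List[Tuple[Fraction, Dict[str, bool]]],
                            terms: List[Tuple[int, str]],
                            c: int) -> bool:
    """Decide whether a1*P(phi1) + ... + an*P(phin) <= c for a program of the
    branching form above: P(phi_i) is the sum of the probabilities of the
    branches whose canonical block satisfies phi_i."""
    total = Fraction(0)
    for coeff, name in terms:
        p = sum((prob for prob, sat in branches if sat[name]), Fraction(0))
        total += coeff * p
    return total <= c

# Toy instance: two branches with probabilities 1/4 and 3/4.
branches = [(Fraction(1, 4), {"phi1": True,  "phi2": False}),
            (Fraction(3, 4), {"phi1": False, "phi2": True})]
# Check 2*P(phi1) - P(phi2) <= 0, i.e., 2*(1/4) - 3/4 <= 0: holds.
print(check_linear_inequality(branches, [(2, "phi1"), (-1, "phi2")], 0))   # True
```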
5 Conclusion and Future Work

We have defined and obtained foundational results concerning a very natural extension of counterfactual intervention on simulation models to the probabilistic case.

One critical operation in probability is conditioning, or updating probabilities given that some event is known to have occurred (in the subjective interpretation, updating a belief for known information). One may already define conditional probabilities in the usual way in the current framework, and our framework (without interventions) covers the conditional simulation approach to certain aspects of common-sense reasoning of [3]. In this approach, one limits oneself to the runs satisfying a certain query; the framework considered here would be equivalent for any queries expressible as formulas of Lnon-prob. [2] also give a logic for reasoning about conditional probabilities. Future work would involve extending this system to probabilistic simulation models and studying the complexity of reasoning in that setting.

As [8, 6] note, the simulation model approach invalidates many important logical principles that are valid in other approaches [5, 11, 9], such as cautious monotonicity: [A](B ∧ C) → [A ∧ B]C. However the approach is otherwise quite general, and an important future direction would be to identify and characterize subclasses of simulation models that validate this and other similar logical principles. We have begun investigating this extension. An interesting consequence it has concerns the comparison of conditional probability with the probabilities of subjunctive conditionals: while these two probabilities are not in general equal in the classes M or M↓, they are equal in certain restricted classes.

A final direction we want to mention concerns "open-world" reasoning, including first-order reasoning about models with some domain, where counterfactual antecedents might alter how many individuals are being considered or which individuals fall under a property or bear certain relations to each other.
Recursion and the tools of logic programming [4, 10] make this very natural for the simulation model approach, and we would like to understand the first- and higher-order conditional logics that result in this approach, in both the non-probabilistic and probabilistic cases. We have also begun exploring this direction.

References

1. Chater, N., Oaksford, M.: Programs as causal models: Speculations on mental programs and mental representation. Cognitive Science 37(6), 1171–1191 (2013)
2. Fagin, R., Halpern, J.Y., Megiddo, N.: A logic for reasoning about probabilities. Information and Computation 87, 78–128 (1990)
3. Freer, C.E., Roy, D.M., Tenenbaum, J.B.: Towards common-sense reasoning via conditional simulation: Legacies of Turing in artificial intelligence. In: Downey, R. (ed.) Turing's Legacy: Developments from Turing's Ideas in Logic, Lecture Notes in Logic, vol. 42, pp. 195–252. Cambridge University Press (2014)
4. Goodman, N.D., Tenenbaum, J.B., Gerstenberg, T.: Concepts in a probabilistic language of thought. In: Margolis, E., Laurence, S. (eds.) The Conceptual Mind: New Directions in the Study of Concepts. MIT Press (2015)
5. Halpern, J.Y.: Axiomatizing causal reasoning. Journal of AI Research 12, 317–337 (2000)
6. Ibeling, D., Icard, T.: On the conditional logic of simulation models. In: Proc. 27th IJCAI (2018)
7. Icard, T.: Beyond almost-sure termination. In: Proc. 39th CogSci (2017)
8. Icard, T.F.: From programs to causal models. In: Cremers, A., van Gessel, T., Roelofsen, F. (eds.) Proceedings of the 21st Amsterdam Colloquium, pp. 35–44 (2017)
9. Lewis, D.: Counterfactuals. Harvard University Press (1973)
10. Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D.L., Kolobov, A.: BLOG: Probabilistic models with unknown objects. In: Proc. 19th IJCAI, pp. 1352–1359 (2005)
11. Pearl, J.: Causality. Cambridge University Press (2009)